Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Subgradient projectors : theory, extensions and algorithms Xu, Jia 2016

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
24-ubc_2016_May_Xu_Jia.pdf [ 2.7MB ]
Metadata
JSON: 24-1.0300354.json
JSON-LD: 24-1.0300354-ld.json
RDF/XML (Pretty): 24-1.0300354-rdf.xml
RDF/JSON: 24-1.0300354-rdf.json
Turtle: 24-1.0300354-turtle.txt
N-Triples: 24-1.0300354-rdf-ntriples.txt
Original Record: 24-1.0300354-source.json
Full Text
24-1.0300354-fulltext.txt
Citation
24-1.0300354.ris

Full Text

Subgradient Projectors: Theory,Extensions and AlgorithmsbyJia XuM.Sc., Xiamen University, 2012A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFDOCTOR OF PHILOSOPHYinTHE COLLEGE OF GRADUATE STUDIES(Mathematics)THE UNIVERSITY OF BRITISH COLUMBIA(Okanagan)April 2016c Jia Xu, 2016The undersigned certify that they have read, and recommend to the College of Graduate Studies for acceptance, a thesis entitled:  Subgradient Projectors: Theory, Extensions and AlgorithmsSubmitted by                    Jia Xu                                      in partial fulfillment of the requirements of The degree of       PhD                                                                                  .Dr. Heinz Bauschke, Irving K. Barber School of Arts & Sciences, UBCOSupervisor, Professor (please print name and faculty/school above the line)Dr. Yves Lucet, Irving K. Barber School of Arts & Sciences, UBCOSupervisory Committee Member, Professor (please print name and faculty/school in the line above)Dr. Richard Klukas, School of Engineering, UBCOSupervisory Committee Member, Professor (please print name and faculty/school in the line above)Dr. Mohamed Tawhid, Mathematics, Thompson Rivers UniversityUniversity Examiner, Professor (please print name and faculty/school in the line above)External Examiner, Professor (please print name and university in the line above)April 22, 2016(Date submitted to Grad Studies)AbstractThe subgradient projector plays an important role in convex optimiza-tion, since the subgradient projection algorithm is a classical method forsolving convex feasibility problems. This motivates us to explore fundamen-tal properties of the subgradient projector. One can define the subgradientprojector of a convex function via a selection of the subdi↵erential opera-tor. This opens the door to define the subgradient projector of nonconvexfunctions by using the Mordukhovich limiting subdi↵erential operator.This thesis o↵ers a systematic study of the subgradient projector. First,motivated by Polyak and Crombez, we present and analyze a more generalalgorithm for finding a fixed point of a cutter which is assumed to have afixed point set with nonempty interior. Our results complement and extendconclusions by Crombez for cutters and by Polyak for subgradient projectors.We also present a comprehensive list of properties of the subgradientprojectors which complements work by Pauwels. Special attention is givento continuity, nonexpansiveness and monotonicity. Finally, for subgradientprojectors associated with nonconvex functions, we obtain various charac-terizations.iiiPrefaceThis thesis is based on two published papers [12, 13] and two preprintmanuscripts [10, 11]. The papers [12, 13] form the basis of Chapters 3 and4. Chapters 5 and 6 are based on manuscripts [10, 11], respectively. Theseare joint works with my supervisors, Dr. Heinz H. Bauschke and Dr. ShawnX. Wang, and also with Dr. Caifang Wang, a visiting scholar from ShanghaiMaritime University.For all co-authored papers, each author contributed equally.ivTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . vList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiGlossary of Notation . . . . . . . . . . . . . . . . . . . . . . . . . viiiAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . xDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiChapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . 1Chapter 2: Auxiliary Results . . . . . . . . . . . . . . . . . . . . 32.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32.2 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 32.3 Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.3.1 Linear Bounded Operators . . . . . . . . . . . . . . . 82.3.2 Cutters . . . . . . . . . . . . . . . . . . . . . . . . . . 92.3.3 Single-valued and Set-valued Monotone Operators . . 112.3.4 Nonexpansive Mappings . . . . . . . . . . . . . . . . . 112.4 Convex Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 152.4.1 Convex Sets . . . . . . . . . . . . . . . . . . . . . . . . 152.4.2 Convex Functions . . . . . . . . . . . . . . . . . . . . 19Chapter 3: Finite Convergence Result of a Projected CutterMethod . . . . . . . . . . . . . . . . . . . . . . . . . . 25vTABLE OF CONTENTS3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253.2 Basic Properties . . . . . . . . . . . . . . . . . . . . . . . . . 263.3 Finitely Convergent Cutter Method . . . . . . . . . . . . . . . 323.4 Limiting Examples . . . . . . . . . . . . . . . . . . . . . . . . 353.5 Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38Chapter 4: On Subgradient Projectors . . . . . . . . . . . . . . 424.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424.2 Basic Properties . . . . . . . . . . . . . . . . . . . . . . . . . 434.3 Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484.5 Continuity of Gf vs Fre´chet di↵erentiability of f . . . . . . . 524.6 Continuity of Gf vs Gaˆteaux di↵erentiability of f . . . . . . . 554.7 Gf as an accelerated mapping . . . . . . . . . . . . . . . . . . 574.8 Nonexpansiveness . . . . . . . . . . . . . . . . . . . . . . . . . 594.9 The decreasing property . . . . . . . . . . . . . . . . . . . . . 624.10 The subgradient projector of f(x1, x2) = |x1|p + |x2|p . . . . . 664.11 Gf and the Yamagishi–Yamada operator . . . . . . . . . . . . 70Chapter 5: Characterizations of Subgradient Projectors . . . 765.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 765.2 Preliminary and definitions . . . . . . . . . . . . . . . . . . . 765.3 Calculus for subgradient projectors . . . . . . . . . . . . . . . 825.4 Basic properties of subgradient projectors . . . . . . . . . . . 885.4.1 When is a mapping T a subgradient projector? . . . . 905.4.2 Recovering f from its subgradient projector Gf . . . . 915.4.3 Fixed point closed property and continuity . . . . . . 935.5 When is the subgradient projector Gf a cutter? . . . . . . . . 955.6 Characterization of subgradient projectors of convex functions 100Chapter 6: Linear Subgradient Projectors . . . . . . . . . . . . 1126.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1126.2 Characterizations of Gf when Gf is linear . . . . . . . . . . . 1126.3 A complete analysis of linear subgradient projectors on R2 . . 126Chapter 7: Conclusion . . . . . . . . . . . . . . . . . . . . . . . . 1387.1 Key Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1387.2 Future Work and Open Questions . . . . . . . . . . . . . . . . 1397.2.1 Open Questions . . . . . . . . . . . . . . . . . . . . . . 1397.2.2 S-cutter . . . . . . . . . . . . . . . . . . . . . . . . . . 139viTABLE OF CONTENTS7.2.3 Analysis of Parameters . . . . . . . . . . . . . . . . . . 1407.2.4 Generalized Monotonicities . . . . . . . . . . . . . . . 1407.2.5 Recognition Convexity of f from Gf . . . . . . . . . . 140Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147Appendix A: Maple Code . . . . . . . . . . . . . . . . . . . . . . . 148A.1 Code to verify Remark 3.24 when f(x) = x2  1 . . . . . . . . 148A.2 Code to verify Remark 3.24 when f(x) = 100x2  1 . . . . . . 152viiList of TablesTable 3.1 Performance of the algorithm for f(x) = x2  1. . . . . 40Table 3.2 Performance of the algorithm for f(x) = 100x2  1. . . 41viiiList of FiguresFigure 2.1 Examples of convex and nonconvex sets . . . . . . . . 15Figure 2.2 Metric projection. . . . . . . . . . . . . . . . . . . . . 16Figure 2.3 Illustration of polar cone. . . . . . . . . . . . . . . . . 17Figure 2.4 The graph of the ceiling function, an example of alower semicontinuous function. . . . . . . . . . . . . . 19Figure 5.1 Illustration for Example 5.12 . . . . . . . . . . . . . . 82Figure 5.2 Graph of f . . . . . . . . . . . . . . . . . . . . . . . . 95Figure 5.3 Illustration for Example 5.56 . . . . . . . . . . . . . . 107Figure 5.4 Illustration for Example 5.57 (1) . . . . . . . . . . . . 108Figure 5.5 Illustration for Example 5.57 (2) . . . . . . . . . . . . 109Figure 5.6 Illustration for Example 5.59. . . . . . . . . . . . . . . 111Figure 6.1 Graph of T defined in Example 6.2. . . . . . . . . . . 114Figure 6.2 Graph of function g. . . . . . . . . . . . . . . . . . . . 129Figure 6.3 Graph of function f . . . . . . . . . . . . . . . . . . . . 130Figure 6.4 Graph of function g. . . . . . . . . . . . . . . . . . . . 130Figure 6.5 Graph of function f . . . . . . . . . . . . . . . . . . . . 131Figure 6.6 Graph of function g . . . . . . . . . . . . . . . . . . . 131Figure 6.7 Graph of function f . . . . . . . . . . . . . . . . . . . 132Figure 6.8 Graph of function g . . . . . . . . . . . . . . . . . . . 132Figure 6.9 Graph of function f . . . . . . . . . . . . . . . . . . . 133Figure 6.10 Plots of g and f when a = 1, c = 1 and K = 1 inTheorem 6.13(iii). . . . . . . . . . . . . . . . . . . . . 134Figure 6.11 Graph of f defined in (6.138). . . . . . . . . . . . . . 135Figure 6.12 Graph of f defined in (6.140). . . . . . . . . . . . . . 136ixGlossary of NotationA : H1 ◆ H2 Set valued operator A from H1 to H2. 12A⇤ The adjoint of operator A. 8C? Orthogonal complement of set C. 19Gfx Subgradient projector of f at x. 44K The polar cone of set K. 18K The dual cone of set K. 18N✏(x) A neighborhood of x. 16PCx The metric projection of x onto set C. 17RC Reflection of PC , RC = 2PC  Id. 32U| The transpose of matrix U . 121Ur,⌘ Relaxation of Ur. 30V A real vector space. 4FixT Fixed points of T , FixT = {x 2 H : x = Tx}.9Gfx The set of subgradient projectors associatedwith selection s. 48ball(x, ✏) A closed ball centre at x with radius ✏. 16coneC Conical hull of a set C. 18convC The convex hull of a set C. 16domA The domain of A. 20graA The graph of A. 12◆C Indicator function of a set C. 21h·, ·i Inner product. 4N Strictly positive integers, 1,2,3,· · · . 7Rn n-dimentional Euclidean space. 7R+ Nonnegative real numbers. 19R++ Strictly positive real numbers. 17, 23R Real numbers. 4B(H1,H2) The set of linear bounded mappings from H1to H2. 23H A real Hilbert space. 5xGlossary of Notationrf(x) The Gaˆteaux gradient of f at x. 23, 26@f Subdi↵erential of f . 25Proxf The proximal mapping of a function f . 23recC Recession cone of C. 64sgn(x) signum function. 30f Moreau envelope of index  of a function f .22dC Distance function to a set C. 21f⇤g Infimal convolution of the functions f and g.21f+ Positive part of a function f . 46xn * x Weak convergence. 6xn ! x Strong convergence. 7|| · ||⇤ The Fenchel conjugate of norm. 51||x|| The norm of x. 5xiAcknowledgementsFirst, I would like to express my heartfelt thanks to my supervisors Dr.Heinz Bauschke and Dr. Shawn Wang, for their careful guidance throughoutmy thesis. Their enthusiasm and passion have deeply a↵ected me in Mathe-matics. Without their help, I could not have completed my thesis. It is mygreat pleasure to work with them.Also, thanks to my committee member Dr. Yves Lucet for providingmany valuable suggestions. I also thank the other members of the exami-nation committee, Dr. Mohamed Tawhid and Dr. Richard Klukas, for theircomments, and Dr. Susan Murch for chairing the defense.Finally, I want to express my gratitude to my colleagues Chayne Planiden,Walaa Moursi, Yipin Guo, Sarah Mo↵at and Dongying Wang who supportedme during my study. They helped me a lot.xiiDedicationFor William Lee.xiiiChapter 1IntroductionThe study of subgradient projectors and their properties is motivated bya large number of problems, for example, convex feasibility problems. Theconvex feasibility problem is to find a point x in the intersection of finitelymany convex subsets Ci of a Hilbert space H. A lot of algorithms have beencreated to solve this problem by making use of subgradient projectors. Themost famous instance of such algorithms is Polyak’s subgradient projectionalgorithm (see Theorem 3.1). Therefore, subgradient projectors are benefi-cial to the study of the convex feasibility problem and we are encouraged toobtain a more complete understanding of the subgradient projector.On the other hand, we can formulate the convex feasibility problem as acommon fixed point problem, where each convex subset Ci = FixUi for someoperator Ui. If we are dealing with only one operator U , then our resultsgeneralize works by Polyak [50] and by Crombez [31].The majority of the theoretical background can be found in Bauschkeand Combettes’ Convex Analysis and Monotone Operator Theory in HilbertSpaces [7]; Cegielski’s Iterative Methods for Fixed Point Problems in HilbertSpaces [20]; Mordukhovich’s Variational Analysis and Generalized Di↵eren-tiation I [40]; and Rockafeller and Wets’ Variational Analysis [53].The rest of this thesis is organized as follows.Chapter 2 gives basic background information on operators, convex anal-ysis, and some related known results. Chapters 3-6 contain new results.My contribution begins in Chapter 3, with a new method for finding afixed point of a cutter. Related properties are also identified. The material inthis chapter is based on [13], which appeared in the Journal of OptimizationTheory and Applications.Chapter 4 provides fundamental properties of the subgradient projector,such as continuity, nonexpansiveness and monotonicity. The Yamagishi-Yamada operator is also discussed. Results in this chapter can be found in[12], which appeared in the SIAM Journal on Optimization.Chapter 5 is based on the manuscript [10]. In the previous chapter, weconsider the subgradient projectors associated with convex functions. Com-plementarily for nonconvex functions, the associated subgradient projectors1Chapter 1. Introductionalso exist. In this part, we provide the basic theory of subgradient projectorsfor possibly nonconvex functions.Chapter 6 mainly focuses on charaterizations of linear subgradient pro-jectors. We explicitly o↵er the closed expression of the subgradient pro-jector, which itself is linear and symmetric. This chapter is based on themanuscript [11].Finally, the key results of my thesis and directions for future researchare presented in Chapter 7.2Chapter 2Auxiliary Results2.1 OverviewThroughout this thesis, we work with Hilbert spaces and Euclideanspaces. This chapter contains some basic mathematical definitions and alsoprovides various examples and results, mostly from Convex Analysis andMonotone Operator Theory [7].2.2 Vector SpacesDefinition 2.1. A real vector space (also called linear space) is a set V ofvectors together with two operations: vector addition + : V ⇥ V ! V andscalar multiplication · : R⇥ V ! V with the following properties:(1) vector addition satisfies:(i) (commutative law) 8u, v 2 V , then u+ v = v + u;(ii) (associative law) 8u, v, w 2 V , then (u+ v) + w = u+ (v + w);(iii) (additive identity) there exists a vector 0 2 V such that 8u 2 V ,0 + u = u+ 0 = u;(iv) (additive inverse) for 8u 2 V , there is a vector u 2 V such thatu+ (u) = 0;(2) scalar multiplication satisfies:(i) (distributive law for vectors) 8c 2 R, 8u, v 2 V , then c · (u + v) =c · u+ c · v;(ii) (distributive law for real numbers) 8a, b 2 R, 8u 2 V , then (a+b) ·u =a · u+ b · u;(iii) (associative law for real numbers) 8a, b 2 R, 8u 2 V , then a · (b · u) =(a · b) · u;32.2. Vector Spaces(iv) (unitary law) there is a vector 1 2 V such that 8u 2 V , 1 · u = u.Definition 2.2. [54, Definition 13.1] Let S be a set and suppose d : S⇥S ![0,1) is a function on S satisfying(i) d(x, x) = 0 for all x 2 S and d(x, y) > 0 for distinct x, y 2 S;(ii) d(x, y) = d(y, x) for all x, y 2 S;(iii) d(x, z)  d(x, y) + d(y, z) for all x, y, z 2 S.Such a function d is called a metric on S. A metric space, denoted as (S, d),is a set S together with a metric d on it.Definition 2.3. [36] An inner product on a real vector space V is a functionon V ⇥ V ! R, (x, y)! hx, yi with the following properties:(i) (8x, y, z 2 V ) hx+ y, zi = hx, zi+ hy, zi;(ii) (8a 2 R, 8x, z 2 V ) hax, zi = ahx, zi;(iii) (8x, y 2 V ) hx, yi = hy, xi, 8x, y 2 V ;(iv) (8x 2 V ) hx, xi  0, and hx, xi = 0, x = 0.An inner product space, denoted as (V, h·, ·i), is a vector space with an innerproduct defined on it.Definition 2.4. [36, Definition 2.2-1] A norm on a real vector space V isa real-valued function on V whose value at x 2 V is denoted by kxk andwhich has the following properties(i) (8x 2 V ) kxk  0. kxk = 0 if and only if x = 0;(ii) (8a 2 R, 8x 2 V ) kaxk = |a|kxk;(iii) (8x, y 2 V ) kx+ yk  kxk+ kyk.The normed space, denoted as (V, k·k), is a vector space with a norm definedon it.An inner product on V defines a norm on V which is given by(8x 2 V ) kxk =phx, xi (2.1)A norm on V defines a metric d on V which is given by(8x, y 2 V ) d(x, y) = kx yk. (2.2)Remark 2.5. [36] A norm on an inner product space given by (2.1) satisfiesthe important parallelogram equality1. If a norm does not satisfy the paral-1Parallelogram equality: kx+ yk2 + kx yk2 = 2(kxk2 + kyk2)42.2. Vector Spaceslelogram equality, then it is not derived from an inner product by the use of(2.1).Definition 2.6. [7] A sequence (xn)1n=1 in an inner product space V is aCauchy sequence if 8✏ > 0, there exists N 2 N, when n,m > N , we haved(xm, xn) < ✏. (2.3)An inner product space V is complete if every Cauchy sequence in Vconverges to a point in V .Definition 2.7. [32, Definition 7.7.2] A real Hilbert space, H, is a completeinner product space.Example 2.8. [36] The space of square summable sequences,`2 = {(xn)1n=1 :1Xn=1|xn|2 <1}, (2.4)with inner product hx, yi =P1n=1 xnyn is a Hilbert space.In Hilbert space H, the following is an important inequality we will usefrequently.Fact 2.9. (Cauchy-Schwarz inequality) Let x, y 2 H. Then|hx, yi|  kxkkyk. (2.5)In a real Hilbert space H, a ball with centre x 2 H and radius ✏ > 0 isgiven byball(x, ✏) = {y 2 H : ky  xk  ✏}. (2.6)Definition 2.10. [36] We say a sequence (xn)1n=1 inH is bounded if (xn)1n=1 ⇢ball(x0, r) for some x0 2 H and r > 0.Definition 2.11. [36, Definition 1.4-1] A sequence (xn)1n=1 in a real Hilbertspace H is said to be convergent if there is an x 2 H, for 8✏ > 0, there isN 2 N, when n > N , thenkxn  xk < ✏. (2.7)x is called the limit of (xn)1n=1 and we write xn ! x.52.2. Vector SpacesDefinition 2.12. (Weak convergence)[20] Let x 2 H and (xn)1n=1 be asequence in H. We say that (xn)1n=1 converges weakly to x, denoted asxn * x, if(8z 2 H) hxn, zi ! hx, zi. (2.8)The following result is well known (see e.g. [36, Theorem 4.8-4(a)]); wegive its proof for completeness.Fact 2.13. Every convergent sequence in H is weakly convergent.Proof. Suppose (xn)1n=1 is a convergent sequence in H and x is its limit. Forevery z 2 H, by the Cauchy Schwarz inequality, we have0  |hxn, zi  hx, zi| = |hxn  x, zi|  kxn  xk|kzk ! 0. (2.9)So (xn)1n=1 is weakly convergent to x.The converse of Fact 2.13 is not true.Example 2.14. Consider (en)1n=1: e1 = (1, 0, 0, 0, · · · ), e2 = (0, 1, 0, 0, · · · ),e3 = (0, 0, 1, 0, · · · ), · · · , the sequence of standard unit vectors in `2. 8x 2 `2,by Bessel inequality2, we have1Xn=1|hen, xi|2  kxk2. (2.10)Thus the seriesP1n=1 |hen, xi|2 is convergent. Then its corresponding se-quence (|hen, xi|2)1n=1 converges to 0, which implieshen, xi ! h0, xi = 0 (8x 2 `2). (2.11)Therefore, (en)1n=1 converges weakly to 0. Since ken  0k = kenk = 1, wededuce that en does not converge to 0.The following result is well known (see e.g., [36, Theorem 4.8-4(c)]); wegive its proof for completeness.Fact 2.15. A weakly convergent sequence of a finite dimensional space isconvergent.2Bessel inequality: Let (en)n2N be the standard orthonormal basis of H, (8x 2 H), wehavePhx, eni2  kxk2.62.2. Vector SpacesProof. Let (xn)1n=1 2 Rm, where xn = (xn(1), xn(2), · · · , xn(m)), convergesweakly to x 2 Rm, by definition of weak convergence, choosing z = ei,i 2 {1, 2, · · · ,m}, a standard unit vector in Rm, then we havehxn, eii ! hx, eii, (2.12)which is equivalent to xn(i)! x(i). Meanwhile,xn(i)! x(i)) |xn(i) x(i)|! 0, (2.13)) |xn(i) x(i)|2 ! 0, (2.14))mXi=1|xn(i) x(i)|2 ! 0, (2.15)) kxn  xk2 ! 0, (2.16)) kxn  xk ! 0, (2.17)) xn ! x. (2.18)By the Uniform Boundedness Principle, we will see that a weakly con-vergent sequence in a real Hilbert space is bounded. See Fact 2.24 below.The following powerful notion of the Feje´r monotone sequence is centralin the study of various iterative methods.Definition 2.16. A sequence (xn)1n=1 in H is called Feje´r monotone withrespect to a nonempty subset S of H if and only if for 8s 2 S, 8n 2 N,kxn+1  sk  kxn  sk. (2.19)Example 2.17. [7, Example 5.2] Let’s assume that (xn)1n=1 in R is boundedand monotonically increasing, i.e.,(8n 2 N) xn  xn+1  · · ·   (2.20)for some constant  > 0. Then sequence (xn)1n=1 is Feje´r monotone withrespect to C ✓ [,+1).Clearly, every Feje´r monotone sequence is bounded. Actually, if (xn)1n=1is a Feje´r monotone sequence with respect to a nonempty set C, then 8c 2 C,kxn+1  ck  kxn  ck  · · ·  kx1  ck. (2.21)That is to say, (xn)1n=1 lies in ball(c, kx1  ck).72.3. Operators2.3 Operators2.3.1 Linear Bounded OperatorsDefinition 2.18. [36, Definition 2.6-1] A linear operator T is an operatorsuch that(i) the domain domT of T is a real vector space and the range ranT ofT lies in a real vector space;(ii) for all x, y 2 domT and a, b 2 R,T (ax+ by) = aTx+ bTy. (2.22)Definition 2.19. [36, Definition 2.7-1] LetH1 andH2 be real Hilbert spacesand T : domT ! H2 a linear operator, where domT ⇢ H1. The operator Tis said to be bounded if there is a real number c such that for all x 2 domT ,kTxk  ckxk. (2.23)We denote B(H1,H2) the set of all linear bounded operators from H1 to H2.Definition 2.20. [36, page 92] The norm of operator T is defined as:kTk = supkxk1kTxk. (2.24)Definition 2.21. [7] Let T 2 B(H1,H2), the adjoint of T is the uniqueoperator T ⇤ 2 B(H1,H2) that satisfies(8x 2 H1)(8y 2 H2) hTx, yi = hx, T ⇤yi. (2.25)Definition 2.22. [20, Definition 1.1.19] We say that a bounded linear op-erator A : H! H is self-adjoint if A⇤ = A.Theorem 2.23. (Uniform Boundedness Principle) [36, Theorem 4.7-3]Let (Tn) be a sequence of bounded linear operators Tn : X ! Y from aBanach space X into a normed space Y such that (kTnxk) is bounded forevery x 2 X, say kTnxk  cx, where cx is a real number depends on x. Thenthe sequence of the operator norms kTnk is bounded, that is 9c > 0 such thatkTnk  c.Now let us prove that a weakly convergent sequence in a real Hilbertspace H is bounded.82.3. OperatorsFact 2.24. Every weakly convergent sequence in a real Hilbert space H isbounded.Proof. Let xn * x. Then for every z 2 H, we have hxn, zi ! hx, zi. Sowe have |hxn, zi| ! |hx, zi|. This implies (|hxn, zi|)n2N is bounded. DefineTn : H ! R : y 7! hxn, yi. Then (Tn)1n=1 is pointwise bounded. Bythe Uniform Boundness Principle, we deduce that (kTnk)1n=1 is bounded.Finally, we show that kxnk = kTnk. Indeed,kTnk = supkzk1|hxn, zi|  supkzk1kxnkkzk = kxnk. (2.26)If xn = 0, then kxnk = 0 and so kTnk = 0. If xn 6= 0, we havekTnk hxn, xnkxnki = kxnk2kxnk = kxnk. (2.27)Altogether, we have kTnk = kxnk.2.3.2 CuttersDefinition 2.25. [20, Definition 2.1.30] We call T : H! H a cutter if andonly if FixT 6= ? and for 8x 2 H, 8y 2 FixT ,hy  Tx, x Txi  0. (2.28)Example 2.26. (Subgradient projector) Let f : H ! R be convex andcontinuous such that {x 2 H : f(x)  0} 6= ?, and let s : H ! H bea selection of @f (for definition of subgradient, see Definition 2.66), i.e.,(8x 2 H) s(x) 2 @f(x). Then the associated subgradient projector, definedby(8x 2 H) Gfx =8<:xf(x)ks(x)k2 s(x), if f(x) > 0;x, otherwise,(2.29)is a cutter. For details, see Fact 4.1 (iv).The following fact says that a cutter is strongly Feje´r monotone3 withrespect to the set of its fixed points, .3(see [20, page 108]) We say an operator T : H ! H is strongly Feje´r monotone withrespect to the set of its fixed points if there exists a constant ↵ > 0 such thatkTx zk2  kx zk2  ↵kTx xk2,for all z 2 FixT .92.3. OperatorsFact 2.27. Suppose that T : H! H has a fixed point.(i) A mapping T : H! H is a cutter if and only if(8x 2 H)(y 2 FixT ) kx Txk2  kx yk2  kTx yk2.(ii) Let T : H! H be a cutter. Then T is always continuous on FixT .(iii) Let T : H! H be a cutter. Then FixT is closed and convex.Proof. (i) Let x 2 H, 8y 2 FixT .kx yk2 = k(x Tx) + (Tx y)k2= kx Txk2 + ky  Txk2 + 2hx Tx, Tx yi.Thus we can conclude thatkx Txk2 + ky Txk2  kx yk2 () hy Tx, x Txi  0. (2.30)(ii) Suppose that T : H! H is a cutter. Let y 2 FixT . From (i) we havekTx Tyk2 = kTx yk2  kx yk2 kx Txk2  kx yk2. (2.31)Now 8" > 0, let  = ". ThenkTx Tyk  ". (2.32)when kx yk  . Therefore T is continuous on FixT .(iii) Suppose that T : H! H is a cutter. By [20, Lemma 2.1.36], we haveFixT =\x2H{y 2 H : hy  Tx, x Txi  0}. (2.33)Denote the right hand side of (2.33) asS :=\x2H{y 2 H : hx Tx, y  Txi  0}. (2.34)First we show S ✓ FixT . Indeed, let y 2 S, i.e., hx Tx, y  Txi  0for all x 2 H. If we take x = y, we get ky  Tyk2  0 and hence,y = Ty, i.e., y 2 FixT . Conversely, let y 2 FixT . Since T is a cutter,we have hxTx, yTxi  0. Then y 2 S. Hence we have FixT = S.Finally, we observe that S is the intersection of a family of sets thatare closed and convex, so is FixT .102.3. Operators2.3.3 Single-valued and Set-valued Monotone OperatorsDefinition 2.28. [20, Definition 1.1.28] We say that an operator T : H! His monotone if(8x, y 2 H) hTx Ty, x yi  0. (2.35)The above definition is just for a single-valued operator. We now extendthis to multivalued or set-valued operators. Let A : H ◆ H i.e., (8x 2 H),Ax ✓ H be a set-valued operator. The graph of A is defined bygraA = {(x, x⇤) 2 H⇥H : x⇤ 2 Ax}. (2.36)Definition 2.29. [7] A : H◆ H is monotone if 8(x, x⇤) (y, y⇤) 2 graA,hx y, x⇤  y⇤i  0. (2.37)Let’s introduce the definition of maximally monotone operator.Definition 2.30. [7] Let A : H ◆ H be monotone. Then A is maximallymonotone if B : H◆ H is monotone and graA ✓ graB ) A = B.2.3.4 Nonexpansive MappingsDefinition 2.31. Let T : H! H. Then T is(i) firmly nonexpansive if 8x, y 2 HkTx Tyk2 + k(IdT )x (IdT )yk2  kx yk2. (2.38)(ii) nonexpansive if 8x, y 2 H,kTx Tyk  kx yk. (2.39)(iii) quasi-nonexpansive if 8x 2 H, 8y 2 FixT ,kTx Tyk  kx yk. (2.40)We record some properties of nonexpansive maps that are most useful forus. It is clear that firm nonexpansiveness implies nonexpansiveness, whichitself implies quasinonexpansiveness. However, the converse is not true. Seefollowing results and counterexamples.Theorem 2.32. [20, Theorem 2.2.4] A firmly nonexpansive operator T :H! H is monotone and nonexpansive.112.3. OperatorsProof. Let T be firmly nonexpansive. ThenkTx Tyk2 + k(IdT )x (IdT )yk2  kx yk2. (2.41)k(IdT )x (IdT )yk2 = k(x y) (Tx Ty)k2 (2.42)= kx yk2 + kTx Tyk2  2hx y, Tx Tyi.(2.43)Then we haveT is firmly nonexpansive, kTx Tyk2  hx y, Tx Tyi (2.44)By the Cauchy-Schwarz inequality, we havekTx Tyk · kx yk  hTx Ty, x yi  kTx Tyk2  0, (2.45)for all x, y 2 H, which yields the monotonicity and the nonexpansivity ofT .Actually, firmly nonexpansiveness is equivalent to monotonicity and non-expansiveness on the real line. Indeed, necessity is obvious from the abovetheorem. For suciency, let T : R ! R be a monotone and nonexpansiveoperator. Then we have(8x, y 2 R) (x y)(Tx Ty)  0 and |Tx Ty|  |x y|. (2.46)Then|Tx Ty|2 = |Tx Ty||Tx Ty| (2.47) |x y||Tx Ty| (2.48)= |(x y)(Tx Ty)| (2.49)= (x y)(Tx Ty). (2.50)Thus, a monotone nonexpansive operator on the real line is firmly nonex-pansive. However, if not on the real line, nonexpansivenss and monotonicitywill not imply firmly nonexpansiveness. See Example 2.35.Lemma 2.33. [20, Lemma 2.1.20] A nonexpansive operator U : H ! Hwith a fixed point is quasi-nonexpansive.The converse of Lemma 2.33 is not true.122.3. OperatorsExample 2.34. [20] Let T : R! R,Tx =(x2, if |x|  34 ,|x| 316 , if |x| > 34 .(2.51)Then T is a continuous quasi-nonexpansive operator, but not nonexpansive.Proof. (i) T is continuous at 34 as limx!( 34 )+ Tx = limx!( 34 )+ x316 =916 ,limx!( 34 ) Tx = limx!( 34 ) x2 = 916 and T (34) =916 . Similarly, we haveT is continuous at 34 . Then T is continuous.(ii) T is quasi-nonexpansive. It is easy to calculate that FixT = {0}.If |x|  34 < 1, then|0 Tx| = |Tx| = |x2|  |x| = |0 x|. (2.52)If |x| > 34 > 316 , then|0 Tx| =|x| 316 = |x| 316 < |x| = |0 x|. (2.53)So T is quasi-nonexpansive.(iii) T is not nonexpansive when x = 12 , y = 1. Indeed,|Tx Ty| =14  (1 316) = 916 >12  1 = |x y|. (2.54)Therefore T is a continuous quasi-nonexpansive operator but not non-expansive.The converse of Theorem 2.32 does not hold.Example 2.35. [20] Let T : R2 ! R2,Tx = (x1 cos ✓  x2 sin ✓, x1 sin ✓ + x2 cos ✓) (2.55)is nonexpansive and monotone for ✓ 2 (0, ⇡2 ], but T is not firmly nonexpan-sive.Proof. Let x = (x1, x2), y = (y1, y2) 2 R2, then132.3. Operators(i) T is monotone. SinceTx Ty (2.56)= ((x1  y1) cos ✓  (x2  y2) sin ✓, (x1  y1) sin ✓ + (x2  y2) cos ✓) .(2.57)Therefore we havehx y, Tx Tyi = (x1  y1)2 cos ✓  (x1  y1)(x2  y2) sin ✓ (2.58)+ (x1  y1)(x2  y2) sin ✓ + (x2  y2)2 cos ✓ (2.59)=(x1  y1)2 + (x2  y2)2cos ✓. (2.60)Since ✓ 2 (0, ⇡2 ], then cos ✓ > 0, so hx y, Tx Tyi  0.(ii) T is nonexpansive. Indeed,kTx Tyk2 (2.61)= ((x1  y1) cos ✓  (x2  y2) sin ✓)2 (2.62)+ ((x1  y1) sin ✓ + (x2  y2) cos ✓)2 (2.63)=(x1  y1)2(cos ✓)2 + (x2  y2)2(sin ✓)2 (2.64)2(x1  y1)(x2  y2) sin ✓ cos ✓ (2.65)+(x1  y1)2(sin ✓)2 + (x2  y2)2(cos ✓)2 (2.66)+2(x1  y1)(x2  y2) sin ✓ cos ✓ (2.67)=(x1  y1)2 + (x2  y2)2 (2.68)=kx yk2. (2.69)(iii) T is not firmly nonexpansive.From (i), we havehx y, Tx Tyi = (x1  y1)2 + (x2  y2)2 cos ✓ = cos ✓kx yk2.(2.70)From (ii), we have kTx Tyk2 = kx yk2.Then kTx Tyk2 > hx y, Tx Tyi for 8✓ 2 (0, ⇡2 ] and distinct x, y.142.4. Convex Analysis2.4 Convex Analysis2.4.1 Convex SetsDefinition 2.36. A subset C ofH is convex if for all x, y 2 C and  2 (0, 1),x+ (1 )y 2 C. (2.71)(a) convex set, (b) nonconvex set.Figure 2.1: Examples of convex and nonconvex setsFor example, a ball is convex. See Fig. 2.1 (a)Definition 2.37. [7, Definition 3.3] Let C ⇢ H. The convex hull of C is theintersection of all the convex subsets of H containing C, i.e., the smallestconvex subset of H containing C. It is denoted by convC.Proposition 2.38. [7, Proposition 3.4] Let C ⇢ H. ThenconvC =(Xi2I↵ixi : I finite, (xi)i2I ⇢ C, (↵i)i2I ⇢ ]0, 1] ,Xi2I↵i = 1).(2.72)A neighborhood of x 2 H, is an open ball of radius ✏, which is defined asN✏(x) = {y 2 H : ky  xk < ✏}. (2.73)The interior of C, is the set of all interior points of C, denoted as intC,intC = {x 2 H : 9✏ > 0, N✏(x) ✓ C}, (2.74)and is also the largest open set contained in C.152.4. Convex AnalysisDefinition 2.39. [20] Let C ✓ H be a nonempty subset and x 2 H. Ifthere exists a point y 2 C such thatky  xk  kz  xk (2.75)for any z 2 C, then y is called a metric projection of x onto C and is denotedby PCx (see Fig. 2.2).Figure 2.2: Metric projection.Fact 2.40. (Projection characterization) [7, Theorem 3.14] Let C be anonempty closed convex subset of H. Then for every x and p in H,p = PCx if and only if (8z 2 C) hz  p, x pi  0 and p 2 C.Definition 2.41. [7] Let C ✓ H. We say C is a cone ifC = R++C. (2.76)R++ := {⇠ 2 R : ⇠ > 0}.Definition 2.42. [7, Definition 6.1] Let C be a subset of H. The conicalhull of C is the intersection of all the cones in H containing C, denoted byconeC.162.4. Convex AnalysisActually, from [7, Proposition 6.2], we have coneC = R++C.Definition 2.43. [7, Definition 6.21] Let K be a subset of H. The polarcone of K isK = {u 2 H : suphK,ui  0}, (2.77)and the dual cone of K is K = K . If K is a nonempty convex cone,then K is self-dual if K = K. If K is a nonempty closed convex cone inH, K is acute if K ⇢ K, and K is obtuse if K ⇢ K.Figure 2.3: Illustration of polar cone.Example 2.44. Let H = R and K = R+,K = {u 2 H : suphK,ui  0} (2.78)= {u 2 R : y · u  0, 8y 2 R+} (2.79)= R (2.80)= K. (2.81)By definitions above, we haveK = K = K. (2.82)Since K = R+ is a nonempty closed convex cone, then K = R+ is a acutecone and an obtuse cone as well.172.4. Convex AnalysisDefinition 2.45. [7] The orthogonal complement of a subset C of H isdenoted by C?, i.e.,C? = {u 2 H : hx, ui = 0, 8x 2 H}. (2.83)Proposition 2.46. [7, Propostion 6.22] Let C be a linear subspace of H.Then C = C?.Definition 2.47. [7] Let D be a nonempty convex subset of X. The reces-sion cone of D isrecD = {x 2 X : x+D ⇢ D}. (2.84)Proposition 2.48. [7, Corollary 6.49] Let C be a nonempty closed convexcone in H. Then recC = C.Proposition 2.49. Let T : H! H be a cutter. Thenran(IdT ) ✓ (rec(FixT )) . (2.85)Consequently, when FixT is a linear subspace,ran(IdT ) ✓ (FixT )?. (2.86)In other words,ran(IdT ) ✓ (ker(IdT ))?. (2.87)Proof. Let x Tx 2 ran(IdT ) and v 2 rec(FixT ). Then for every k > 0,u 2 FixT , u+ kv 2 FixT . We havehx Tx, u+ kv  Txi  0 ) hx Tx, u/k + v  Tx/ki  0. (2.88)When k !1 this giveshx Tx, vi  0. (2.89)Since v 2 rec(FixT ) was arbitrary, x Tx 2 (rec(FixT )) .When FixT is a linear subspace, FixT = rec(FixT ) and (rec(FixT )) =(rec(FixT ))?.182.4. Convex Analysis2.4.2 Convex FunctionsLet f : H ! [1,+1]. The function is proper if 1 /2 f(H) anddom f 6= ?.Definition 2.50. A function f : H! ]1,+1] is said to be convex if itsdomain, dom f = {x 2 H : f(x) < +1} is a convex set and 8x, y 2 H, 2(0, 1),f(x+ (1 )y)  f(x) + (1 )f(y). (2.90)Definition 2.51. Let f : H ! ]1,+1] be a proper function. Then f isstrictly convex if (8x 2 dom f)(8y 2 dom f)(8 2 ]0, 1[),x 6= y ) f(x+ (1 )y) < f(x) + (1 )f(y). (2.91)Definition 2.52. A function f is lower semi-continuous if for every se-quence (xn)1n=1 in H,xn ! x) f(x)  lim infn!1 f(xn). (2.92)In [54], the limit inferior of sequence (sn) in R, is defined bylim inf sn = limN!1inf{sn : n > N}. (2.93)The limit superior is defined bylim sup sn = limN!1sup{sn : n > N}. (2.94)Example 2.53. The ceiling function f(x) = dxe = min{n 2 Z : n  x}(see Fig. 2.4) is lower semicontinuous.Figure 2.4: The graph of the ceiling function, an example of a lower semi-continuous function.192.4. Convex AnalysisIn the following, we define the infimal convolution operator. See [7] formore details.Definition 2.54. Let f and g be functions from H to ]1,+1]. Theinfimal convolution of f and g isf⇤g : H! ]1,+1] : x 7! infy2H(f(y) + g(x y)). (2.95)This operator gives us an alternative way to express certain functions,for example the distance function.The indicator function of a set C ⇢ H, is the function defined by◆C : H! [1,+1] : x 7!(0, if x 2 C;+1, otherwise. (2.96)Fact 2.55. The distance from x 2 H to a set C ✓ H is defined bydC(x) = infy2Ckx yk. (2.97)Then we have dC(x) = (◆C⇤k · k)(x).Proof. For x 2 H, we have(◆C⇤k · k)(x) = infy2H(◆C(y) + kx yk) (2.98)= infy2C(◆C(y) + kx yk) (◆C(y) = +1 for y /2 C) (2.99)= infy2Ckx yk (◆C(y) = 0 for y 2 C) (2.100)= dC(x). (2.101)Now let’s concentrate on a special case of infimal convolution, the Moreauenvelope, as it is from there that the proximal mapping is defined.Definition 2.56. [7, Definition 12.20] Let f : H ! ]1,+1] and let 2 R++. The Moreau envelope of f of parameter  isf = f⇤( 12k · k2). (2.102)202.4. Convex AnalysisThe following is a simple example using the distance and the indicatorfunctions.Example 2.57. [7, Example 12.21] Let C ⇢ H and let  2 R++. Then◆C =12d2C .Proof. For x 2 H, we have◆C(x) = infy2H✓◆C(y) +12kx yk2◆(2.103)= infy2C✓◆C(y) +12kx yk2◆(◆C(y) = +1 for y /2 C) (2.104)= infy2C12kx yk2 (◆C(y) = 0 for y 2 C) (2.105)=12d2C(x). (2.106)Actually, the infimal convolution of a convex function with a power ofthe norm has some particular porperties.Remark 2.58. [7, Proposition 12.15] Let f be a proper lower semicontinuousconvex function on H, let  2 R++ and let p 2 ]1,+1[. Then the infimalconvolutionf⇤ 1pk · kp : H! ]1,+1] : x 7! infy2Hf(y) +1pkx ykp (2.107)is convex, real-valued, continuous and exact4. Moreover, for every x 2 H,the infimum in (2.107) is uniquely attained. Indeed, since 1pk · kp is striclyconvex and supercoercive5, then by [7, Corollary 11.15 (i)], we have shownthat the infimum is uniquely attained.In the case p = 2, Remark 2.58 motivated the following definition.Definition 2.59. [7, Definition 12.23] Let f be a proper lower semicontin-uous convex function on H and let x 2 H. Then Proxf x is the unique point4The infimal convolution is exact at a point x 2 H if (f⇤g)(x) = miny2H f(y)+g(xy),i.e. there exists a point y 2 H such that (f⇤g)(x) = f(y) + g(x y).5Let f : H! [1,+1]. Then f is supercoercive if limkxk!+1 f(x)kxk = +1.212.4. Convex Analysisin H that satisfies1f(x) = miny2H✓f(y) +12kx yk2◆= f(Proxf x)+12kxProxf xk2. (2.108)The operator Proxf : H! H is the proximal mapping of f .Definition 2.60. [7] (Gaˆteaux di↵erentiability) Let f : C ! R, with C ✓H, and let x 2 C be such that (8y 2 H)(9↵ 2 R++) [x, x + ↵y] ✓ C.We say that f is Gaˆteaux di↵erentiable at x if there exists an operatorDf(x) 2 B(H,H), called the Gaˆteaux derivative of f at x, such that(8y 2 H) lim↵#0f(x+ ↵y) f(x)↵= Df(x)y. (2.109)The Gaˆteaux derivative Df(x) in Definition 2.60 is unique whenever itexists. Moreover, since Df(x) is linear, for every y 2 H, we have Df(x)y =Df(x)(y), and we can therefore replace (2.109) by(8y 2 H) lim0 6=↵!0f(x+ ↵y) f(x)↵= Df(x)y. (2.110)Remark 2.61. Let C be a subset of H, let f : C ! R, and suppose that fis Gaˆteaux di↵erentiable at x 2 C. Then by Riesz-Fre´chet representation6,there exists a unique vector rf(x) 2 H such that(8y 2 H) Df(x)y = hy,rf(x)i. (2.111)We call rf(x) the Gaˆteaux gradient of f at x.Definition 2.62. [7] (Fre´chet di↵erentiability) Let x 2 H and let f : U !]1,+1] where U is a neighborhood of x. Then f is Fre´chet di↵eren-tiable at x if there exists an operator Df(x) 2 B(H,H), called the Fre´chetderivative of f at x such thatlim0 6=kyk!0kf(x+ y) f(x)Df(x)ykkyk = 0. (2.112)The Fre´chet gradient of a function f : U ! R at x 2 U is defined as inRemark 2.61.6[7, Fact 2.17] Riesz-Fre´chet representation: Let g 2 B(H,R). Then there exists aunique vector u 2 H such that (8x 2 H) g(x) = hx, ui. Moreover, kgk = kuk.222.4. Convex AnalysisFact 2.63. [20, Theorem 1.1.13, Theorem 1.1.14](i) If f is Fre´chet di↵erentiable at x 2 H, then it is Gaˆteaux di↵erentiableat x.(ii) If f is Gaˆteaux di↵erentiable in a neighborhood of x and its Gaˆteauxderivative is continuous (strong-to-strong) at x, then f is Fre´chet dif-ferentiable at x.Remark 2.64. [16] For locally Lipschitz functions on Rn, Gaˆteaux and Fre´chetdi↵erentiability coincide.Example 2.65. [57, Example 1.1.2] Consider the functionf : R2 ! R : (x1, x2) 7!8<:x1x32x21 + x42, if (x1, x2) 6= 0;0, otherwise.(2.113)Then f is continuous and Gaˆteaux di↵erentiable at (0, 0), but not Fre´chetdi↵erentiable at (0, 0).Proof. Let x = (x1, x2). Observe that f(x) = 0 if x1x2 = 0. Now assumethat x1 6= 0 and that x2 6= 0. Since0  (|x1| x22)2 , 2|x1|x22  x21 + x42, (2.114)we estimate|f(x)| = |x1||x2|3x21 + x42 |x1||x2|32|x1|x22=|x2|2! 0 as x! 0. (2.115)Thus, f is continuous at (0, 0).Let t 2 R++. If x1 = 0, then f(tx)t = 0; otherwisef(tx)t=f(tx1, tx2)t=t4x1x32t2x21 + t4x42=t2x1x32x21 + t2x42! 0 as t # 0. (2.116)Thus f is Gaˆteaxu di↵erentiable at (0, 0), with rf(0, 0) = (0, 0).It remains to show that f is not Fre´chet di↵erentiable at (0, 0). Sincef(0, 0) = 0 and rf(0, 0) = 0, the quotient of interest is simply f(y)kyk , wherey = (y1, y2) 6= (0, 0). We will show that this quotient does not tend to 0 asy ! 0. Observe that ✓f(y)kyk◆2=y21y62(y21 + y22)(y22 + y42)2, (2.117)232.4. Convex Analysiswhich when y1 = y22 leads toy21y62(y21 + y22)(y22 + y42)2=y120y22(1 + y22)4y82=14(1 + y22)! 14as y2 ! 0. (2.118)Hence lim0 6=kyk!0f(y)kyk  12 and therefore f is not Fre´chet di↵erentiable at(0, 0).Definition 2.66. [7, Definition 16.1] Let f : H ! ]1,+1] be a properconvex function. The subdi↵erential of f is the set-valued operator@f : H! 2H : x 7! {u 2 H : hu, y  xi+ f(x)  f(y), 8y 2 H}. (2.119)The elements of @f(x) are the subgradients of f at x.Fact 2.67. [20, Theorem 1.1.57] A continuous convex function f : H ! Ris Gaˆteaux di↵erentiable at x 2 H if and only if its subdi↵erential @f(x)consists of one point. In this case, @f(x) = {Df(x)} = {rf(x)}.Example 2.68. (1) The absolute value f(x) = |x| is subdi↵erentiable atevery x 2 R. Precisely,(8x 2 R) @f(x) =8><>:1, if x > 0,[1, 1], if x = 0,1. if x < 0.(2.120)(2) For a di↵erentiable function f(x) = x2 on R, then its subdi↵erential isa singleton, which is the gradient of f , that is @f(x) = {rf(x)} = {2x} for8x 2 R.(3) The subdi↵erential of the Euclidean norm k · k on Rn has the form(8x 2 Rn) @(kxk) =8<:⇢xkxk, if x 6= 0,ball(0, 1), if x = 0.(2.121)We have now covered the elementary results needed for the followingchapters of this thesis, including subgradient projectors and cutters. In thenext chapter, we present a new method for finding a fixed point of a cutter.24Chapter 3Finite Convergence Result ofa Projected Cutter Method3.1 OverviewThis chapter is based on the paper [13]. We obtain finite convergenceresults for a general class of algorithms provided a constraint qualificationis satisfied. Our results complement and extend conclusions by Crombez forcutters and by Polyak for subgradient projectors. Comparisons of our resultswith existing algorithms are also included. We begin with the motivationfor this research.Let a finite family of closed convex subsets Ci, i 2 I, of a Hilbert space Hbe given. Denote C = \i2ICi. If C 6= ?, then the convex feasibility problemis to find a point x 2 C. A particular way to solve the convex feasibilityproblem is to express the solution set as the fixed point set of a properalgorithmic operator. The most famous instance is Polyak’s subgradientprojector of a convex function.Theorem 3.1. [50] (Polyak’s subgradient projection algorithm) Let C bea nonempty closed convex subset of H. Let f : H ! R be a continuousconvex function. Assume argmin f = {x 2 H : f(x) = inf f(C)} 6= ?.Without loss of generality, we assume min f(C) = 0. Assume @f is boundedon bounded sets. Let (↵n) ✓ ]0, 2[ such that 9 ✏ > 0, ✏  ↵n  2  ✏. Lets 2 @f be a selection and definexn+1 =8<:PC✓xn  ↵n f(xn)ks(xn)k2 s(xn)◆, if f(xn) > 0,xn, if f(xn)  0.(3.1)Then (xn) converges weakly to a minimizer of f over C.For the proof of this theorem, see [50].Theorem 3.2. [20, Theorem 3.5.1] (Opial, 1967) Let C ✓ H be a nonemptyclosed convex subset of a Hilbert space H and U : C ! C be a nonexpansive253.2. Basic Propertiesand asymptotically regular operator with a fixed point. Then, for any x 2 C,the sequence (Ukx)1k=1 converges weakly to a point z 2 FixU .If the suitable algorithmic operator is nonexpansive and asymptoticallyregular7, then Opial’s result implies that the iterates of the operator convergeweakly to a solution.However, whether this solution can be reached within a finite number ofsteps is not fully answered. Censor et al. [21] studied the modified cyclicsubgradient projection (MCSP) and provided finite convergence conditionswhile Crombez [31] focussed on the iterations of a cutter whose fixed pointsset contains a ball of a known radius. Unfortunately, this radius is notnecessarily known in practice.3.2 Basic PropertiesGiven r  0, we follow Crombez [31] and define the operator Ur : H! Hat x 2 H byUrx :=8<:Tx+rkTx xk(Tx x), if x 6= Tx,x, otherwise.(3.2)Obviously, we can rewrite Urx asUrx = x+r + kTx xkkTx xk (Tx x) (3.3)when x /2 FixT . When T is a subgradient projector, then Ur was alsostudied by Polyak [50]. Note that FixUr = FixT .Lemma 3.3. [13, Lemma 2.1] Let y 2 FixT , let r 2 R++, and suppose thatball(y; r) ✓ FixT and that x 2 Hr FixT . Set⌧x := hx y, x Txkx Txki r + kx Txk. (3.4)Then the following hold:(i) ⌧x  0.7An operator U : H! H is called asymptotically regular if for all x 2 H, then we havelimk!1kUk+1x Ukxk = 0.263.2. Basic Properties(ii) kUrx yk2 = kTx yk2  r2  2r⌧x  kTx yk2  r2.(iii) kUrx yk2 = kx yk2  (r + kx Txk)2  2⌧x(r + kx Txk) kx yk2  (r + kx Txk)2  kx yk2  r2  kx Txk2.Proof. (i): Set z = y + r(x  Tx)/kx  Txk. Then z 2 ball(y; r) ✓ FixT .Since T is a cutter, we obtain0  hz  Tx, x Txi (3.5a)= hy + r(x Tx)/kx Txk  Txx Txi (3.5b)= hy  Tx, x Txi+ rkx Txk (3.5c)= hy  x, x Txi+ kx Txk2 + rkx Txk. (3.5d)Rearranging and dividing by kx Txk yieldshx y, (x Tx)/kx Txki  r + kx Txk (3.6)and hence ⌧x  0.(ii): Using (3.2), we derive the identity fromkUrx yk2 (3.7a)=x+ (kx Txk+ r)/kx Txk(Tx x) y2 (3.7b)=(Tx y) + r(Tx x)/kTx xk2 (3.7c)=kTx yk2 + r2 + 2rh(Tx x) + (x y), (Tx x)/kTx xki (3.7d)=kTx yk2 + r2 + 2rkx Txk  2rhx y, (x Tx)/kx Txki (3.7e)=kTx yk2  r2  2r⌧x. (3.7f)The inequality follows immediately from (i).(iii): Using (ii), we obtain=kUrx yk2 (3.8a)=k(x y) + (Tx x)k2  r2  2r⌧x (3.8b)=kx yk2 + kx Txk2 + 2hx y, Tx xi  r2  2r⌧x (3.8c)=kx yk2  kx Txk2  2(⌧x + r)kx Txk  r2  2r⌧x (3.8d)=kx yk2  (r + kx Txk)2  2⌧x(r + kx Txk). (3.8e)The inequalities now follow from (i).273.2. Basic PropertiesExample 3.4. (Ur need not be a cutter) [13, Example 2.2] Suppose thatH = R and that T is the subgradient projector associated with the functionf : R! R : x 7! x2  1. Then FixT = [1, 1]. Letr 2 R+ := {⇠ 2 R : ⇠  0}. (3.9)Then for 8x 2 Rr FixT , we haveUrx =x2+12x r sgn(x), (3.10)where sgn(·) is the signum function defined assgn(x) :=8><>:1, if x > 0,0, if x = 0,1, if x < 0.(3.11)Choosing y := 1 2 FixT and x := y + " /2 FixT , where " 2 R++ and r = 1,theny  Urx = 1 (1 + "2+12(1 + ") 1) = 2 1 + "2 12(1 + ")> 0, (3.12)when " is sucient small. AndxUrx = 1+"(1 + "2+12(1 + ")1) = 2+" 1 + "2 12(1 + ")> 0. (3.13)Then we have hy  Urx, x  Urxi > 0. By definition of a cutter, we caneasily check that Ur is not a cutter when " is suciently small.The following result concerns a relaxed version of Ur.Corollary 3.5. [13, Corollary 2.1] Let y 2 FixT , let r 2 R++, let ⌘ 2 R+,and suppose that ball(y; r) ✓ FixT and that x 2 Hr FixT . SetUr,⌘x := x+ ⌘r + kx TxkkTx xk (Tx x). (3.14)Then the following hold(i) Ur,⌘x = (1 ⌘)x+ ⌘Urx.(ii) kUr,⌘x yk2 = ⌘kUrx yk2 + (1 ⌘)kx yk2  ⌘(1 ⌘)kx Urxk2.(iii) kUrx xk = r + kx Txk.283.2. Basic Properties(iv) kUrx yk2  kx yk2  (r + kx Txk)2 = kx yk2  kx Urxk2.(v) kUr,⌘x yk2  kx yk2  ⌘(2 ⌘)(r + kx Txk)2= kx yk2  ⌘1(2 ⌘)kx Ur,⌘xk2.Proof. (i):Ur,⌘x = x+ ⌘r + kx TxkkTx xk (Tx x) (3.15)= x+ ⌘(x+r + kx Txkkx Txk (Tx x) x) (3.16)= x+ ⌘(Urx x) (3.17)= (1 ⌘)x+ ⌘Urx. (3.18)(ii): Using (i), we obtain kUr,⌘x yk2 = k(1 ⌘)(x y) + ⌘(Urx y)k2.Now use [7, Corollary 2.14] to obtain the identity.(iii): Since Urx = x+r+kxTxkkxTxk (Tx x), thenUrx x = r + kx Txkkx Txk (Tx x). (3.19)So we havekUrx xk = r + kx Txk. (3.20)(iv): Combine (iii) with Lemma 3.3 (iii).(v): Combine (i)-(iv). Indeed, more precisely,kUr,⌘x yk2 (3.21)=⌘kUrx yk2 + (1 ⌘)kx yk2  ⌘(1 ⌘)kx Urxk2 (3.22)⌘(kx yk2  kx Urxk2) + (1 ⌘)kx yk2  ⌘(1 ⌘)kx Urxk2(3.23)=kx yk2  ⌘(2 ⌘)kx Urxk2 (3.24)=kx yk2  ⌘(2 ⌘)(r + kx Txk)2. (3.25)And by definition of Ur,⌘, we have kur,⌘x xk = ⌘(r+ kx Txk), thereforekUr,⌘x yk2  kx yk2  ⌘(2 ⌘)(r + kx Txk)2 (3.26)= kx yk2  ⌘(2 ⌘)(1⌘kur,⌘x xk)2 (3.27)= kx yk2  ⌘1(2 ⌘)kx Ur,⌘xk2. (3.28)293.2. Basic PropertiesDefinition 3.6. We call Q : H ! H a quasi projector of C if and only ifranQ = FixQ = C and for 8x 2 H, 8c 2 CkQx ck  kx ck. (3.29)Example 3.7. (Projectors are quasi projectors) [13, Example 2.3] PC isa quasi projector of C. Since kPCx  xk = infc2C kx  ck  kx  ck andranPC = FixPC = C. More generally if R : H ! H is quasi nonexpansive,i.e., 8x 2 H, 8y 2 FixR, then kRx  yk  kx  yk and C ✓ FixR, thenPC R is a quasi projector of C. Indeed, for 8c 2 C ✓ FixR, thenkPC Rx ck = kPC Rx PCck  kRx ck  kx ck. (3.30)And also we have ranPC R = FixPC R = C.It can be shown (see [1, Proposition 3.4.4]) that when C is an anesubspace (i.e., if C 6= ?, and 8 2 R, C = C + (1  )C), then the onlyquasi projector of C is the projector. However, we will now see that forcertain cones there are quasi projectors di↵erent from projectors.Proposition 3.8. (Reflector of an obtuse cone) [13, Proposition 2.1] Sup-pose that C is an obtuse convex cone, i.e., R+C = C and C := {x 2 H :suphC, xi = 0} ✓ C. Then the reflector RC := 2PC  Id is nonexpansiveand ranRC = FixRC = C.Proof. (i) ranRC = FixRC = C. For 8x 2 H,RCx = 2PCx x = 2PCx (PCx+ PC x) (3.31)= PCx PC x (3.32)2 C  C (3.33)✓ C + C = C. (3.34)So ranRC = C.Let x = RCx, then x = 2PCx  x, which is equivalent to x = PCx,then x 2 C. So we have FixRC = C.(ii) Since PC is firmly nonexpansive, then 2PC Id is nonexpansive. Theninequality (3.29) is satisfied.Therefore, the reflector of an obtuse cone is a quasi projector of C.Corollary 3.9. [13, Corollary 2.2] Suppose that C is an obtuse cone andlet  : H! [1, 2]. ThenQ : H! H : x 7! 1 (x)x+ (x)PCx (3.35)is a quasi projector of C.303.2. Basic PropertiesProof. Since, for every x 2 H, when (x) = 1, then Q(x) = PCx, when(x) = 2, Q(x) = RCx. So we have Q(x) 2 [PCx,RCx]. Both PC andRC are quasi projectors of C from Proposition 3.8. Therefore Q is a quasiprojector of C.Example 3.10. [13, Example 2.4] Suppose H = Rd and C = Rd+. Then RCis a quasi projector.Proof. Referring to Example 2.44, we have C = C. Then C is an obtusecone. The conclusion follows from Corollary 3.9 with (x) ⌘ 2.Remark 3.11. [13, Remark 2.1] A quasi projector need not be continuousbecause we may choose  in Corollary 3.9 discontinuously. For instance, letH = R, C = R+, from Example 2.44, we have C is an obtuse cone. Define(x) =(1, if x  1,2, if x > 1. (3.36)When x  1, then (x) = 1, PCx = 0. Then Q(x) = 0. When 1 < x < 0,then (x) = 2, PCx = 0. Then Q(x) = x ! 1 (x ! 1+). Thus Q isdiscontinuous at x = 1.Fact 3.12. (Raik)[51] Let (xn)1n=1 be a sequence in H that is Feje´r monotonewith respect to a subset S of H. If intS 6= ?, thenPn2N kxnxn+1k < +1and (xn)1n=1 converges strongly to some point in H.Lemma 3.13. [13, Lemma 2.2] Suppose that H is finite-dimensional, letf : H ! R be convex and Fre´chet di↵erentiable such that inf f(H) < 0.Then for every ⇢ 2 R++, we haveinf{krf(x)k : x 2 ball(0; ⇢) \ f1(R++)} > 0. (3.37)Proof. Let ⇢ 2 R++ and assume to the contrary that the conclusion fails.Since H is finite dimensional space, by Bolzano-Weierstrass Theorem8, thenthere exists a sequence (xn)1n=1 in ball(0; ⇢) \ f1(R++) and a point x 2ball(0; ⇢) such that xn ! x and rf(xn) ! 0. Since f is Fre´chet dif-ferentiable, then rf is continuous. Then rf(xn) ! rf(x) = 0 andf(xn) ! f(x)  0. On the other hand, inf f(x) < 0 and rf(x) = 0,then we have f(x) < 0, which is absurd.8Bolzano-Weierstrass Theorem: In Rk, every bounded sequence has a convergent sub-sequence.313.3. Finitely Convergent Cutter Method3.3 Finitely Convergent Cutter MethodAssume that (rn)1n=1 is a sequence in R++ such that rn ! 0 and (⌘n)1n=1is a sequence in ]0, 2]. Let QC be a quasi projector of C. We further assumethat x0 2 C and that (xn)1n=1 is generated byxn+1 :=(QCxn + ⌘n(Urnxn  xn), if xn /2 FixT ,xn, otherwise.(3.38)Note that (xn)1n=1 lies in C. Also observe that if xn lies in FixT , then sodoes xn+1.Theorem 3.14. [13, Theorem 3.1] Assume that (rn)n2N is a sequence inR++ such that rn ! 0 and (⌘n)n2N is a sequence in ]0, 2]. Suppose thatint(C\FixT ) 6= ? and thatPn2N ⌘nrn = +1. Then (xn)1n=1 lies eventuallyin C \ FixT .Proof. We argue by contradiction. If the conclusion is false, then no termof the sequence in (xn)1n=1 lies in FixT , i.e., (xn)1n=1 lies in H r FixT .By assumption, there exist z 2 C \ FixT and r 2 R++ and such thatball(z; 2r) ✓ C \ FixT . Hence8y 2 ball(z; r) ball(y; r) ✓ C \ FixT. (3.39)Since rn ! 0, there exists m 2 such that n  m implies rn  r. Now letn  m and y 2 ball(z; r). Using the assumption that QC is a quasi projectorof C, that y 2 C, (3.39) and Corollary 3.5 (v), we obtainkxn+1  yk =QCxn + ⌘n(Urnxn  xn) y (3.40a) kxn + ⌘n(Urnxn  xn) yk (3.40b) kxn  yk. (3.40c)Hence the sequence(xm, xm + ⌘m(Urmxm  xm), xm+1, xm+1 + ⌘m+1(Urm+1xm+1  xm+1), . . .)(3.41)is Feje´r monotone with respect to ball(z; r). It follows from Fact 3.12 andCorollary 3.5 (iii) that+1 >Xnm⌘nkxnUrnxnk =Xnm⌘nrn+kxnTxnk  Xnm⌘nrn, (3.42)which is absurd becausePn2N ⌘nrn = +1.323.3. Finitely Convergent Cutter MethodCompared to Theorem 3.14, the following theorem has a less restric-tive assumption on (FixT,C) but a more restrictive one on the parameters(rn, ⌘n).Theorem 3.15. [13, Theorem 3.2] Assume that (rn)n2N is a sequence inR++ such that rn ! 0 and (⌘n)n2N is a sequence in ]0, 2]. Suppose thatC \ int FixT 6= ? and that Pn2N ⌘n(2  ⌘n)r2n = +1. Then (xn)1n=1 lieseventually in C \ FixT .Proof. Similarly to the proof of Theorem 3.14, we argue by contradictionand assume the conclusion is false. Then (xn)1n=1 must lie in HrFixT . Byassumption, there exist y 2 FixT and r 2 R++ such that ball(y; r) ✓ FixT .Because rn ! 0, there exists m 2 such that n  m implies rn  r. Letn  m. Using also the assumption that QC is a quasi projector of C andCorollary 3.5 (v), we deduce thatkxn+1  yk2 =QCxn + ⌘n(Urnxn  xn) y2 (3.43a) kxn + ⌘n(Urnxn  xn) yk2 (3.43b) kxn  yk2  ⌘n(2 ⌘n)rn + kxn  Txnk2(3.43c) kxn  yk2  ⌘n(2 ⌘n)r2n. (3.43d)This implieskxm  yk2 Xnmkxn  yk2  kxn+1  yk2  Xnm⌘n(2 ⌘n)r2n = +1,(3.44)which contradicts our assumption on the parameters.Theorem 3.14 and Theorem 3.15 have various applications. Since everyresolvent of a maximally monotone operator (see [7, Definition 20.20]) isfirmly nonexpansive (see [7, Definition 4.1 (i)]) and hence a cutter, we obtainthe following result.Corollary 3.16. [13, Corollary 3.1] Let A : H ◆ H be maximally mono-tone, suppose that QC = PC , that T = (Id+A)1, and that one of thefollowing holds:(i) int(C \A10) 6= ? and Pn2N ⌘nrn = +1.(ii) C \ intA10 6= ? and Pn2N ⌘n(2 ⌘n)r2n = +1.Then (xn)1n=1 lies eventually in C \A10.333.3. Finitely Convergent Cutter MethodProof. Let A be maximally monotone, T = (Id+A)1, By Theorem 2.2.5in [20], T is firmly nonexpansive, then T is a cutter. 8x 2 FixT , thenTx = (Id+A)1x = x, so (Id+A)x = x, we have Ax = 0, x 2 A10.Therefore FixT ✓ A10. Similary proof for inverse. Then the conlusionfollows by Theorem 3.14 and Theorem 3.15.Corollary 3.16 applies in particular to finding a constrained critical pointof a convex function. When specializing further to a normal cone operator,we obtain the following result.Example 3.17. (Convex feasibility) [13, Example 3.1] LetD be a nonempty,closed and convex subset of H, and suppose that QC = PC , that T = PD,and that one of the following holds:(i) int(C \D) 6= ? and Pn2N rn = +1.(ii) C \ intD 6= ? and Pn2N r2n = +1.Then the sequence (xn)1n=1 generated byxn+1 := PC✓PDxn + rnPDxn  xnkPDxn  xnk◆(3.45)if xn /2 D and xn+1 := xn if xn 2 D, lies eventually in C \D.Remark 3.18. (Relationship to Polyak’s work) [13, Remark 3.1] In [50],B.T. Polyak considers random algorithms for solving constrained systemsof convex inequalities. Suppose that only one consistent constrained con-vex inequality is considered. Hence the cutters used are all subgradientprojectors. Then his algorithm coincides with the one considered in thissection and thus is comparable. We note that our Theorem 3.14 is moreflexible because Polyak requiresPn2N r2n = +1 (see [50, Theorem 1 andSection 4.2]) provided that 0 < infn2N ⌘n  supn2N ⌘n < 2 while we requireonlyPn2N rn = +1 in this case. Regarding our Theorem 3.15, we notethat our proof essentially follows his proof which actually works for cutters— not just subgradient projectors — and under a less restrictive constraintqualification.Remark 3.19. (Relationship to Crombez’s work) [13, Remark 3.2] In [31], G.Crombez considers asynchronous parallel algorithms for finding a point inthe intersection of the fixed point sets of finitely many cutters — without theconstraint set C. Again, we consider the case when we are dealing with onlyone cutter. Then Crombez’s convergence result (see [31, Theorem 2.7]) is343.4. Limiting Examplessimilar to Theorem 3.15; however, he requires that the radius r of some ballcontained in FixT be known which may not always be realistic in practicalapplications.Actually, it is not too dicult to extend Theorem 3.14 and Theorem 3.15to deal with finitely many cutter. We have opted here for simplicity ratherthan maximal generality.3.4 Limiting ExamplesIn this section, we collect several examples that illustrate the boundariesof the theory. We start by showing that the conclusion of Theorem 3.14 andTheorem 3.15 both may fail to hold if the divergent-series condition is notsatisfied.Example 3.20. (Divergent-series condition is important) Suppose thatH =C = R, that f : R ! R : x 7! x2  1, and that T = Gf is the subgradientprojector associated with f . Suppose that x0 > 1, set r1 := x01 > 0 and(8n 2 N) rn := r2n1/(4(1 + rn1)). Then (rn)n2N lies in R++, rn ! 0, andPn2N rn < +1 and hencePn2N r2n < +1. However, the sequence (xn)1n=1generated by (3.38) lies in ]1,+1[ and hence does not converge finitely toa point in FixT = [1, 1]. Furthermore, the classical subgradient projectoriteration (8n 2 N) yn+1 = Tyn converges to some point in FixT , but notfinitely when y0 /2 FixT .Proof. It is clear that FixT = [1, 1]. Observe that (8n 2 N) 0 < rn (1/4)rn1  (1/4)n+1r1. It follows that rn ! 0 and thatPn2N rn andPn2N r2n are both convergent series. Now suppose that rn1 = xn  1 > 0for some n 2 N. It then follows from Example 3.4 thatxn+1 =xn2+12xnrn = (xn  1)22xn+1rn = r2n12(1 + rn1)+1rn = rn+1.(3.46)Hence, by induction, (8n 2 N) xn = 1 + rn1 and therefore xn ! 1+.As for the sequence (yn)n2N, it is follows from Polyak’s seminal work(see [48]) that (yn)n2N converges to some point in FixT . However, by e.g.[12, Proposition 9.9], (yn)n2N lies outside FixT whenever y0 does.The next example illustrates that in the context of Theorem 3.14 andTheorem 3.15, we cannot expect finite convergence if the interior of FixT isempty.353.4. Limiting ExamplesExample 3.21. (Nonempty-interior condition is important) Suppose thatH = C = R, that f : R ! R : x 7! x2, and that T = Gf is the subgradientprojector associated with f . Then FixT = {0} and hence int FixT = ?. Setx0 := 1/2, and set (8n 2 N) wn := (n+1)1/2 and rn = wn if Uwnxn 6= 0 andrn = 2wn if Uwnxn = 0. Then rn ! 0 andPn2N r2n = +1. The sequence(xn)1n=1 generated by (3.38) converges to 0 but not finitely.Proof. The statements concerning (rn)n2N are clear. It follows readily fromthe definition that (8x 2 R)(8r 2 R+) Tx = x/2 and Urx = x/2 r sgn(x).Since x0 = 1/2, w0 = 1, U1x0 = 3/4 6= 0, and r0 = w0 = 1, it follows that0 < |x0/2| < r0. We now show that for every n 2 N,0 < |xn/2| < rn. (3.47)This is clear for n = 0. Now assume (3.47) holds for some n 2 N.Case 1: |xn| = 2wn.Then Uwnxn = xn/2  sgn(xn)wn = 0. Hence rn = 2wn and thus xn+1 =Urnxn = xn/2  2wn sgn(xn) = sgn(xn)wn  2wn sgn(xn) =  sgn(xn)wn.Thus 0 < |xn+1/2| = wn/2 = 1/(2pn+ 1) < 1/pn+ 2 = wn+1  rn+1,which yields (3.47) with n replaced by n+ 1.Case 2: |xn| 6= 2wn.Then Uwnxn = xn/2  sgn(xn)wn 6= 0. Hence rn = wn and thus xn+1 =Urnxn = xn/2 rn sgn(xn). It follows that |xn+1| = rn  |xn/2| > 0. Hence0 < |xn+1/2| and also |xn+1| < rn = wn < 2wn+1  2rn+1. Again, this is(3.47) with n replaced by n+ 1.It follow now by induction that (3.47) holds for every n 2 N.We now illustrate that when FixT = ?, then (xn)1n=1 may fail to con-verge.Example 3.22. Suppose that H = C = R, that f : R ! R : x 7! x2 + 1,and that T = Gf is the subgradient projector associated with f . Let y0 2 Rand suppose that (8n 2 N) xn+1 = Txn. Then (xn)n2N is either not welldefined or it diverges. Suppose that x0 > 1/p3, set k0 = x0  1/p3 > 0and (8n 2 N) kn+1 =p(n+ 1)/(n+ 2)kn. Suppose that(8n 2 N) rn = 12✓p3 + 2kn+1 + kn  1kn + 1/p3◆. (3.48)Then rn ! 0+ andPn2N r2n = +1. Moreover, the sequence (xn)1n=1 gener-ated by (3.38) diverges.363.4. Limiting ExamplesProof. Clearly, FixT = ? and one checks that(8r 2 R+)(8x 2 Rr {0}) Urx = x2 12x r sgn(x). (3.49)If some xn = 0, then the sequence (xn)n2N is not well defined.Case 1 : (9n 2 N) xn = 1/p3.Then xn+1 = Txn = U0xn = xn/2 1/(2xn) = 1/p3 = xn and similarlyxn+2 = xn+1 = xn. Hence the sequence eventually oscillates between 1/p3and 1/p3.Case 2 : (9n 2 N) |xn| = 1.Then xn+1 = 0 and the sequence is not well defined.Case 3 : (8n 2 N) |xn| /2 {1, 1/p3}.Using the Arithmetic Mean–Geometric Mean inequality, we obtain|xn+1  xn| =xn2  12xn  xn = 12xn + 1xn = 12✓|xn|+ 1|xn|◆ 1(3.50)for every n 2 N. Therefore, (xn)n2N is divergent or not well defined.We now turn to the sequence (xn)1n=1. Observe that0 < kn =pn/(n+ 1)kn1 = · · · = k0/pn+ 1! 0+ (3.51)and hence (kn)n2N is strictly decreasing. It follows that rn ! 0+ and thatrn > (2kn+1 + kn)/2 > 3kn+1/2 = 3k0/(2pn+ 2). Thus,Pn2N r2n = +1.Next, (3.49) yieldsx1 =x02 12x0 r0 (3.52a)=k0 + 1/p32 12k0 + 1/p3) 12✓p3 + 2k1 + k0  1k0 + 1/p3◆(3.52b)=  1p3 k1. (3.52c)Hence x1 < 0 and we then see analogously that x2 = 1/p3 + k2 > 0. Weinductively obtain(8n 2 N) 0 < x2n = 1p3+ k2n and 0 > x2n+1 =  1p3 k2n+1. (3.53)It follows that (1)nxn ! 1/p3; therefore, (xn)1n=1 is divergent.373.5. Comparison3.5 ComparisonWe assume that f : H ! R is convex and Fre´chet di↵erentiable with alevel set {x 2 H : f(x)  0} 6= ? and thatT = Gf : H! H : x 7!8<:xf(x)krf(x)k2rf(x), if f(x) > 0,x, otherwise(3.54)is the associated subgradient projector. Then (3.2) turns intoUrx =8<:xf(x) + rkrf(x)kkrf(x)k2 rf(x), if f(x) > 0,x, otherwise(3.55)and (3.38) intoxn+1 =8<:QC✓xn  ⌘n f(xn) + rnkrf(xn)kkrf(xn)k2 rf(xn)◆, if f(xn) > 0,xn, otherwise.(3.56)In the algorithmic setting, Polyak uses ⌘n ⌘ ⌘ 2 ]0, 2[ , (e.g. ⌘ = 1.8; see[50, Section 4.3]). In the present setting, his framework requiresPn2N r2n =+1. When C = H, one also has the following similar yet di↵erent updateformulayn+1 =8<:yn  ⌘nf(yn) + "nkrf(yn)k2rf(yn), if f(yn) > 0,yn, otherwise,(3.57)where 0 < infn2N ⌘n  supn2N ⌘n < 2 and ("n)n2N is a strictly decreasingsequence in R++ withPn2N "n = +1. In this setting, this is also knownas the Modified Cyclic Subgradient Projection Algorithm (MCSPA), whichfinds its historical roots in works by Fukushima [34], by De Pierro and Iusem[47], and by Censor and Lent [22]. Note that MCSPA requires the existenceof a Slater point, i.e., inf f(H) < 0, which is more restrictive than ourassumptions (consider, e.g., the squared distance to the unit ball). Let usnow link the assumption on the parameters of the MCSPA (3.57) to (3.56).Proposition 3.23. [13, Proposition 5.1] Suppose that H = C is finite-dimensional, inf f(H) < 0, ⌘n ⌘ 1,Pn2N rn = +1 and (8n 2 N) "n =rnkrf(xn)k > 0. Then "n ! 0 andPn2N "n = +1.383.5. ComparisonProof. Corollary 3.5 (iv) implies that (xn)1n=1 is bounded. Because rf iscontinuous, we obtain that  := supn2N krf(xn)k < +1. By Lemma 3.13,there exists ↵ 2 R++ such that if f(xn) > 0, then krf(xn)k  ↵. Hence(8n 2 N) f(xn) > 0 ) 0 < ↵rn  krf(xn)krn = "n  rn, (3.58)and thereforePn2N "n = +1.The following example shows that our assumptions are independent ofthose of the MCSPA.Example 3.24. [13, Example 5.1] Suppose that H = C = R, that f : R!R : x 7! x2  1, that rn = (n + 1)1 if n is even and rn = n1/2 if n isodd, and that ⌘n ⌘ 1. Clearly, rn ! 0 andPn2N r2n = +1. However,("n)n2N := (rn|f 0(xn)|)n2N is not strictly decreasing.Proof. The sequence (xn)1n=1 is bounded. Suppose that f(xn) > 0 for somen 2 N. By Example 3.4,xn+1 = Urnxn =xn2+12xn rn sgn(xn). (3.59)Assume that n is even, say n = 2m, where m  2, and that 1 < x2m <(2m+ 1)/2. Then x2m > 2x2m/p2m+ 1 and"2m = r2m|f 0(x2m)| = 2r2mx2m = 2x2m2m+ 1. (3.60)Hence, using (3.59),x2m+1 =x2m2+12x2m r2m > x2m2+12m+ 1 12m+ 1=x2m2, (3.61)and therefore2x2m+1 > x2m >2x2mp2m+ 1. (3.62)Thus "2m+1 = r2m+1|f 0(x2m+1)| = 2r2m+1x2m+1. It follows that"2m+1 =2x2m+1p2m+ 1>2x2m2m+ 1= "2m (3.63)and the proof is complete.We conclude this chapter with a simple experiment.393.5. ComparisonRemark 3.25. (Numerical experiment) We compare the performance of ouralgorithm to Censor’s modified cyclic subgradient projection algorithm (MC-SPA), which is another popular method and which has similar update for-mulas. Suppose that H = C = R and that f : H ! R : x 7! x2  1. Let Tbe the subgradient projector associated with f .Then our algorithm (3.38) is reduced toxn+1 =(xn  ⌘n f(xn)+rnkrf(xn)kkrf(xn)k2 rf(xn), if f(xn) > 0,xn, otherwise.(3.64)Let ("n)1n=1 ✓ R+ be a monotonically decreasing sequence such that "n ! 0andP1n=1 "n = +1. Censor’s MCSPA in [21] is defined byxn+1 =(xn  ⌘n f(xn)+"nkrf(xn)k2rf(xn), if f(xn) > 0,xn, otherwise.(3.65)We randomly choose 100 starting points in the interval [1, 106]. In thefollowing table, we record the performance of (3.64) and (3.65) with di↵erentselections of parameters. The choices of (rn, ⌘n) are for (3.64), and "n is forMCSPA. Mean and median refer to the number of iterations until the currentiterate is 106 feasible. See Table 3.1.Table 3.1: Performance of the algorithm for f(x) = x2  1.Algorithm for x2  1 Mean Median(rn, ⌘n) =1/(n+ 1), 111.49 13(rn, ⌘n) =1/(n+ 1), 22 2(rn, ⌘n) =1/pn+ 1, 110.83 12(rn, ⌘n) =1/pn+ 1, 22 2"n = 1/(n+ 1) 11.81 13"n = 1/pn+ 1 12.19 13Now let us instead consider f : H! R : x 7! 100x21. The correspond-ing data are in the Table 3.2.403.5. ComparisonTable 3.2: Performance of the algorithm for f(x) = 100x2  1.Algorithm for 100x2  1 Mean Median(rn, ⌘n) =1/(n+ 1), 113.29 14(rn, ⌘n) =1/(n+ 1), 212 12(rn, ⌘n) =1/pn+ 1, 117.52 19(rn, ⌘n) =1/pn+ 1, 2105 105"n = 1/(n+ 1) 15.27 16"n = 1/pn+ 1 15.76 17We observe that both the step lengths rn and "n and the relaxationparameter ⌘n have significant impact on the performance of the algorithms.This chapter gave a new method for finding a fixed point of a cutter,motivated by Polyak and Crombez’s works. More precisely, our assumptionson the parameters are more general than existing results. Furthermore, weprovided limiting examples to illustrate that our assumptions can not beweakened.41Chapter 4On Subgradient Projectors4.1 OverviewThis chapter is based on the paper [12]. Most results in this chapterare new, and some are complementary to work of B. Pauwels [44]. Theresults in this chapter are central for the understanding of the subgradi-ent projector. Fundamental properties such as continuity, nonexpansivenessand monotonicity are investigated. We also point out that the Yamagishi-Yamada operator is a subgradient projector of another convex function. Westart with the definition of our key notion: subgradient projector.Throughout this chapter, we assume that f : H! R is convex and con-tinuous, and C := {x 2 H : f(x)  0} 6= ?.Unless stated otherwise, we assume that s : H! H is a selection of @f ,i.e., (8x 2 H) s(x) 2 @f(x). Then the associated subgradient projector isdefined by(8x 2 X) Gf (x) =8<:xf(x)ks(x)k2 s(x), if f(x) > 0;x, otherwise.(4.1)The subgradient projector is the key ingredient in Polyak’s seminal work[48] on subgradient projection algorithms, which have since found many ap-plications; see, e.g., [2] (which deals with subgradient algorithms viewed asprojection algorithms with variable supersets), [6] (which provides a frame-work for transforming weakly into strongly convergent cutter methods), [20](which provides a very nice and recent monograph on cutters, subgradientprojectors and related algorithms), [22] (which introduces the cyclic sub-gradient projections method for solving systems of convex inequalities), [23](which deals with an almost cyclic sequential algorithm for solving a com-mon fixed point problem of cutters), [24] (which is a book that not onlyfeatures subgradient projection algorithms but it also won their authors theINFORMS Computing Society prize), [26] (which presents a general frame-work for algorithmically solving feasibility problems especially in signal pro-cessing), [28] (which is a broad survey on subgradient projection algorithms424.2. Basic Propertiesfor solving image recovery problems), [29] (which develops the mathematicaltheory of the general and fast parallel projection method for image recoverybased on subgradient projectors), [30] (which presents an adaptive level setmethod for constrained image recovery relying upon subgradient projectors),[43] (which introduces a cut that is tighter than the subgradient projector cutwhen applied to certain quadratic functions), [49] (which is a seminal bookthat includes subgradient methods for the constrained minimization of nons-mooth functions), [50] (which employs randomly chosen subgradient projec-tors to solve convex inequalities), [55] (which provides an adaptive projectedsubgradient method with applications to machine learning), [56] (which isnot only a review on subgradient projection algorithms for adaptive learn-ing but it also won the IEEE Signal Processing Society Signal ProcessingMagazine Best Paper Award), [58] (which presents a subgradient-projectorbased method for solving robust adaptive signal processing problems), [59](which features a very general hybrid steepest descent for solving varia-tional inequalities with applications to convex optimization via subgradientprojectors), [60] (which introduces an adaptive filtering algorithm based onparallel subgradient projection methods), and the references therein. Thisimpressive body of (including even award-winning) research, which has alsoattracted a significant number of citations and which clearly brought out theimportance of the subgradient projector as an algorithmic building block,warrants a mathematical study of the subgradient projector and its proper-ties.4.2 Basic PropertiesLet us record some basic results on subgradient projectors, which areessentially contained already in [48] and the proofs of which we provide forcompleteness.Fact 4.1. Let x 2 H, and setH = {y 2 H : hs(x), y  xi+ f(x)  0}. (4.2)Then the following hold:(i) f+(x) + hs(x), Gf (x) xi = 0, where f+(x) = max{f(x), 0}.(ii) FixGf = C ✓ H.(iii) Gf (x) = PHx.434.2. Basic Properties(iv) (Gf is a cutter) (8c 2 C) hcGf (x), xGf (x)i  0.(v) (8c 2 C) kxGf (x)k2 + kGf (x) ck2  kx ck2.(vi) f+(x) = ks(x)kkxGf (x)k.(vii) If x /2 C, then (8c 2 C) f2(x)ks(x)k2 + kGf (x) ck2  kx ck2.(viii) f+(x)(xGf (x)) = kxGf (x)k2s(x).(ix) Suppose that f is Fre´chet di↵erentiable at x 2 H r C. Then g =ln f : H r C ! R is Fre´chet di↵erentiable at x and Gf (x) = x rg(x)/krg(x)k2.(x) Suppose that min f(H) = 0 and f is Fre´chet di↵erentiable on H withrf being Lipschitz continuous9 with constant L, that x /2 C, and thatthere exists ↵ > 0 such that f(x)  ↵d2C(x). Then d2C(Gf (x)) (1 ↵2/L2)d2C(x).(xi) Suppose that min f(H) = 0, that x /2 C, and that there exists ↵ > 0such that f(x)  ↵dC(x). Then d2C(Gf (x))  (1 ↵2/ks(x)k2)d2C(x).Proof. (i): This follows directly from the definition of Gf . Indeed, supposex /2 C, we have f(x) > 0, then f+(x) = f(x).hs(x), Gf (x) xi = hs(x), f(x)ks(x)k2 s(x)i = f(x), (4.3)therefore f+(x) + hs(x), Gf (x)  xi = 0. When x 2 C, we have Gf (x) = xand f(x)  0, then f+(x) = 0, so f+(x) + hs(x), Gf (x) xi = 0.(ii): The equality is clear from the definition of Gf . Let z 2 C. By con-vexity of function or definition of subgradient, we have hs(x), zxi+f(x) f(z)  0 and hence z 2 H.(iii): Assume first that x 2 C. Then x 2 FixGf ✓ H by (ii) andhence Gf (x) = x = PHx. Now assume that x /2 H, then x /2 C. Then9We say an operator T : H! H is Lipschitz continuous with a constant L if 8x, y 2 H,kTx Tyk  Lkx yk.444.2. Basic Properties0 < f(x) = f+(x) and s(x) 6= 0. Using the formula of projection onto ahalf-space (see [7, Example 28.16]) we havePHx = x hs(x), xi hs(x), xi  f(x)ks(x)k2 s(x) = xf(x)ks(x)k2 s(x) = Gf (x).(4.4)(iv): In view of (iii) and Fact 2.40, we have (8h 2 H) hh  PHx, x PHxi = hh  Gf (x), x  Gf (x)i  0. Now invoke (ii), for 8c 2 C ✓ H, wehave hcGf (x), xGf (x)i  0.(v): This is equivalent to (iv) by definition of cutter.(vi): Assume first that x 2 C. Then f(x)  0, i.e., f+(x) = 0, andx = Gf (x) by (ii). Hence the identity is true. Now assume that x /2 C.Then 0 < f(x) = f+(x) and xGf (x) = f(x)ks(x)k2 s(x). Taking the norm, welearn that kxGf (x)k = f(x)ks(x)k = f+(x)ks(x)k .(vii): Combine (ii), (v), and (vi). If x /2 C, then kxGf (x)k = f+(x)ks(x)k =f(x)ks(x)k . Substitute kxGf (x)k into (v).(viii): This follows from (vi) and the definition of Gf . When x 2 C,we have x = Gf (x), then equation in (viii) holds. When x /2 C, we havef+(x) = f(x) > 0 and kxGf (x)k = f+(x)ks(x)k . By definition of Gf we havef+(x)(xGf (x)) = f+(x) f+(x)ks(x)k2 s(x) = kxGf (x)k2s(x). (4.5)(ix): The chain rule implies that rg(x) = rf(x)f(x) . Hence krg(x)k2 =krf(x)k2f2(x) and thus x rg(x)krg(x)k2 = x f(x)krf(x)k2rf(x) = Gf (x).(x): Let c 2 C. Then rf(c) = 0 and hence krf(x)k = krf(x) rf(c)k  Lkx  ck. Hence krf(x)k  LdC(x) and therefore, using (vii),we obtainkGf (x) ck2  kx ck2  f2(x)krf(x)k2  kx ck2  ↵2d4C(x)L2d2C(x). (4.6)Now take the minimum over c 2 C.454.3. Calculus(xi): Using (vii), we haved2C(Gf (x))  kGf (x)PCxk2  kxPCxk2f2(x)ks(x)k2  d2C(x)↵2d2C(x)ks(x)k2 .(4.7)The proof is complete.Example 4.2. Suppose that f(x) = kxk, x 2 H. Then rf(x) = xkxk andGf (x) = 0 when x 6= 0.4.3 CalculusWe now turn to basic calculus rules. It is convenient to introduce theoperator G : H◆ H, defined by(8x 2 H) Gx = Gfx = {Gs(x) : s is a selection of @f}, (4.8)where Gf is the operator occurring in (4.1) for a particular subgradient s.When f is Gaˆteaux di↵erentiable outside C, then we will identify G with G.Proposition 4.3 (Calculus). Let ↵ > 0, let A : H ! H be continuous andlinear such that A⇤A = AA⇤ = Id, and let z 2 H. Furthermore, let (fi)i2I bea finite family of convex continuous functions on H such that Ti2I Cfi 6= ?.Then the following hold:(i) Suppose that g = ↵f . Then Cg = Cf and Gg = Gf .(ii) Suppose that g = f  ↵ Id. Then Cg = ↵1Cf and Gg = ↵1Gf  ↵ Id.(iii) Suppose that f  0 and that g = f↵ is convex. Then Cg = Cf andGg = (1 ↵1) Id+↵1Gf .(iv) Suppose that g = f A. Then Cg = A⇤Cf and Gg = A⇤  Gf A.(v) Suppose that g : x 7! f(x  z). Then Cg = z + Cf and Gg : x 7!z + Gf (x z).(vi) Suppose that g = maxi2I fi. Then Cg =Ti2I Cfi and if g(x) > 0and I(x) = {i 2 I : fi(x) = g(x)}, then Gg(x) = {x  g(x)kx⇤k2x⇤ : x⇤ 2convSi2I(x) @fi(x)}.(vii) Suppose that g = f+. Then Gg = Gf .464.3. Calculus(viii) (Moreau envelope) Suppose that min f(H) = 0 and thatg = f⇤(1/2)k · k2 is the Moreau envelope of f . Then Cg = Cf and(8x 2 H) Gg(x) =8<:xg(x)kx Proxf xk2 (x Proxf x), if f(x) > 0;x, if f(x) = 0.(4.9)Proof. Let x 2 H. We shall only prove one inclusion for the subgradientprojector as the remaining one is proved similarly.(i): Since ↵ > 0, then g(x)  0 , f(x)  0, it follows that Cg = Cf .Suppose that f(x) > 0. For sf (x) 2 @f(x), it is easy to have ↵sf (x) 2 @g(x),we obtainGf (x) = x f(x)ksf (x)k2 sf (x) = xg(x)k↵sf (x)k2 (↵sf (x)). (4.10)This implies Gf (x) ✓ Gg(x).(ii): For 8x 2 Cg, then g(x) = f(↵x)  0, we have ↵x 2 Cf . On theother side, for x 2 Cf , then f(x) = f(↵ x↵) = g( x↵)  0, then x↵ 2 Cg. So wehave Cg = Cf . Suppose that g(x) > 0, i.e., f(↵x) > 0. Then↵1Gf (↵x) = ↵1(↵x f(↵x)ksf (↵x)k2 sf (↵x)) (4.11)= x ↵1 f(↵x)ksf (↵x)k2 sf (↵x) (4.12)= x f(↵x)k↵sf (↵x)k2 (↵sf (↵x)) 2 Gg(x). (4.13)Hence ↵1Gf (↵x) ✓ Gg(x).(iii): It is easy to check that Cg = Cf . Suppose that g(x) > 0. Thenf(x) > 0 and↵1(xGf (x)) = ↵1 f(x)ksf (x)k2 sf (x) (4.14)=f↵(x)k↵f↵1(x)sf (x)k2↵f↵1(x)sf (x) 2 x Gg(x). (4.15)474.4. Examples(iv): We have x 2 Cg , g(x) = f(Ax)  0, Ax 2 Cf , A⇤Ax 2 A⇤Cf, x 2 A⇤Cf . Suppose that g(x) > 0. Then f(Ax) > 0, for sf (Ax) 2@f(Ax), we have A⇤sf (Ax) 2 @g(x) andA⇤Gf (Ax) = A⇤(Ax f(Ax)ksf (Ax)k2 sf (Ax)) (4.16)= x f(Ax)kA⇤sf (Ax)k2A⇤sf (Ax) 2 Gg(x). (4.17)The last equation is using kAxk = kA⇤xk = kxk. SincekAxk2 = hAx,Axi = hA⇤Ax, xi = hx, xi = kxk2, (4.18)kA⇤xk2 = hA⇤x,A⇤xi = hAA⇤x, xi = hx, xi = kxk2. (4.19)(v): We have x 2 Cg , g(x) = f(xz)  0, xz 2 Cf , x 2 z+Cf .Suppose that 0 < g(x) = f(x z). Thenz +Gf (x z) = z +⇣x z  f(x z)ksf (x z)k2 sf (x z)⌘2 Gg(x). (4.20)(vi): This follows from the well known formula for the subdi↵erential ofa maximum; see, e.g., [45, Proposition 3.38].(vii): This follows from (vi) since f+ = max{0, f}.(viii): When x 2 Cf , then f(x)  0.g(x) = infy2H{f(y) + 12kx yk2}  f(x) + 12kx xk2 = f(x)  0.Then x 2 Cg. On the other side, let x 2 Cg, g(x) = infy2H{f(y) +12kx  yk2}  0. For 12kx  yk2, it attains its minimum at y = x, wehave infy2H{f(y) + 12kx  yk2} = f(x)  0. So x 2 Cf . When g  0,rg = IdProxf (see, e.g., [7, Proposition 12.29]), and argmin g = argmin f(see, e.g., [7, Corollary 17.5]).4.4 ExamplesIn this section, we present several illustrative examples.Example 4.4. Suppose that f = k · k2. Then rf = 2 Id and Gf = 12 Id.484.4. ExamplesExample 4.5. Suppose that(8x 2 H) f(x) =(12kxk2, if x 2 ball(0; 1);kxk  12 , otherwise.(4.21)Then Gf =12Pball(0;1) and Gf is firmly nonexpansive.Proof. Let x 2 H. By [9, Example 1.1.1], we have f = k ·k⇤(1/2)k ·k2 is theMoreau envelope of the norm. Hence it follows from Proposition 4.3 (viii)thatGf (x) = x f(x)kx Proxk·k xk2 (x Proxk·k x) (4.22)provided that x 6= 0, and Gf (x) = 0 = 12Pball(0;1)x if x = 0. Using[9, Example 1.1.1] and [7, Example 12.25], we have k · k⇤10 = ◆ball(0;1)and Proxk·k⇤ = Pball(0;1). Therefore we have Proxk·k = IdProxk·k⇤ =IdProx◆ball(0;1) = IdPball(0;1). Thus, IdProxk·k = Pball(0;1). Assumenow x 6= 0. If 0 < kxk  1, thenGf (x) = x12kxk2kPball(0;1)xk2Pball(0;1)(x) = xkxk22kxk2x =12x =12Pball(0;1)x;(4.23)and if 1 < kxk, thenGf (x) = x kxk 12kPball(0;1)xk2Pball(0;1)(x) (4.24)= x kxk 12x/kxk2 xkxk = 12 xkxk = 12Pball(0;1)x. (4.25)Now Pball(0;1) is firmly nonexpansive, and hence so is IdPball(0;1). It followsthat 2Gf  Id = (IdPball(0;1)) is nonexpansive, and therefore that Gf isfirmly nonexpansive.Proposition 4.6. Let (Ci)i2I be a finite family of closed convex subsets ofH such that C = Ti2I Ci 6= ? and f = maxi2I dCi. Let x 2 H r C, set10Let f : H! [1,+1]. The Fenchel conjugate of f isf⇤(u) = supx2H(hx, ui  f(x)) .494.4. ExamplesI(x) = {i 2 I : f(x) = dCi(x)}, and set Q(x) = conv{PCix}i2I(x). ThenG(x) =[q(x)2Q(x)⇢x f2(x)kx q(x)k2x q(x) and Q(x) ✓ conv {x}[G(x).(4.26)If I(x) = {i} is a singleton, then G(x) = {PCix}.Proof. This follows from Proposition 4.3 (vi) and the fact that rdCi(x) =(xPCix)/dCi(x) when x 2 HrCi. When q(x) 2 Q(x) = conv{PCix}i2I(x),Proposition 4.7. Let (Ci)i2I be a finite family of nonempty closed convexsubsets of H such that C = Ti2I Ci 6= ?. Let (i)i2I be a family in ]0, 1]such thatPi2I i = 1. Let p  1 and suppose that f =Pi2I idpCi. Set(8x 2 H) I(x) = {i 2 I : x /2 Ci}. Then 8x 2 HGf (x) = xPi2I(x) idpCi(x)pPi2I(x) idp2Ci(x)(x PCix)2 Xi2I(x)idp2Ci(x)(x PCix)(4.27)and if p = 2, we rewrite this asGf (x) =8><>:xPi2I ikx PCixk22Pi2I i(x PCix)2⇣xXi2IiPCix⌘, if x /2 C;x, otherwise.(4.28)Proof. Let x 2 H, and let i 2 I. Then rdCi(x) = xPCixdCi (x) if x /2 Ci and0 2 @dCi(x) otherwise. HencerdpCi(x) = pdp2Ci(x)(x PCix) (4.29)if x /2 Ci, and 0 2 @dpCi(x) otherwise. The result follows.Example 4.8. Let p  1 and suppose that f = dpC . ThenGf = (1 1p) Id+1pPC .504.4. ExamplesProof. This follows from Proposition 4.7 when I is a singleton. More pre-cisely, rf(x) = pdp1C xPCxdCx = pdp2C (x PCx). When x 2 H\C,Gf (x) = x dpC(x)kpdp2C (x PCx)k2pdp2C (x PCx)= x d2p2C (x)p2d2p4C (x)kx PCxk2p(x PCx)= x d2p2C (x)p2d2p4C (x)d2C(x)p(x PCx)= x 1p(x PCx)= (1 1p)x+1pPCx.Example 4.9. Suppose that u 2 H satisfies kuk = 1, and let  2 R. Thenthe following hold:(i) If f : x 7! |hu, xi  |, then C = {x 2 H : hu, xi = } and Gf : x 7!x (hu, xi  )u.(ii) If f : x 7! hu, xi  , then C = {x 2 H : hu, xi  } and Gf : x 7!x (hu, xi  )+u.Proof. (i):By using the formula of projection onto a hyperplane in [7, Ex-ample 3.21], then dC(x) = kx PCxk = kx (x hu,xikuk2 u)k = |hu, xi  |,so we have f = dC and hence Gf = PC by Example 4.8.(ii): Note that f+ = dC and hence Gf = PC by Proposition 4.3 (vii) andExample 4.8.We now give an example in which Gf is positively homogenenous butnot necessarily linear.Example 4.10. Let K be a nonempty closed convex cone with polar coneK , and suppose that f : x 7! 12hx, PKxi. Then Gf = Id12PK = PK +12PK .Proof. By Moreau’s conical decomposition (see [7, Theorem 6.29]), let x 2H, we havex = PKx+ PK x and hPKx, PK xi = 0.514.5. Continuity of Gf vs Fre´chet di↵erentiability of fThenhx, PKxi = hPKx+ PK x, PKxi = kPKxk2.So f(x) = 12kPKxk2 = 12kx  PK xk2 = 12d2K (x). By using [7, Corollary12.30] it follows that rf(x) = x PK x = PKx. ThenGf (x) = x f(x)krf(x)k2rf(x)= x12kPKxk2kPKxk2 PKx= x 12PKx= (PKx+ PK x) 12PKx= PK x+12PKx.A direct verification yields the following result which is known whenp = 2 (see [27] and [29]).Proposition 4.11. Let Y be another real Hilbert space, let A : H ! Y becontinuous and linear, let b 2 Y , and let "  0, and let p  1. Suppose that(8x 2 H) f(x) = kAx bkp "p and that C = {x 2 H : kAx bk  "} 6= ?.Then 8x 2 HGf (x) =8<:xkAx bkp  "ppkAx bkp2kA⇤(Ax b)k2A⇤(Ax b), if kAx bk > ";x, otherwise.(4.30)4.5 Continuity of Gf vs Fre´chet di↵erentiabilityof fIn this section, we investigate the continuity of Gf , which is a desir-able property when Gf is used as an operator in an algorithm. It turnsout that strong-to-strong continuity of Gf corresponds precisely to Fre´chetdi↵erentiability of f . We start with a technical result.Lemma 4.12. Let (xn)1n=1 be a sequence in H converging weakly to x¯ suchthat xn Gf (xn)! 0. Suppose that one of the following holds:524.5. Continuity of Gf vs Fre´chet di↵erentiability of f(i) there exists  > 0 such that (xn)1n=1 lies eventually in ball(x¯, ) and := sup k@f(ball(x¯; ))k < +1.(ii) xn ! x¯.(iii) f is bounded on every bounded subset of H.Then x¯ 2 C.Proof. If (i) is true : Without loss of generality, we assume that(8n 2 N) ks(xn)k  . (4.31)Since f+ is weakly lower semicontinuous, by using Fact 4.1(vi), we havef+(x¯)  lim inf f+(xn)   lim inf kxn Gf (xn)k = 0,then f(x¯)  0, i.e., x¯ 2 C.Suppose (ii) holds: Since xn ! x¯, by definition of convergence of the se-quence, 8✏ > 0, there exists N 2 N such thatxn 2 ball(x¯, ✏) (4.32)when n > N . Moreover, f is continuous at x¯, by using [7, Proposition16.14(iii)], there exists  2 R++ such that @f(ball(x¯; )) is bounded. In(4.32), let ✏ = , we have (ii) ) (i).Assume (iii) holds: By assumption, (xn)1n=1 converges weakly to x¯. Fact 2.24gives (xn) is bounded. Then there exists  > 0 such that xn 2 ball(x¯, ) be-cause f is bounded on every bounded subset of H. Applying the equivalenceof (i) and (iii) in [7, Proposition 16.17], we have dom @f = H and @f mapsevery bounded subset of H to a bounded set. For ball(x¯, ), it is a boundedsubset of H. Thus @f(ball(x¯; )) is also bounded. Hence (iii)) (i).Remark 4.13. Lemma 4.12 and Fact 4.1(ii) imply that Gf is fixed-pointclosed at x¯ (see, e.g., also [20, Theorem 4.2.7] or [4]), i.e., if xn ! x¯ andxn Gf (xn)! 0, then x¯ = Gf (x¯).Proposition 4.14. Gf is continuous at every point in C.Proof. Let x¯ 2 C, and let (xn)1n=1 be a sequence in H converging to x¯. Theresult is clear if (xn)1n=1 lies in C, so we can and do assume that (xn)1n=1lies in HrC. Then (8n 2 N) f(xn)  f(x¯) hs(xn), x¯ xni  hs(xn), xnx¯i  ks(xn)kkx¯  xnk. Hence 0 < f(xn)/ks(xn)k  kx¯  xnk ! 0. ByFact 4.1(vi), xn Gf (xn) ! 0. Thus limGf (xn) = limxn = x¯ = Gf (x¯) byusing Fact 4.1(ii).534.5. Continuity of Gf vs Fre´chet di↵erentiability of fThe continuity of Gf outside C is more delicate.Fact 4.15. (Smulyan) (See, e.g., [18, Proposition 6.1.4].) The followinghold:(i) f is Fre´chet di↵erentiable at x¯, s is (strong-to-strong) continuous atx¯.(ii) f is Gaˆteaux di↵erentiable at x¯ , s is strong-to-weak continuous atx¯.Lemma 4.16. Suppose that x¯ 2 HrC, that Gf is strong-to-weak continuousat x¯, but Gf is not strong-to-strong continuous at x¯. Then f is not Gaˆteauxdi↵erentiable at x¯.Proof. There exists a sequence (xn)1n=1 in Hr C such that xn ! x¯,Gf (xn)*Gf (x¯) yet Gf (xn) 6! Gf (x¯). It follows thatxn Gf (xn)* x¯Gf (x¯) and xn Gf (xn) 6! x¯Gf (x¯). (4.33)By the Kadec–Klee property11ofH, kxnGf (xn)k 6! kx¯Gf (x¯)k. Since k·kis weakly lower semicontinuous, we assume (after passing to a subsequenceand relabeling if necessary) thatkx¯Gf (x¯)k < ⌘ := limn2Nkxn Gf (xn)k. (4.34)Using Fact 4.1 (viii), it follows thats(xn) = f(xn)xn Gf (xn)kxn Gf (xn)k2 (4.35)* f(x¯)x¯Gf (x¯)⌘2(4.36)6= f(x¯) x¯Gf (x¯)kx¯Gf (x¯)k2 (4.37)= s(x¯). (4.38)Thus, s is not strong-to-weak continuous at x¯. It follows now from Fact 4.15(ii) that f is not Gaˆteaux di↵erentiable at x¯.Theorem 4.17. Let x¯ 2 Hr C. Then the following are equivalent:11 Kadec–Klee property is a sequence (yn)n2N in H converges to y¯ if and only if yn* y¯and kynk ! ky¯k.544.6. Continuity of Gf vs Gaˆteaux di↵erentiability of f(i) f is Fre´chet di↵erentiable at x¯.(ii) Gf is (strong-to-strong) continuous at x¯.(iii) f is Gaˆteaux di↵erentiable at x¯ and Gf is strong-to-weak continuousat x¯.Proof. (i))(ii): By Fact 4.15 (i), s is continuous at x¯. It follows from thedefinition of Gf that Gf is continuous at x¯ as well.(i)((ii): In view of Fact 4.1 (viii), we have s(x) = f(x)(xGf (x))/kxGf (x)k2 for all x suciently close to x¯. Hence s is continuous at x¯ andtherefore f is Fre´chet di↵erentiable at x¯ by Fact 4.15 (i).(i))(iii) and (ii))(iii): This is clear since (i),(ii) by the above.(iii))(ii): Suppose to the contrary that Gf is not strong-to-strong con-tinuous. Then, by Lemma 4.16, f is not Gaˆteaux di↵erentiable at x¯ whichis absurd.Corollary 4.18. (continuity) Gf is continuous everywhere if and only if fis Fre´chet di↵erentiable on Hr C.Proof. Combine Proposition 4.14 with Theorem 4.17.Example 4.19. Suppose that H = R and that f(x) = max{x, x, 2x 1}(8x 2 R). Then C = {0} and f is not di↵erentiable at 1; consequently, byCorollary 4.18, Gf is not continuous at 1.Remark 4.20. (weak-to-weak continuity) It is unrealistic to expect that Gfis weak-to-weak continuous even when f is Fre´chet di↵erentiable; see [4,Example 3.2 and Remark 3.3.(ii)].4.6 Continuity of Gf vs Gaˆteaux di↵erentiabilityof fIn view of Fact 4.15 and Corollary 4.18, it is now tempting to conjecturethatGf is strong-to-weak continuous if and only if f is Gaˆteaux di↵erentiableon H r C. Perhaps somewhat surprisingly, this turns out to be wrong.The counterexample is based on an ingenious construction by Borwein andFabian [15].554.6. Continuity of Gf vs Gaˆteaux di↵erentiability of fExample 4.21. (Borwein–Fabian) (See [15, Proof of Theorem 4].) Supposethat H is infinite-dimensional. Then there exists a function b : H! R suchthat the following hold:(i) b is continuous, convex and min b(H) = b(0) = 0.(ii) b is Fre´chet di↵erentiable on Hr {0}.(iii) b is Gaˆteaux di↵erentiable at 0, and rb(0) = 0.(iv) b is not Fre´chet di↵erentiable at 0.Example 4.22. (lack of strong-to-weak continuity) Let b be as in Exam-ple 4.21. Then there exists y 2 H such that rb(y) 6= 0. Suppose that(8x 2 H) f(x) = b(x) hrb(y), xi  12b(y) hrb(y), yi. (4.39)Then the following hold:(i) f is Gaˆteaux di↵erentiable (but not Fre´chet di↵erentiable) at 0, andGf is not strong-to-weak continuous at 0.(ii) f is Fre´chet di↵erentiable on Hr{0}, and Gf is continuous on Hr{0}.Proof. By Example 4.21 (iii), 0 2 ranrb. If {0} = ranrb, then we woulddeduce that b is constant and therefore Fre´chet di↵erentiable; in turn, thiswould contradict Example 4.21 (iv). Hence {0} $ ranrb and there existsy 2 H such thatv = rb(y) 6= 0. (4.40)Now setg : H! R : x 7! b(x) hv, xi. (4.41)Then(8x 2 H) f(x) = g(x) 12g(y), (4.42)and g(0) = b(0)  hv, 0i = 0 by Example 4.21 (i). Example 4.21 (iii) and(4.40) yield rg(0) = rb(0)  v = v 6= 0 while rg(y) = rb(y)  v = 0.Hence min g(H) = g(y) < g(0) = 0 and thereforef(y) = min f(H) = min g(H) 12g(y) = 12g(y) < 0 < 0 12g(y) = f(0).(4.43)Thus y 2 C while 0 /2 C.564.7. Gf as an accelerated mapping(i): On the one hand, since b is not Fre´chet di↵erentiable at 0 (Exam-ple 4.21 (iv)), neither is f . On the other hand, since b is Gaˆteaux di↵eren-tiable at 0 (Example 4.21 (iii)), so is f . Altogether, f is Gaˆteaux di↵eren-tiable, but not Fre´chet di↵erentiable at 0. Therefore, by Theorem 4.17, Gfis not strong-to-weak continuous at 0.(ii): Since b is Fre´chet di↵erentiable on H r {0} (Example 4.21 (ii)), sois f . Now apply Theorem 4.17.4.7 Gf as an accelerated mappingIn this section, we consider the case when f is a power of a quadraticform. We link Gf to the accelerated mapping corresponding to a linearoperator. This mapping can significantly speed up convergence of algorithms(see [8]).Proposition 4.23. Suppose that f : x 7! phx,Mxip, where p  1 andM : H ! H be continuous, linear, self-adjoint, and positive. Then Gf iscontinuous everywhere and(8x 2 H) Gf (x) =8<:xhx,MxipkMxk2Mx, if Mx 6= 0;x, if Mx = 0.(4.44)Proof. Assume first that p = 1. Since M has a unique positive square root,i.e., there exists12 B : H! H such that B is continuous, linear, self-adjoint,and positive, and kerB = kerM . Hence (8x 2 H) f(x) = phx,Mxi =kBxk so f is indeed convex and continuous. If x 2 H r kerM = H rkerB, then f is Fre´chet di↵erentiable at x with rf(x) = B⇤Bx/kBxk =Mx/kBxk; hence,Gf (x) = x kBxkkMxk2/kBxk2MxkBxk = xkBxk2kMxk2Mx = xhx,MxikMxk2 Mx(4.45)and Gf is continuous everywhere by Corollary 4.18. If p > 1, then the resultfollows from the above and Proposition 4.3 (iii).12See, e.g., [36, Theorem 9.4-2], where this is stated in a complex Hilbert space; however,the proof works unchanged in our real setting as well.574.7. Gf as an accelerated mappingExample 4.24. Let A : H ! H be linear, self-adjoint, and nonexpansive.Suppose that (8x 2 H) f(x) =phx, xAxi. Then Gf is positively homo-geneous, continuous everywhere, and(8x 2 H) Gf (x) =8<:xhx, xAxikxAxk2 (xAx), if Ax 6= x;x, if Ax = x.(4.46)Proof. Use Proposition 4.23 with M = IdA and p = 1.Remark 4.25. (accelerated mapping) Let A : H ! H be linear, nonexpan-sive, and self-adjoint. In [8], the authors study the accelerated mapping13of A, i.e.,x 7! txAx+ (1 tx)x, where tx =8<:hx, xAxikxAxk2 , if x 6= Ax;1, otherwise.(4.47)In view of the Example 4.24, the accelerated mapping of A is precisely thesubgradient projector Gf of the function x 7!phx, xAxi. Now supposethat H = `2(N), let (en)n2N be the standard orthonormal basis of H, andsuppose thatA : H! H : x 7!Xn2Nnn+1hen, xien. (4.48)This A is linear, self-adjoint and nonexpansive. Indeed, linearity is obviouslyfrom the constrcution of A. A is self-adjoint because (8x, y 2 H)hAx, yi = hXn2Nnn+1hen, xien, yi =Xn2Nnn+1xnyn = hx,Ayi.For the nonexpansiveness of A, we will use Bessel inequality. For every13In fact, the operator A in [8] need not necessarily be self-adjoint.584.8. Nonexpansivenessx, y 2 H, thenkAxAyk2 = kA(x y)k2 (since A is linear)=Xn2Nnn+1hen, x yien2= hXn2Nnn+1hen, x yien,Xn2Nnn+1hen, x yieni=Xn2N⇣nn+1⌘2 hen, x yi2Xn2Nhen, x yi2 k(x y)k2 (using Bessel inequality) kx yk2.Thus A defined in (4.48) is nonexpansive. By using Example 4.24, then Gfis continuous; however, Gf is neither linear nor uniformly continuous (seethe [8, Remark following Lemma 3.8]).4.8 NonexpansivenessWe now discuss when Gf is (firmly) nonexpansive or monotone. Theseproperties occur when studying resolvents, subdi↵erentials and gradients.For instance, every resolvent and every proximal mapping is firmly non-expansive; subdi↵erential (or gradient) operators of convex functions aremonotone. However, these properties are not automatic for subgradientprojectors as we will see in this section.Proposition 4.26. Suppose that f is Gaˆteaux di↵erentiable on HrC andthat Gf is firmly nonexpansive. Then Gg is likewise in each of the followingsituations:(i) ↵ > 0, and g = f  ↵ Id is convex.(ii) f  0, ↵  1, and g = f↵ is convex.(iii) A : H! H is continuous and linear, AA⇤ = A⇤A = Id, and g = f A.(iv) z 2 H and g : x 7! f(x z).The analogous statement holds when Gf is assumed to be nonexpansive.594.8. NonexpansivenessProof. This follows from the corresponding items in Proposition 4.3, whichdo preserve (firm) nonexpansiveness.On the real line, we obtain a simpler test.Proposition 4.27. Suppose that H = R and that f is twice di↵erentiableon H r C. Then Gf is monotone. Moreover, Gf is (firmly) nonexpansiveif and only if(8x 2 R) f(x)f 00(x)  f 0(x)2. (4.49)Proof. By Corollary 4.18, Gf is continuous. Let x 2 Rr C. Then Gf (x) =x  f(x)/f 0(x) and hence G0f (x) = f(x)f 00(x)/(f 0(x))2  0. It follows thatGf is increasing on R r C and hence on R. Furthermore, Gf is (firmly)nonexpansive if and only if G0f (x)  1, which gives the remaining character-ization.Example 4.28. Suppose that H = R, let ↵ > 0, and suppose that (8x 2 R)f(x) = xn  ↵, where n 2 {2, 4, 6, 8, . . .}. Then Gf is firmly nonexpansive.Proof. If x 2 R r C, then (f 0(x))2  f(x)f 00(x) = nxn2(↵n+ xn  ↵) > 0and we are done by Proposition 4.27.Example 4.29. Suppose that H = R and that f : x 7! exp(|x|) 1. Then(8x 2 H) Gf (x) = x sgn(x)(1 exp(|x|)) and G0f (x) = 1 exp(|x|) 2[0, 1[. It follows that Gf is firmly nonexpansive14.Example 4.30. Suppose that H = R and that f : x 7! exp(x2)  1. ThenGf is not (firmly) nonexpansive. Indeed, we compute (f 0(x))2f(x)f 00(x) =4x2 exp(x2)+2 exp(x2)2 exp(2x2), which strictly negative when |x| > 1.2.Now apply Proposition 4.27.Proposition 4.31. Suppose that H = R and that f is twice di↵erentiable,that min f(H) = 0, that g = f⇤(1/2)| · |2, and that 2ff 00  (2 + f 00)(f 0)2.Then Gg is firmly nonexpansive.Proof. We start by observing a couple of facts. First,g0 = IdProxf . (4.50)14Since G is monotone by Proposition 4.27, its antiderivative x 7! 12x2|x|exp(|x|) isconvex — although this does not look like convex function on first glance! It is interestingto do this also for other instances of f .604.8. NonexpansivenessFix x 2 R. Recall y := Proxf (x) is a minimizer ofz 7! f(z) + 12|x z|2.Then we have0 2 @f(y) (x y).Since f is di↵erentiable, so we havex = y + f 0(y). (4.51)Because (y+f 0(y))0 = 1+f 00(y) > 0. If we view x as a function of y in (4.51),we have x(y) is di↵erentiable. By the Inverse Function Theorem15, we con-clude that y as a function of x is di↵erentiable. Hence implicit di↵erentiationgives1 = y0(x) + f 00(y)y0(x) = y0(x)(1 + f 00(y(x))).Hence y0 = 11+f 00(y(x)) and thusg00(x) =IdProxf0(x) = 1 11 + f 00(Proxf (x))=f 00Proxf (x)1 + f 00Proxf (x) .(4.52)In view of Proposition 4.27 and because g(x) = f(Proxf (x)) + (1/2)(x Proxf (x))2 we must verify that gg00  (g0)2, i.e.,f(Proxf (x)) +12(x Proxf (x))2f 00Proxf (x)1 + f 00Proxf (x)  x Proxf (x)2.(4.53)Again writing y = Proxf (x) gives x  y = f 0(y) and so see that (4.53) isequivalent to f(y) + 12(f0(y))2f 00(y)1 + f 00(y) f 0(y)2. (4.54)Actually, (4.54) is equivalent to our assumption on f . Indeed, (8y 2 R)15The Inverse Function Theorem: (see [33, Theorem 1A.1] on page 10) Let f : Rn !Rn be continuously di↵erentiable on some open set containing x0. Suppose that thedeterminant of Jacobian of f(x0) does not equal to 0. Then there exists an open set Vcontaining x0 and an open set U containing f(x0) such that f : V ! U ha a continuousinverse f1 : U ! V which is di↵erentiable.614.9. The decreasing propertyf(y) + 12(f0(y))2f 00(y)1 + f 00(y) f 0(y)2 (4.55),f(y) + 12(f 0(y))2f 00(y)  (1 + f 00(y))(f 0)2 (4.56),f(y)f 00(y)  (1 + 12f 00(y))(f 0(y))2 (4.57),2f(y)f 00(y)  (2 + f 00(y))(f 0(y))2. (4.58)Therefore, by Proposition 4.27, Gg is firmly nonexpansive.We conclude this section with a result on the range of IdGf .Proposition 4.32. We have ran(IdGf ) ✓ cone ran @f ✓ (recC) .Proof. Let y⇤ 2 ran @f . Then there exists y 2 dom @f such that y⇤ 2 @f(y).Let c 2 C and x 2 recC. Then (c + nx)n2N lies in C. Hence (8n  1)0  f(c+ nx)  f(y) + hy⇤, c+ nx yi. Thenhy⇤, xi  hy⇤, y  ci  f(y)n! 0 as n! +1. (4.59)It follows that y⇤ 2 (recC) . Recall that recC is actually a cone. Thus,cone ran @f ✓ (recC) . Let z 2 Hr C and z⇤ = s(z) 2 @f(z), we havez Gf (z) = f(z)kz⇤k2 z⇤ 2 cone ran @f.Therefore, ran(IdGf ) ✓ cone ran @f ✓ (recC) .4.9 The decreasing propertyWe say that f has the decreasing property if(8x 2 H) sup f(Gx)  f(x). (4.60)This property is interesting in the context of applying subgradient projectorsas building blocks for algorithms. To investigate the decreasing property,it suces to consider points outside C.Proposition 4.33. If (8x 2 H) Gf (x) 2 conv({x} [ C), then f has thedecreasing property.624.9. The decreasing propertyProof. Let x 2 H r C. Then there exists c 2 C and  2 [0, 1] such thatGf (x) = (1  )x + c. It follows that f(Gf (x))  (1  )f(x) + f(c) (1 )f(x)  f(x).Lemma 4.34. Let (x, y, z) 2 R3 be such that x 6= z and (z y)(x y)  0.Then y 2 conv{x, z}.Proof. Suppose first that z < x. If y > x, then (z  y)(x  y) > 0 becauseit is the product of two strictly negative numbers. Similarly, if y < z, then(z  y)(x y) > 0. We deduce that y 2 [z, x]. Analogously, when x < z, weobtain that y 2 [x, z]. In either case, y 2 conv{x, z}.The next result shows that G is particularly well behaved on the realline.Corollary 4.35. Suppose that H = R. Then f has the decreasing property.Proof. Let x 2 R r C. Then x 6= PCx and, by Fact 4.1 (iv), (PCx Gf (x))(x  Gf (x))  0. Lemma 4.34 thus yields Gf (x) 2 conv{x, PCx}.Hence Gf (x) 2 conv({x} [ C), and we are done by Proposition 4.33.The next example shows that the decreasing property is not automatic.Example 4.36. Suppose that H = R2, that C1 = R ⇥ {0}, that C2 ={(⇠, ⇠) 2 H : ⇠ 2 R}, and that f = max{dC1 , dC2}. Then f does not havethe decreasing property.Proof. Set x = (2, 1). Then, using Proposition 4.6, we obtain that Gf (x) =(2, 0) and f(x) = 1 <p2 = f(Gf (x)).We now illustrate that the sucient condition of Proposition 4.33 is notnecessary:Example 4.37. Suppose that H = R2 and that (8x = (x1, x2) 2 R2)f(x) = |x1| + |x2|. Then f has the decreasing property, G2f (x) = (0, 0) yetGf (x) /2 conv{(0, 0), x} for almost every x 2 R2. Furthermore, Gf is notmonotone.Proof. We have lev0f = {(0, 0)} andGf (x1, x2) =8>>>>><>>>>>:⇣ |x1||x2|2|x1| x1,|x2||x1|2|x2| x2⌘if x1 6= 0, x2 6= 0,⇣ s|x2|s2+1 , s2s2+1x2⌘if x1 = 0, x2 6= 0,⇣s2s2+1x1, s|x1|s2+1⌘if x1 6= 0, x2 = 0,(0, 0) if x1 = x2 = 0,(4.61)634.9. The decreasing propertywhere s 2 [1, 1] depends on x.In particular, when x1x2 > 0,Gf (x1, x2) =✓x1  x22,x2  x12◆so it is a projection on R(1,1).When x1x2 < 0,Gf (x1, x2) =✓x1 + x22,x1 + x22◆so it is a projection on R(1, 1).(i) f has the decreasing property.When x1 6= 0, x2 6= 0, thenf(Gf (x1, x2)) = ||x1| |x2||  |x1|+ |x2| = f(x1, x2).When x1 = 0, x2 6= 0, we havef(Gf (x1, x2)) =|sx2|s2 + 1+s2|x2|s2 + 1=(|s|+ s2)|x2|s2 + 1 |x2| = f(0, x2).When x1 6= 0, x2 = 0, we havef(Gf (x1, x2)) =(s2 + |s|)|x1|s2 + 1 |x1| = f(x1, 0).(ii) Gf (x) /2 conv{(0, 0), x} for almost every x 2 R2.Indeed, [(x1, x2), (0, 0)] = {(x1, x2) : 0    1}.When (x1, x2) 2 R2r⇣R(1, 0)[R(0, 1)[R(1, 1)[R(1,1)⌘, we have|x1| 6= |x2| so that|x1| |x2|2|x1||x2| |x1|2|x2| < 0thenGf (x1, x2) =✓ |x1| |x2|2|x1| x1,|x2| |x1|2|x2| x2◆62 [(x1, x2), (0, 0)].644.9. The decreasing propertyMoreover,G2f (x1, x2) (4.62)= Gf (Gf (x1, x2)) = (4.63)0BB@|x1||x2|2 |x2||x1|2|x1| |x2| |x1| |x2|2|x1| ,|x2||x1|2 |x1||x2|2|x2| |x1| |x2| |x1|2|x2|1CCA(4.64)= (0, 0). (4.65)(iii) Gf is not monotone.Let x = (1, 3) and y = (1, 3). Referring to (4.61), we haveGf (x) =✓1 32(1), 3 163◆= (1, 1)andGf (y) =✓1 32,3 163◆= (1, 1).Hence hx  y,Gf (x)  Gf (y)i = h(2, 0), (2, 0)i = 4 < 0, so Gf isnot monotone.Remark 4.38. (infeasibility detection) Using the decreasing property, oneobtains a sucient condition for infeasibility: Suppose that H = R and wefind a point x such that f(Gf (x)) > f(x). Then C must be empty becauseof Corollary 4.35. For instance, suppose that f : x 7! x2 + 1. Then(8x 2 Rr {0}) Gf (x) = (x2  1)/(2x). (4.66)Now set x = 1/2. Then Gf (x) = 3/4 and f(Gf (x)) = 25/16 > 5/4 = f(x).It is known since the 19th century that the concrete instance (4.66) exhibitschaotic behaviour; see, e.g., [38, Problem 7-a on page 72].Remark 4.39. (Newton iteration) Suppose that H = R and that f is di↵er-entiable on X r C. Then(8x 2 Rr C) Gf (x) = x f(x)f 0(x)2 f 0(x) = x f(x)f 0(x) (4.67)is the same as the Newton operator for finding a zero of f !654.9. The decreasing propertyThe decreasing property is preserved in certain cases:Proposition 4.40. Suppose that f has the decreasing property. Then thefollowing hold:(i) If ↵ > 0, then ↵f has the decreasing property.(ii) If ↵  1, then (f+)↵ has the decreasing property.Proof. Let x 2 Hr C.(i): Let ↵ > 0. (↵f)G↵f (x) = (↵f)Gf (x) and hence sup(↵f)(G↵f (x)) =↵ sup f(Gf (x))  ↵f(x) = (↵f)(x) by Proposition 4.3 (i).(ii): Set g = (f+)↵ and  = 1/↵. Then 0 <   1 and Gg(x) =(1)x+Gf (x) by Proposition 4.3 (iii). Hence sup g(Ggx)  (1)g(x)+ sup g(Gfx). On the other hand, since f has the decreasing property, thenf(Gfx)  f(x), then g(Gfx) =⇣f+(Gfx)⌘↵  ⇣f+(x)⌘↵ = g(x), thussup g(Gf (x))  g(x). Altogether, sup g(Ggx)  g(x), i.e., g is decreasing.The following result is complementary to the decreasing property.Proposition 4.41. Suppose that f is strictly convex at x 2 H and f(x) > 0.Then f(Gf (x)) > 0.Proof. Recall that f is strictly convex at x if (8y 2 H r {x}) (8 2 ]0, 1[)f((1)x+y) < (1)f(x)+f(y). Arguing as in [18, proof of Proposi-tion 5.3.4 (a)], we see that 12hs(x), Gf (x)xi = hs(x), (12x+ 12Gf (x))xi f(12x+12Gf (x)) f(x) < 12f(x) + 12f(Gf (x)) f(x) = 12(f(Gf (x)) f(x)).Therefore, f(Gf (x)) > f(x) + hs(x), Gf (x) xi = 0 using Fact 4.1 (i).Remark 4.42. Suppose that f is strictly convex. Then Proposition 4.41shows that iterating Gf starting at a point outside C will never reach Cin finitely many steps. This is clearly illustrated by Example 4.8, whichshows that the function dC , even though it is neither strictly convex nordi↵erentiable everywhere, performs best because Gf = PC yields a solutionafter just one step.664.10. The subgradient projector of f(x1, x2) = |x1|p + |x2|p4.10 The subgradient projector off(x1, x2) = |x1|p + |x2|pThis section contains a case study. It reveals that the various propertiesof Gf can exhibit complicated behaviour.The following result complements Example 4.37.Proposition 4.43. Suppose that H = R2 and that f : (x1, x2) 7! |x1|p +|x2|p, where p > 1, and let x = (x1, x2) 2 R2 r {(0, 0)}. Then Gf (x) is x1 |x1|p + |x2|p|x1|p1 sgn(x1)p|x1|2p2 + |x2|2p2 , x2 |x1|p + |x2|p|x2|p1 sgn(x2)p|x1|2p2 + |x2|2p2!(4.68)and the following hold:(i) If p  2, then f(x)  f(Gf (x))  (1 2p1)pf(x).(ii) If 1 < p  2, then f(x)  f(Gf (x))  21(1 p1)pf(x).(iii) If 1 < p < 2, then Gf is not monotone.Proof. The formula (4.68) is a direct verification. More precisely, sincerf(x1, x2) = (p|x1|p1 sgn(x1), p|x2|p1 sgn(x2)). (4.69)When x1 = 0, x2 6= 0, we have rf(0, x2) = (0, p|x2|p1 sgn(x2)) from aboveformula, then we haveGf (0, x2) = (0, (1 1/p)x2).As p > 1, so we havef(Gf (0, x2)) = 0 + (1 1/p)p|x2|p = (1 1/p)pf(0, x2)  |x2|p  f(x1, x2).The proof for x1 6= 0 and x2 = 0 is similar. Hence (i) (ii) hold when x1 = 0or x2 = 0. Consequently, we assume that x1 6= 0 and x2 6= 0. Thus (4.69)gives Gf (x), which is x1 |x1|p + |x2|p|x1|p1 sgn(x1)p|x1|2p2 + |x2|2p2 , x2 |x1|p + |x2|p|x2|p1 sgn(x2)p|x1|2p2 + |x2|2p2!(4.70)674.10. The subgradient projector of f(x1, x2) = |x1|p + |x2|pNow (4.70) gives f(Gf (x1, x2)) =|x1|p1 (|x1|p + |x2|p)|x1|p2p(|x1|2p2 + |x2|2p2)p + |x2|p 1 (|x1|p + |x2|p)|x2|p2p(|x1|2p2 + |x2|2p2)p .(4.71)Defineci(x1, x2) =|x1|p + |x2|p|xi|p2p|x1|2p2 + |x2|2p2 ,for i = 1, 2.(i): If p  2. Note thatf(Gf (x)) = |x1|p1 c1p + |x2|p1 c2p. (4.72)When |x1|  |x2|, we havec1(x1, x2)  2|x1|p|x1|p2p|x1|2p2 =2p.When |x1|  |x2|, because p  2 we have |x1|p2  |x2|p2 so thatc1(x1, x2)  2|x2|p|x2|p2p|x2|2p2 =2p.Similarly, we havec2(x1, x2)  2p.Thus, for i 2 {1, 2} we have0 < ci(x1, x2)  2p.Therefore,0  1 2p 1 ci(x1, x2)  1.Together with (4.72), we obtain✓1 2p◆pf(x1, x2)  f(Gf (x1, x2))  f(x1, x2).684.10. The subgradient projector of f(x1, x2) = |x1|p + |x2|p(ii): If 1 < p  2. We assume that |x1|  |x2|, the other case is treatedanalogously. Write f(Gf (x1, x2))= |x1|p1 (|x1|p + |x2|p)|x1|p2p(|x1|2p2 + |x2|2p2)p + |x2|p 1 (|x1|p + |x2|p)|x2|p2p(|x1|2p2 + |x2|2p2)p(4.73)= |x2|p✓ |x1|p|x2|p1 (|x1|p + |x2|p)|x1|p2p(|x1|2p2 + |x2|2p2)p◆ (4.74)+ |x2|p1 (|x1|p + |x2|p)|x2|p2p(|x1|2p2 + |x2|2p2)p (4.75)= |x2|p |x1||x2|  |x1||x2| (|x1|p + |x2|p)|x1|p2p(|x1|2p2 + |x2|2p2)p (4.76)+ |x2|p1 (|x1|p + |x2|p)|x2|p2p(|x1|2p2 + |x2|2p2)p . (4.77)Set t = |x1|/|x2|. Then 0 < t  1. Definec1(t) =|x1||x2| |x1||x2|(|x1|p + |x2|p)|x1|p2p(|x1|2p2 + |x2|2p2) = tt2p1 + tp1p(t2p2 + 1),c2(t) = 1 (|x1|p + |x2|p)|x2|p2p(|x1|2p2 + |x2|2p2) = 1tp + 1p(t2p2 + 1).Thenf(Gf (x1, x2)) = |x2|p(|c1(t)|p + |c2(t)|p). (4.78)Since p 2  0, we have tp2  1 and tp > 0, hence1  c2  1 1 + tpp1 + tp = 1 1p  0. (4.79)Thus c2  0. We now claim that|c1|+ c2  1. (4.80)This will implies |ci(t)|  1 (i = 1, 2) so that |ci(t)|p  |ci(t)| as p > 1. Thenfrom (4.78) we will havef(Gf (x1, x2)) = |x2|p(|c1(t)|p + |c2(t)|p) (4.81) |x2|p(|c1(t)|+ |c2(t)|) (4.82) |x2|p (4.83) f(x1, x2). (4.84)694.10. The subgradient projector of f(x1, x2) = |x1|p + |x2|pObserve that (4.80) is equivalent toc1 + c2  1 (4.85a)c1 + c2  1 (4.85b)Sincec1 + c2 = t t2p1 + tp1p(t2p2 + 1)+ 1 tp + 1p(t2p2 + 1).Thus (4.85a) is equivalent tot t2p1 + tp1p(t2p2 + 1)+ 1 tp + 1p(t2p2 + 1) 1.That ist  (1 + tp)(1 + tp1)p(1 + t2p2). (4.86)And (4.85b) is equivalent toc1 + c2 = t+ t2p1 + tp1p(t2p2 + 1)+ 1 tp + 1p(t2p2 + 1) 1,we havetp1(1 + tp)p(1 + t2p2) t+ 1 + tpp(1 + t2p2). (4.87)Now check that (4.86) holds by using tp1  1 and the convexity of h : ⇠ 7!1 + ⇠p, which implies h(t)  h(1) + h0(1)(t 1), i.e., pt  1 + tp. Hencet  tp + 1p (tp + 1)(tp1 + 1)p(t2p2 + 1).Since 0 < tp1  1 when 0 < t  1and 1 < p  2, this implies thattp1(tp + 1)p(t2p2 + 1) tp + 1p(t2p2 + 1)from which (4.87) follows. Therefore, we have proved that |c1|+ c2  1 andhence f has decreasing property.Furthermore, using (4.78), (4.79) and the assumption that |x2|  |x1|, weobtainf(Gf (x))  cp2|x2|p 1 1pp|x2|p  1 1pp |x1|p + |x2|p2 =1 1pp2f(x).(4.88)704.11. Gf and the Yamagishi–Yamada operator(iii): Consider the points y = (1, ⇠) and z = (1, ⇠), where ⇠ > 0. Theny  z = (2, 0) andGf (y) =✓1 1 + ⇠pp(1 + ⇠2p2), ⇠  (1 + ⇠p)⇠p1p(1 + ⇠2p2)◆(4.89a)Gf (z) =✓1 + 1 + ⇠pp(1 + ⇠2p2), ⇠  (1 + ⇠p)⇠p1p(1 + ⇠2p2)◆. (4.89b)It follows thathGf (y)Gf (z), y  zi = 4✓1 1 + ⇠pp(1 + ⇠2p2)◆< 0 as ⇠ ! +1 (4.90)because lim⇠!+1 1+⇠pp(1+⇠2p2) = lim⇠!+1(2p 2)1⇠2p = +1 usingl’Hoˆpital’s rule. Therefore, Gf is not monotone.4.11 Gf and the Yamagishi–Yamada operatorIn this last section we study the accelerated version16 of Gf proposed byYamagishi and Yamada in [61]. Their operator, which has shown improvedperformance compared to Gf , is actually a subgradient projector of a variantof f when H is the real line17. For fixed L > 0 and r > 0, we assume inaddition thatf is Fre´chet di↵erentiable and rf is Lipschitz continuous with constant L,(4.91)and thatf is bounded below with inf f(H)  ⇢ (4.92)where ⇢ > 0 and we set(8x 2 H) ✓(x) = krf(x)k22L ⇢. (4.93)By [61, Lemma 1], we havef  ✓. (4.94)The Yamagishi–Yamada operator [61] isZ : H! H, (4.95)16See also [43] for another accelerated version of Gf .17Unfortunately our proof does not seem to extend to H (e.g., the set D may not beconvex and there is no obvious counterpart to (4.98)).714.11. Gf and the Yamagishi–Yamada operatordefined at x 2 H byZx =8>>>>>>><>>>>>>>:x, if f(x)  0;x rf(x)krf(x)k2 f(x), if f(x) > 0 and ✓(x)  0;x rf(x)krf(x)k2⇣f(x) +p✓(x) + ⇢p⇢2⌘, if f(x) > 0 and ✓(x) > 0.(4.96)Note that if f(x)  0 or ✓(x)  0, then Zx = Gf (x).We now prove that if H = R, then Z is itself a subgradient projector.Theorem 4.44. Suppose that H = R and that f is also twice di↵erentiable.Then for every x 2 R, (4.96) can be rewritten asZx =8>>>>>>>><>>>>>>>>:x, if f(x)  0;x 1f 0(x)f(x), if f(x) > 0 and |f 0(x)|  p2L⇢;x 1f 0(x) f(x) +✓ |f 0(x)|p2Lp⇢◆2!, if f(x) > 0 and |f 0(x)| > p2L⇢.(4.97)Set D = {x 2 R : ✓(x)  0} and assume that bdryD ✓ RrC. Then D is aclosed convex superset of C, and Z is a subgradient projector of a functiony, defined as follows. On D, we set y equal to f . The set RrD is empty, oran open interval, or the disjoint union of two open intervals. Assume thatI is one of these nonempty intervals, and let q be defined on I such that(8x 2 I) q0(x) = 1x Zx. (4.98)Now set d = PD(I) 2 D r C and(8x 2 I) y(x) = f(d)eq(d)eq(x). (4.99)The so-constructed function y : R! R is convex, and it satisfies Z = Gy.Proof. It is easy to check that (4.97) is the same as (4.96). Let x 2 R suchthat f(x) > 0 and ✓(x)  0, and setz(x) =|f 0(x)|p2Lp⇢ = sgnf 0(x)f 0(x)p2Lp⇢ =p✓(x) + ⇢p⇢  0.(4.100)724.11. Gf and the Yamagishi–Yamada operatorThenz0(x) =sgnf 0(x)f 00(x)p2L. (4.101)Using the convexity of f , (4.94), (4.100), and (4.101), we obtain0  f 00(x)f(x) ✓(x) (4.102a)= f 00(x)✓f(x)✓ |f 0(x)|p2L+p⇢◆✓ |f 0(x)|p2Lp⇢◆◆(4.102b)= f 00(x)✓f(x) + z(x)✓z(x) 2|f0(x)|p2L◆◆(4.102c)= f 00(x)✓f(x) + z2(x) 2z(x) |f0(x)|p2L◆(4.102d)= f 00(x)f(x) + z2(x) 2f 00(x)z(x) |f 0(x)|p2L(4.102e)= f 00(x)f(x) + z2(x) 2f 00(x)z(x)sgn f 0(x)f 0(x)p2L(4.102f)= f 00(x)f(x) + z2(x) f 0(x)2z(x)sgn f 0(x)f 00(x)p2L(4.102g)= f 00(x)f(x) + z2(x) f 0(x)2z(x)z0(x). (4.102h)Because x  Zx = f(x)+z2(x)f 0(x) is continuous, it is clear that there is an an-tiderivative q on I such thatq0(x) =1x Zx =f 0(x)f(x) + z2(x). (4.103)Calculus and (4.102) now result inq00(x) =f 00(x)f(x) + z2(x) f 0(x)f 0(x) + 2z(x)z0(x)f(x) + z2(x)2 (4.104a)=f 00(x)f(x) ✓(x) f 0(x)2f(x) + z2(x)2 . (4.104b)Observe that y is clearly continuous everywhere. Furthermore, y0(x) =734.11. Gf and the Yamagishi–Yamada operatorf(d)eq(d)eq(x)q0(x) and hence, using (4.103), (4.104) and again (4.102), we obtainy00(x) =f(d)eq(d)⇣eq(x)q0(x)2+ eq(x)q00(x)⌘(4.105)=f(d)eq(d)eq(x)⇣q0(x)2+ q00(x)⌘(4.106)= y(x)f 00(x)f(x) ✓(x)f(x) + z2(x)2 (4.107) 0. (4.108)Hence y is convex on I. As x 2 I approaches d, we deduce (because d /2 C,i.e., f(d) > 0) that q0(x) ! f 0(d)f(d)+z2(d)) = f0(d)f(d) and hence that y0(x) !f(d)eq(d)eq(d) f0(d)f(d) = f0(d). It follows that y is convex on R. Finally, if x /2 D,then Gy(x) = x y(x)/y0(x) = x 1/q0(x) = x (x Zx) = Zx.Example 4.45. Consider Theorem 4.44 and assume that f : x 7! x2  1,that L = 3, and that ⇢ = 1. Then (4.97) turns intoZx =8>>>>>><>>>>>>:x, if |x|  1;x2 + 12x, if 1 < |x|  p6/2;x2 + 2p6|x|6x, if |x| > p6/2.(4.109)Hence D =⇥p6/2,p6/2⇤. Using elementary manipulations, we obtain(8x 2 RrD) q(x) = 65 ln56 |x|p63; (4.110)Indeed, since q0(x) is defined in I such that q0(x) = 1xZx .q0(x) =1x Zx=1x x2+2p6|x|6x=6x6x2  x2  2p6|x|=6x5x2  2p6|x|=15x6 p63|x|x.744.11. Gf and the Yamagishi–Yamada operatorWhen x < p62 , then q0(x) = 15x6 +p63. When x >p62 , we have q0(x) =15x6 p63. Applying integration, we haveq(x) =8<:65 ln56x+ p63  , if x < p62 ;65 ln56x p63  , if x > p62 .=8<:65 ln⇣56xp63⌘, if x < p62 ;65 ln⇣56xp63⌘, if x >p62 .Thus, we have(8x 2 RrD) q(x) = 65 ln56 |x|p63; (4.111)By definition of function y in Theorem 4.44: on D, y = f ; on R r D, I isone of two open intervals which are disjoint included in RrD,(8x 2 I) y(x) = f(d)eq(d)eq(x).Recall that D =hp62 ,p62i. Then we have y(x) = x2  1 when |x| p62 .Now when |x| >p62 , we have x 2 I, thus d = PD(x) =p62 .y(x) =f(d)eq(d)eq(x)=64  1eln⇣56p62 p63⌘ 65eln⇣56 |x|p63⌘ 65=12⇣p612⌘ 65 56|x|p63! 65=12⇣p612⌘ 65 5|x| 2p66! 65=121⇣p612⌘ 65✓16◆ 65 ⇣5|x| 2p6⌘ 65=12✓2p6◆ 65 ⇣5|x| 2p6⌘ 65754.11. Gf and the Yamagishi–Yamada operator=21265635⇣5|x| 2p6⌘ 65=215 6256⇣5|x| 2p6⌘ 65=215 (2 · 3) 256⇣5|x| 2p6⌘ 65=(2 · 36) 156⇣5|x| 2p6⌘ 65.Thus we have y(x) = 721/565|x|  2p66/5. Consequently, the function y,given by(8x 2 R) y(x) =8><>:x2  1, if |x|  p6/2;721/565|x| 2p66/5, if |x| > p6/2, (4.112)satisfies Gy = Z by Theorem 4.44.This chapter gave a comprehensive study of properties of the subgradientprojector, which included calculus rules, characterization of strong-to-strongand strong-to-weak continuity, nonexpansiveness, monotonicity and the de-creasing property. We also considered the Yamagishi-Yamada operator anddiscovered it is a subgradient projector of a di↵erent convex function.76Chapter 5Characterizations ofSubgradient Projectors5.1 MotivationThis chapter is based on the manuscript [10]. The results in this chapterextend some of those found in previous chapters. Chapter 4 mainly focuseson the subgradient projector of a convex function. In this chapter, we wantto explore the subgradient projector of nonconvex functions. We investigatewhen a subgradient projector is a cutter or when a mapping is a subgradientprojector of a convex function. Moreover, we present calculus rules for thesubgradient projector of nonconvex functions.Studies of optimization problems and convex feasibility problems haveled in recent years to the development of a theory of subgradient projectors.Rather than finding projections on the level sets of the original functions,the iterative algorithms find projection on half spaces containing the former.Polyak developed the subgradient projector algorithms for convex functions[48–50]; this was further developed by Censor, Combettes, Yamada andothers, and also applied to optimization problems [19, 21, 22, 29, 30, 44, 58].We want to extend the basic theory of subgradient projectors to non-convex functions. As far as nonconvex functions are concerned, the cuttertheory or T -class operators developed by Cegielski [20] or Bauschke, Borweinand Combettes [3] furnishes a new approach to subgradient projectors with-out appealing to the existence theory of subgradient projectors for convexfunctions.5.2 Preliminary and definitionsThroughout the following two chapters, we assumeX = Rn775.2. Preliminary and definitionsand f : X ! ]1,+1] is a lower semicontinuous function, its 0-level setis denoted bylev0f = {x 2 X : f(x)  0}. (5.1)T : X ! X is a mapping.To introduce subgradient projectors for possible non-convex functions,we need the following subgradients [40, 42, 53].Definition 5.1. Suppose that x¯ 2 X such that f(x¯) is finite. For a vectorv 2 X, one says that(i) v is a regular subgradient of f at x¯, written v 2 @ˆf(x¯), iff(x)  f(x¯) + hv, x x¯i+ o(kx x¯k), (5.2)(ii) v is a limiting (or Mordukhovich) subgradient of f at x¯, written v 2@f(x¯), if there are sequences xn ! x¯, f(xn)! f(x¯) and vn 2 @ˆf(xn)with vn ! v.(iii) f is subdi↵erentially regular at x¯ if @ˆf(x¯) = @f(x¯).Remark 5.2. The regular subgradient inequality (5.2) in o form is shorthandfor the one-side limit conditionlim infx!x¯,x 6=x¯f(x) f(x¯) hv, x x¯ikx x¯k  0. (5.3)This is because if v satisfies (5.2), which is equivalent tof(x) f(x¯) hv, x x¯i  o(kx x¯k), (5.4)Dividing kxx¯k on both sides of above inequality and applying the definitionof o form, we have (5.3). Conversely, suppose that v satisfies (5.3). Thenby definition of limit inferior, we havef(x) f(x¯) hv, x x¯ikx x¯k  ", (5.5)where " is any positive number and x suciently close to x¯. Simplifying thisinequality, we havef(x) f(x¯) hv, x x¯i  "kx x¯k. (5.6)785.2. Preliminary and definitionsLet (x) = min{f(x)  f(x¯)  hv, x  x¯i, 0}. Obviously, we have (x)  0.From (5.6) we have(x) = min{f(x) f(x¯) hv, x x¯i, 0}  min{"kx x¯k, 0}. (5.7)Dividing kx x¯k on both sides of above equation, we have(x)kx x¯k  ". (5.8)So we havelim infx!x¯(x)kx x¯k  ". (5.9)Letting "! 0, thenlim infx!x¯(x)kx x¯k  0. (5.10)Moreover, (x)  0, solim supx!x¯(x)kx x¯k  0. (5.11)Then we have limx!x¯ (x)kxx¯k = 0. Thus (5.2) holds.Next, we provide an example and show how to calculate correspondingsubdi↵erentials for a specific function.Example 5.3. Consider function f(x) = |x| on R. Then(i) @ˆf(0) = ?.(ii) @ˆf(x¯) = {1} if x¯ > 0; @ˆf(x¯) = {1} if x¯ < 0.(iii) @f(x) = {1} if x > 0; @f(x) = {1} if x < 0.(iv) @f(0) = {1, 1}.Proof. (i) We prove it by contradiction. Assume there exists v 2 @ˆf(0)such that v satisfying (5.2). Then we havelim infx!0,x 6=0|x| vx|x| = lim infx!0,x 6=0(1 v sgn(x))  0. (5.12)When x ! 0+, v must satisfy v  1 such that (5.12) holds; whenx! 0, v has to be greater or equal to 1 such that (5.12) holds. Hencehere is no proper v such that (5.12) holds for any x 2 R. Therefore,@ˆf(0) = ?.795.2. Preliminary and definitions(ii) Let x¯ < 0 and v 2 @ˆf(x¯). Then we havelim infx!x¯,x 6=x¯✓ |x¯| |x||x x¯|  v sgn (x x¯)◆ 0. (5.13)When x! x¯, we have v  1; when x! x¯+, we have v  1. Therefore@ˆf(x¯) = {1}. The proof is similar when x¯ > 0.(iii) It is clear from definition of limiting subdi↵erential.(iv) From (ii), we can see when xn ! 0+, vn = 1 2 @ˆf(xn) and xn ! 0,vn = 1 2 @ˆf(xn). There are just two sides: lim vn = 1 and lim vn =1. Thus @f(0) = {1, 1}.Definition 5.4. [53, Definition 9.1] We say f : X ! ]1,+1[ is locallyLipschitz continuous near a point x 2 X if 9⇢ 2 R++ and L 2 R+ such that|f(y) f(z)|  Lky  zk, (5.14)for any y, z 2 ball(x, ⇢).Remark 5.5. [53, Corollary 8.10] When f is lower semi-continuous, the setof points at which @f is nonempty-valued is at least dense in the domain off . When f is locally Lipschitz, @f is nonempty-valued everywhere.Remark 5.6. When function f is convex, recall v is a subgradient of f at x¯,written v 2 @f(x¯), iff(x)  f(x¯) + hv, x x¯i. (5.15)We use the same notation since all subdi↵erentials are the same for convexfunctions. (For more details, see [40, Theorem 1.93].)Definition 5.7. For a general lower semicontinuous function f : X !]1,+1], the subgradient projector of f is defined byGf : X ! X : x 7!(x f(x)ks(x)k2 s(x) if f(x) > 0 and 0 62 @f(x),x otherwise,(5.16)where s : X ! X is a selection of @f with s(x) 2 @f(x).Proposition 5.8. (i) When f is continuously di↵erentiable on X \ lev0f ,Gf reduces toGf : X ! X : x 7!(x f(x)krf(x)k2rf(x) if f(x) > 0 and rf(x) 6= 0,x otherwise.(5.17)805.2. Preliminary and definitions(ii) When f is convex and infX f  0, Gf reduces toGf : X ! X : x 7!(x f(x)ks(x)k2 s(x) if f(x) > 0,x otherwise,(5.18)where s : X ! X is a selection of @f with s(x) 2 @f(x). Referring toLemma 3.13, f(x) > 0 automatically implies s(x) 6= 0.Recall that the projection on half space:Fact 5.9. ([7] or [20, page 133]) For the half spaceH(a,) := {z 2 X : ha, zi  }, (5.19)where a 2 X, a 6= 0 and  2 R, its metric projection is given byPH(a,)x =(x ha,xikak2 a if ha, xi > ,x if ha, xi  . (5.20)Proposition 5.10. (i) For a locally Lipschitz f : X ! R, when f(x) > 0,0 62 @f(x), we haveGf (x) = PH(s(x),(x))(x) (5.21)where s(x) 2 @f(x) and the half spaceH(s(x),(x)) : = {z 2 X : f(x) + hs(x), z  xi  0} (5.22)= {z 2 X : hs(x), zi  hs(x), xi  f(x)}. (5.23)(ii) If f : X ! ]1,+1], thenFixGf = {x 2 X : 0 2 @f(x)} [ lev0f. (5.24)If f is locally Lipschitz, FixGf is closed.(iii) If f : X ! ]1,+1] is convex and infX f  0, thenFixGf = lev0f. (5.25)Proof. (i): Apply Fact 5.9 with a := s(x), := (x) = hs(x), xi  f(x).(ii): This follows from the definition of Gf . When f is locally Lipschitz,@f is outer-semicontinuous, so {x 2 X : 0 2 @f(x)} is closed.18 Being aunion of two closed sets, FixG is closed.18(See [53]) For the definition of outer-semicontinuity of subdi↵erentials (page 151). Forthe relevant result, see Proposition 8.7 on page 302.815.2. Preliminary and definitions(iii): When f is convex, 0 2 @f(x) gives f(x) = minX f , so f(x)  0.Then{x 2 X : 0 2 @f(x)} ⇢ lev0f. (5.26)Thus (iii) follows from (ii).Fact 5.11. [46, Proposition 1.6] If the convex function f is continuous,then it is locally Lipschitz. Thus Proposition 5.10 is also true for convexfunctions.A simple example illustrates the di↵erence between convex and noncon-vex case.Example 5.12. Consider f : X ! R by f(x) = kpkxk = kxk1/k wherek > 0. Then(i) When k  1, f is convex, Gf = (1 k) Id is firmly nonexpansive.(ii) When k > 1, f is not convex, Gf = (1k) Id may be neither monotonenor nonexpansive, e.g, k = 3.Proof. This follows from definition directly or use Gf = (1  k) Id+kGk·kby Theorem 5.27 and Gk·k ⌘ 0. Indeed,(i): If 0 < k  1. We want to show function f is convex. Define h(t) = t 1kwhere t 2 R. Then f(x) = h(kxk). Since h0(t) = 1k t1k1 > 0 and h00(t) =1k (1k  1)t1k2 > 0. Thus function h as defined is an increasing convexfunction on the real line. Moreover, k · k is also convex. By [41, Proposition1.39], we have f is convex. Notice thatkGf (x)Gf (y)k2 = k(1 k)x (1 k)yk2 (5.27)= k(1 k)(x y)k2 (5.28)= (1 k)2kx yk2, (5.29)andhx y,Gf (x)Gf (y)i = hx y, (1 k)(x y)i (5.30)= (1 k)kx yk2. (5.31)Since 0 < k  1, then (1 k)2  (1 k), thuskGf (x)Gf (y)k2  hx y,Gf (x)Gf (y)i. (5.32)825.3. Calculus for subgradient projectorsHence Gf is firmly nonexpansive.(ii) Define  : R ! R with (t) = f(td) where d is a direction with d 6= 0.Then (t) = |t| 1k kdk 1k . Since kdk 1k is a constant, we assume g(t) = |t| 1k .Then g0(t) = 1k |t|1k2t and g00(t) = 1k |t|1k2( 1k  1) < 0 when k > 1. Thus fis not convex. Fig. 5.1(b) shows that f is not convex. For x, y 2 X, whenk > 1, we havehxy,Gf (x)Gf (y)i = hxy, (1k)(xy)i = (1k)kxyk2 < 0. (5.33)Then Gf is not monotone when k > 1.For x, y 2 X,kGf (x)Gf (y)k = k(1 k)(x y)k = |1 k|kx yk. (5.34)When k > 2, it is obvious that Gf is not nonexpansive.(a) Graph of f is defined when k =13(b) Graph of f is given when k =3.Figure 5.1: Illustration for Example 5.125.3 Calculus for subgradient projectorsWe start with a stronger di↵erentiability than Fre´chet di↵erentiability(see Definition 2.62).835.3. Calculus for subgradient projectorsDefinition 5.13. [40, Definition 1.13] A function f : X ! ]1,+1] isstrictly di↵erentiable at x¯ 2 X iflimx!x¯,u!x¯f(x) f(u)rf(x¯)(x u)kx uk = 0, (5.35)where rf(x¯) is gradient of f at x¯.It is clear that when u = x¯, Definition 5.13 coincides with definition ofFre´chet di↵erentiability. However, Fre´chet di↵erentiable functions are notalways strictly di↵erentiable, see the following example.Example 5.14. A function f : R! R is given byf(x) =(x2 if x is rational,0 otherwise.(5.36)Then f is Fre´chet di↵erentiable but not strictly di↵erentiable at x¯ = 0.Proof. Since x¯ = 0 is a rational number, so f 0(x¯) = 2x¯ = 0.0  f(y)|y|  |y|. (5.37)Then no matter whether y is a rational or an irrational number, we stillhavelim0 6=|y|!0f(y)|y| = 0, (5.38)thus f is Fre´chet di↵erentiable at x¯ = 0. It remains to show that f is notstrictly di↵erentiable at x¯ = 0. Let x > 0 be rational and set u = ⇡x2+x /2Q. Note that if x! 0+, then u! 0+. Furthermore, since x u = ⇡x2 > 0,we havef(x) f(u) 0 · (x u)|x u| =x2|x u| =x2⇡x2=1⇡6! 0 (5.39)as x! 0+. Therefore, f is not strictly di↵erentiable at x¯ = 0.The following facts are crucial to study the calculus of subgradient pro-jectors.Fact 5.15. Assume that  : X ! X is locally Lipschitz at x¯ 2 X, and : X ! R be strictly di↵erentiable at (x¯). Then for m(x) = ((x)), onehas@m(x¯) = @h0(y¯),i(x¯) with y¯ = (x¯). (5.40)845.3. Calculus for subgradient projectorsProof. See [42, Theorem 6.5].Fact 5.16. [53, Theorem 10.6] Suppose f(x) = g(F (x)) for a proper lowersemicontinuous g : X ! ]1,+1] and a strictly di↵erentiable F : X ! X,and let x¯ be a point where f is finite and the Jacobian rF (x¯) has full rank.Then@f(x¯) = rF (x¯)⇤@g(y¯) with y¯ = F (x¯). (5.41)Fact 5.17. Let fj : X ! R be locally Lipschitz at x¯ and J(x¯) = {j : fj(x¯) =max{f1, f2}(x¯)}. Then@max{f1, f2}(x¯) ✓ conv{@fj(x¯) : j 2 J(x¯)}, (5.42)where the equality holds and max{f1, f2} is subdi↵erentially regular at x¯ ifthe functions fj is subdi↵erentially regular at x¯ for j 2 J(x¯).Proof. See [42, Theorem 7.5(ii)].These facts allow us to establish a calculus of subgradient projectors.Fact 5.18. Let f : X ! R be a lower semicontinuous function.(i) If k > 0 then Gkf = Gf .(ii) Let ↵ 2 R. Define Gf,↵ : X ! X as follow:Gf,↵(x) :=(x f(x)↵ks(x)k2 s(x) if f(x) > ↵ and 0 62 @f(x)x otherwise.(5.43)Then Gf,↵ = Gf↵.(iii) Let ↵ > 0. ThenGf↵(x) =(Gf (x) +↵s(x)ks(x)k2 if f(x) > ↵ and 0 62 @f(x)x otherwise.(5.44)Proof. (i) By [53, D. Rescaling 10(6)], we have @(kf) = k@f when k > 0.Note that kf(x) > 0 if and only if f(x) > 0, and 0 62 @(kf)(x) if and onlyif 0 62 @f(x). When kf(x) > 0 and 0 62 @(kf)(x), for s(x) 2 @f(x), we haveks(x) 2 k@f(x) = @(kf)(x) so thatGkf (x) = x kf(x)kks(x)k2ks(x) = xf(x)ks(x)k2 s(x) = Gf (x). (5.45)855.3. Calculus for subgradient projectorsWhen kf(x)  0 or 0 2 @(kf)(x), we have f(x)  0 or 0 2 @f(x), soGkf (x) = x = Gf (x).(ii) It suces to note that @(f  ↵) = @f . Indeed, let  = f and(x) = x  ↵. Applying Fact 5.15, we have @(f  ↵)(x) = @(())(x) =@h0(y),i(x) = @(x) = @f(x), where y = (x).(iii) When f(x) > ↵ > 0 and 0 62 @f(x), we have f(x) > 0 and 0 62 @f(x).For s(x) 2 @f(x), we have s(x) 2 @(f  ↵)(x). ThenGf↵(x) = x f(x) ↵ks(x)k2 s(x) (5.46)= x f(x)ks(x)k2 s(x) +↵ks(x)k2 s(x) (5.47)= Gf (x) +↵ks(x)k2 s(x) (5.48)When f(x)  ↵ or 0 2 @f(x), Gf↵(x) = x by definition.Remark 5.19. Fact 5.18 (i) is also true when f is convex. Some litera-ture has already contained this result, like[12, Proposition 3.1 (i)] and [44,Proposition 2.1].Proposition 5.20. Assume that f1, f2 : X ! R are locally Lipschitz andsubdi↵erentially regular. For the maximum function g = max(f1, f2), onehasGg(x) =8>>>><>>>>:Gf1(x) if g(x) > max(f2(x), 0), 0 62 @f1(x),Gf2(x) if g(x) > max(f1(x), 0), 0 62 @f2(x),s(x) if g(x) = f1(x) = f2(x) > 0, 0 62 conv{@f1(x), @f2(x)},x if g(x)  0, or 0 2 conv{@f1(x), @f2(x)}(5.49)wheres(x) = x fi(x)kvk2 v (5.50)with v 2 conv{@f1(x), @f2(x)}. In particular, when g(x) = f1(x) = f2(x) >0 and 0 62 conv{@f1(x), @f2(x)}, one may choose s(x) = Gf1(x) or Gf2(x).Consequently, one may think Gg as two piecewise defined function, namely,Gg(x) =8>>>><>>>>:Gf1(x) if g(x) > max(f2(x), 0), 0 62 @f1(x),Gf2(x) if g(x) > max(f1(x), 0), 0 62 @f2(x),Gf1(x) if g(x) = f1(x) = f2(x) > 0, 0 62 conv{@f1(x), @f2(x)},x if g(x)  0, or 0 2 conv{@f1(x), @f2(x)},(5.51)865.3. Calculus for subgradient projectorsorGg(x) =8>>>><>>>>:Gf1(x) if g(x) > max(f2(x), 0), 0 62 @f1(x),Gf2(x) if g(x) > max(f1(x), 0), 0 62 @f2(x),Gf2(x) if g(x) = f1(x) = f2(x) > 0, 0 62 conv{@f1(x), @f2(x)},x if g(x)  0 or 0 2 conv{@f1(x), @f2(x)}.(5.52)Proof. When g(x) > 0, we consider three cases: (i) g(x) > f2(x); (ii) g(x) >f1(x); (iii) g(x)  min(f2(x), f1(x)) which is g(x) = f1(x) = f2(x). Alsonote that@g(x) = conv(@f1(x), @f2(x)) (5.53)when f1(x) = f2(x) by Fact 5.17.Proposition 5.21. Assume that k 2 Rr {0}, and g(x) = f(kx). ThenGg(x) =1kGf (kx) (5.54)for every x 2 X. Moreover, FixGg = 1k FixGf .Proof. By Fact 5.16, @g(x) = k@f(y) where y = kx. Indeed, @g(x) =@hk, fi(y) = k@f(y) where y = kx.And 0 62 @g(x) if and only if 0 62 @f(y) with y = kx. When g(x) > 0 and0 62 @g(x), we have f(kx) > 0 and 0 62 @f(kx). Suppose s(y) 2 @f(y) withy = kx. Then ks(y) 2 k@f(y) = @g(x). ThereforeGg(x) = x f(kx)kks(y)k2ks(y) = x1kf(kx)ks(y)k2 s(y) (5.55)=1k✓kx f(kx)ks(y)k2 s(y)◆=1kGf (kx) (5.56)When g(x)  0 or 0 2 @g(x), we have f(kx)  0 or 0 2 @f(y) withy = kx, thusGg(x) = x =1kkx =1kGf (kx). (5.57)This establishes the result.Remark 5.22. Proposition 5.21 can be guaranteed if f is a continuous andconvex function, see [12, Proposition 3.1 (ii)].875.3. Calculus for subgradient projectorsFor an operator A : X ! X, A⇤ is the adjoint of A.Proposition 5.23. Assume that A : X ! X is unitary, i.e., A⇤A = Id,and let b 2 X, f : X ! ]1,+1]. Define x 7! g(x) := f(Ax+ b). ThenGg(x) = A⇤Gf (Ax+ b) b (5.58)for every x 2 X. Furthermore,FixGg = A⇤(FixGf  b). (5.59)Proof. By Fact 5.16, @g(x) = A⇤@f(y) where y = Ax + b. As A is unitary,8s(y) 2 @f(y), we have kA⇤s(y)k = ks(y)k. Indeed,kA⇤s(y)k2 = hA⇤s(y), A⇤s(y)i = hs(y), AA⇤s(y)i = hs(y), s(y)i = ks(y)k2.(5.60)When g(x) > 0 and 0 62 @g(x), we have f(Ax + b) > 0 and 0 62 @f(y) withy = Ax+ b. Let s(y) 2 @f(y), we have A⇤s(y) 2 @g(x).Gg(x) = x f(Ax+ b)kA⇤s(y)k2A⇤s(y) = A⇤✓Ax+ b f(Ax+ b)ks(y)k2 s(y) b◆(5.61)= A⇤(Gf (Ax+ b) b). (5.62)When g(x)  0 or 0 2 @g(x), we have f(Ax + b)  0 or 0 2 @f(y) withy = Ax+ b, thusGg(x) = x = A⇤(Ax+ b b) = A⇤(Gf (Ax+ b) b). (5.63)Hence (5.58) holds. Finally (5.59) follows from (5.58).Remark 5.24. Combining [12, Proposition 3.1 (iii)] with [12, Proposition3.1 (iv)], we have a result related to Proposition 5.23 if f is continous andconvex. For another similar result, see [44, Proposition 2.9]Corollary 5.25. Let a 2 X, f : X ! ]1,+1] and g(x) = f(x  a).Then(8 x 2 X) Gg(x) = Gf (x a) + a. (5.64)Moreover, FixGg = a+ FixGf .Proof. Let A = Id in Proposition 5.23. The result is straightforward.Remark 5.26. When f is a continuous convex function, Corollary 5.25 wasshown in [12, Proposition 3.1 (v)].885.4. Basic properties of subgradient projectorsTheorem 5.27. Assume that f  0 and g = fk with k > 0. ThenGg =✓1 1k◆Id+1kGf .Proof. By Fact 5.15, @g(x) = kf(x)k1@f(x) when f(x)  0.When g(x) > 0 and 0 62 @g(x), we have f(x) > 0 and 0 62 @f(x), therefore(IdGg)(x) = f(x)kkkf(x)k1s(x)k2kf(x)k1s(x) (5.65)=1kf(x)ks(x)k2 s(x) (5.66)=1k(IdGf )(x) (5.67)where s(x) 2 @f(x).When g(x) = 0 or 0 2 @g(x), we have f(x) = 0 or 0 2 @f(x), thus(IdGg)(x) = 0 = (IdGf )(x). (5.68)Therefore, IdGg = 1k (IdGf ) which gives Gg =✓1 1k◆Id+ 1kGf .Remark 5.28. If f and g are convex functions, the conclusion of Theorem5.27 can be found in [12, Proposition 3.1 (iii)] and in [44, Corollary 2.3 ].The following is immediate from the definition of subgradient projectors.Theorem 5.29. Let f, g be two functions such that f ⌘ g on an open setU ✓ X. Then Gf = Gg on U .Remark 5.30. For calculus of subgradient projectors of convex functions, see[12, 44].5.4 Basic properties of subgradient projectorsBased on the definition of subgradient projector, we have the followingfundamental properties which give an alternative expression of Gf in termsof g(x) = ln f(x).Theorem 5.31. (i) We havekxGf (x)k = f(x)ks(x)k and (5.69)895.4. Basic properties of subgradient projectorsxGf (x)kxGf (x)k2 =s(x)f(x)(5.70)for every x satisfying f(x) > 0 and 0 62 @f(x). In particular, when f iscontinuously di↵erentiable, one hasxGf (x)kxGf (x)k2 = r(ln f(x)).(ii) Setg(x) =(ln f(x) if f(x) > 0,1 if f(x)  0. (5.71)Then g is lower-semicontinuous on the open set {x 2 X : f(x) > 0}. Thenwhenever f(x) > 0 and 0 62 @f(x) we haveGf (x) = x c(x)kc(x)k2 (5.72)where c(x) 2 @g(x).If f is continuously di↵erentiable on X \ lev0f , thenGf (x) = x rg(x)krg(x)k2 ,whenever f(x) > 0 and rf(x) 6= 0.Proof. (i) When f(x) > 0 and 0 62 @f(x), let s(x) 2 @f(x), we havexGf (x) = f(x)ks(x)k2 s(x).Therefore,kxGf (x)k = f(x)ks(x)k .If follows thatxGf (x) = f(x)2ks(x)k2s(x)f(x)= kxGf (x)k2 s(x)f(x) . (5.73)HencexGf (x)kxGf (x)k2 =s(x)f(x).905.4. Basic properties of subgradient projectorsWhen f is continuously di↵erentiable, s(x) = rf(x). The result followsas r(ln f(x)) = rf(x)f(x) .(ii) By Fact 5.15, we have @g(x) = 1f(x)@f(x) when f(x) > 0. Thenc(x) =s(x)f(x)2 @g(x)because s(x) 2 @f(x). Since1kc(x)k2 =f(x)2ks(x)k2 ,when f(x) > 0 and 0 2 @f(x). So we havec(x)kc(x)k2 =s(x)f(x)f(x)2ks(x)k2 =f(x)ks(x)k2 s(x).ThusGf (x) = x f(x)ks(x)k2 s(x) = xc(x)kc(x)k2 , (5.74)when f(x) > 0 and 0 /2 @f(x). Hence (5.72) is true.When f is continuously di↵erentiable and f(x) > 0 and rf(x) 6= 0, theresult follows from@g(x) =⇢rf(x)f(x)) rg(x) = c(x) = rf(x)f(x).From (5.74), we haveGf (x) = x rg(x)krg(x)k2 ,whenever f(x) > 0 and rf(x) 6= 0.5.4.1 When is a mapping T a subgradient projector?Theorem 5.32. A mapping T : X ! X is a subgradient projector Gf of alocally Lipschitz function f : X ! R if and only ifx Txkx Txk2 2 @(ln f(x)) whenever f(x) > 0 and 0 62 @f(x), (5.75)Tx = x whenever f(x)  0 or 0 2 @f(x). (5.76)915.4. Basic properties of subgradient projectorsProof. ): Suppose that T = Gf . Applying Theorem 5.31(i), we obtain(5.75).(: Assume that (5.75) and (5.76) hold. By Theorem 5.31(ii), @ ln f =@ff . When f(x) > 0 and 0 62 @f(x), (5.75) gives1kx Txk =ks(x)kf(x)i.e., kx Txk = f(x)ks(x)k ,where s(x) 2 @f(x). By (5.75), when f(x) > 0 and 0 62 @f(x), we havex Tx = kx Txk2 s(x)f(x),so that Tx is equal tox kx Txk2 s(x)f(x)= x✓f(x)ks(x)k◆2 s(x)f(x)= x f(x)ks(x)k2 s(x) = Gf (x)when f(x) > 0 and 0 62 @f(x).5.4.2 Recovering f from its subgradient projector GfCan one determine the function f if Gf is known?Definition 5.33. A locally Lipschitz function f : X ! R is called essen-tially strictly di↵erentiable on an open set A ⇢ X if f is strictly di↵erentiableeverywhere on A except possibly on a Lebesgue null set.Such a class of functions has been extensively studied by Borwein andMoors [17]. This class of functions include finite-valued convex functions,Clarke regular locally Lipschitz functions, semismooth locally Lipschitz func-tions, C1 functions and others, [17, pages 323-328]. If a locally Lipschitzfunction is essentially smooth, then @f is single-valued almost everywhere.Moreover, @f can be recovered by every densely defined selection s 2 @f .Fact 5.34. Let f, g be locally Lipschitz on a polygonally connected19 andopen subset O of Rn. If rf = rg almost everywhere on O, then h := f  gis a constant on O.19 (See [35, Definition 2] on page 189) A setW ⇢ Rn is said to be polygonally connectedif given any two points x, y 2 W , there are points x0 = x, x1, · · · , xm = y such thatSmi=1 xi1xi ✓W , where xi1xi is the closed segment joining xi1 and xi.925.4. Basic properties of subgradient projectorsProof. We prove this by contradiction. By the assumption, h is locallyLipschitz and rh = 0 almost everywhere on O. Suppose that x, y 2 O andh(x) 6= h(y). As O is polygonally connected, there exists z 2 O such thateither [x, z] ✓ O with h(x) 6= h(z) or [z, y] ✓ O with h(z) 6= h(y). Withoutloss of generality, assume [z, y] ✓ O and h(z) 6= h(y). As h is di↵erentiablealmost everywhere, by Fubini’s Theorem [52, Theorem 6.2.2, page 110], wecan choose z˜ nearby z and y˜ nearby y so that both h is di↵erentiable andrh = 0 almost everywhere on [z˜, y˜] ✓ O, and h(z˜) 6= h(y˜). Thenh(y˜) h(z˜) =Z 10hrh(z˜ + t(y˜  z˜)), y˜  z˜idt =Z 100dt = 0which contradicts h(z˜) 6= h(y˜).Theorem 5.35. Let T : X ! X be a subgradient projector. More precisely,ifGf = T = Gf1and f, f1 are essentially strictly di↵erentiable, then there exists k > 0 suchthatf = kf1on each polygonally connected component.Proof. If Gf = T then dom f  domT . As T has full domain, we havedom f = X. Assume that there exist two essentially strictly di↵erentiableand locally Lipschitz functions such that T = Gf = Gf1 . By Theorem 5.32(ii), we havex T (x)kx T (x)k2 2 @(ln f(x)) whenever f(x) > 0,x T (x)kx T (x)k2 2 @(ln f1(x)) whenever f1(x) > 0.As f, f1 are locally Lipschitz, both ln f, ln f1 are locally Lipschitz onX \ FixT . Then@ ln f =1f@f, @ ln f1 =1f1@f1by Fact 5.15 or [25, Theorem 2.3.9(ii)]. Because f, f1 are essentially strictlydi↵erentiable and locally Lipschitz, @f, @f1 are single-valued almost every-where [46], thus@(ln f1(x)) =x T (x)kx T (x)k2 = @(ln f(x)) almost everywhere.935.4. Basic properties of subgradient projectorsBy Fact 5.34, ln f  ln f1 = c for some c 2 R, which implies that f1 = kffor some k > 0.5.4.3 Fixed point closed property and continuityTheorem 5.36. (fixed-point closed property) For a locally Lipschitz functionf , Gf is fixed-point closed at every x 2 Rn, i.e.,ky Gf (y)k ! 0 and y ! x ) x = Gf (x). (5.77)Proof. Assume that a sequence (yn)n2 in X satisfieskyn Gf (yn)k ! 0 and yn ! x. (5.78)Consider three cases.Case 1: If there exists infinitely many yn’s, say (ynk)k2N, such that0 2 @f(ynk). Since @f is upper semicontinuous, taking limit when k ! 1gives 0 2 @f(x). Hence x = Gf (x).Case 2: If there exists infinitely many yn’s, say (ynk)k2, such thatf(ynk)  0. Taking limit when k ! 1 and using the continuity of f atx givesf(x) = limk!1f(ynk)  0.Hence x = Gf (x).Case 3: There exists N 2 N such that f(yn) > 0 and 0 62 @f(yn) whenn > N . Then by (5.69),f(yn) = kyn Gf (yn)kks(yn)k. (5.79)As f is continuous at x, f is locally Lipschitz around x, so @f is locallybounded around x. Therefore,f(x) = limn!1 f(yn) = limn!1(kyn Gf (yn)kks(yn)k) = 0since kyn  Gf (yn)k ! 0. Hence x = Gf (x). Altogether, x 2 FixGf .This establishes (5.77) because (yn)n2N was an arbitrary sequence satisfying(5.78).Theorem 5.37. Let f be a locally Lipschitz function and essentially strictlydi↵erentiable. Then Gf is continuous at x 2 X \ FixGf if and only if f isstrictly di↵erentiable at x.945.4. Basic properties of subgradient projectorsProof. Let x 2 X \ FixGf .(: Assume that f is strictly di↵erentiable at x. By [39, page 258], thelimiting subdi↵erential is reduced to the singleton. Thus we have rf(x) 6= 0which is continuous at x. The result follows from the definition of Gf , whichisy 7! Gf (y) = y  f(y)krf(y)k2rf(y). (5.80)): Assume that Gf is continuous at x. By (5.70),s(y) = f(y)y Gf (y)ky Gf (y)k2 (5.81)so s is (norm to norm) continuous at x. As s is a selection of @f and f isessentially di↵erentiable, f is strictly di↵erentiable at x.The following example illustrates that Gf cannot be continuous if f isnot di↵erentiable.Example 5.38. Definef(x) =(|x| if x  1,2x 1 if x > 1. (5.82)ThenGf (x) =8><>:0 if x < 1,1/2 if x > 1,1 1s(x) where s(x) 2 [1, 2] if x = 1,(5.83)which is discontinuous at x = 1Proof. When x < 0, Gf (x) = x  x(1)2 (1) = 0; When x = 0, f(0) = 0,so Gf (0) = 0; When 0 < x < 1, Gf (x) = x  x12 (1) = 0; When x > 1,Gf (x) = x 2x122 2 = 1/2; When x = 1, @f(1) = [1, 2], soGf (x) = x 1s(x)2 (s(x)) = 11s(x)(5.84)where s(x) 2 [1, 2].955.5. When is the subgradient projector Gf a cutter?Figure 5.2: Graph of f , where f is defined in Example 5.38.5.5 When is the subgradient projector Gf acutter?Our first result characterizes the class of functions f for which its Gf isa cutter.Theorem 5.39. (level sets of tangent plane including the target set) Letf : X ! R be locally Lipschitz. Then Gf is a cutter if and only if wheneverx satisfies f(x) > 0, 0 62 @f(x) and u satisfies f(u)  0 or 0 2 @f(u) onehasf(x) + hs(x), u xi  0, (5.85)where s(x) 2 @f(x). That is, when f(x) > 0 and 0 62 @f(x),{u 2 X : f(u)  0 or 0 2 @f(u)} ✓ {u 2 X : f(x) + hs(x), u xi  0}.(5.86)Proof. When f(x)  0 or 0 2 @f(x), x = Gfx, then Gf satisfies the in-equality in the definition of cutter. Assume that f(x) > 0, 0 62 @f(x) ands(x) 2 @f(x), u 2 FixGf . We havehxGfx, uGfxi = h f(x)ks(x)k2 s(x), u x+f(x)ks(x)k2 s(x)i (5.87)=f(x)ks(x)k2 hs(x), u xi+f2(x)ks(x)k2 (5.88)=f(x)ks(x)k2f(x) + hs(x), u xi. (5.89)965.5. When is the subgradient projector Gf a cutter?As f(x) > 0,hxGfx, uGfxi  0 , f(x) + hs(x), u xi  0. (5.90)Remark 5.40. When function f is locally Lipschitz (not necessarily convex),if it satisfies{u 2 X : f(u)  0 or 0 2 @f(u)} ✓ {u 2 X : f(x) + hs(x), u xi  0},(5.91)when f(x) > 0 and 0 /2 @f(x). By Theorem 5.39, we have Gf is a cutter.Moreover, if the interior of lev0f is nonempty, we take T = Gf into finiteconvergent algorithm in [13], where f is locally Lipschitz. Then the sequencegenerated by Gf will converge finitely.Theorem 5.41. Let k  1 and f : X ! [0,+1) be locally Lipschitz. Ifg = fk and Gf is a cutter, then Gg is a cutter.Proof. By Theorem 5.27, Gg = (11/k) Id+1/kGf . As Id and Gf are bothcutters, and FixGf \ Fix Id = FixGf 6= ?, being a convex combination ofcutters, Gg is a cutter by [20, Corollary 2.1.49, page 62].Theorem 5.42. Assume that A : X ! X is unitary, i.e., A⇤A = Id, andlet b 2 X, f : X ! ]1,+1]. Define x 7! g(x) := f(Ax + b). If Gf is acutter, then Gg is a cutter.Proof. Let x 2 X, u 2 FixGg. Proposition 5.23 givesGg(x) = A⇤Gf (Ax+ b) b (5.92)and Au+ b 2 FixGf . Since A is unitary and Gf is a cutter, we havekxGg(x)k2 = kxA⇤Gf (Ax+ b) bk2 (5.93)= kAx+ bGf (Ax+ b)k2 (5.94) kAx+ b (Au+ b)k2  kGf (Ax+ b) (Au+ b)k2 (5.95)= kx uk2  kGf (Ax+ b)Au bk2 (5.96)= kx uk2  kA⇤Gf (Ax+ b) b uk2 (5.97)= kx uk2  kGg(x) uk2. (5.98)Hence Gg is a cutter by Fact 2.27(i).975.5. When is the subgradient projector Gf a cutter?Theorem 5.43. Assume that R 3 k 6= 0, and g(x) = f(kx). If Gf is acutter, then Gg is a cutter.Proof. Proposition 5.21 gives Gg(x) =1kGf (kx), and FixGg =1k FixGf .Let x 2 X and u 2 FixGg, we have ku 2 FixGf . ThenhxGg(x), uGg(x)i = hx 1kGf (kx), u 1kGf (kx)i (5.99)=1k2hkxGf (kx), kuGf (kx)i  0 (5.100)since Gf is a cutter. Therefore, Gg is a cutter.It is easy to show thatFact 5.44. [20, Corollary 4.2.6] Let f : X ! R be convex and lev0f 6= ?.Then Gf is a cutter. Consequently, Gf is continuous at every x 2 lev0f .Proof. As lev0f 6= ?, FixGf = lev0f . Assume that f(x) > 0. For u 2FixGf , f(u)  0. By the convexity of f we havef(x) + hs(x), u xi  f(u)  0.Hence Theorem 5.39 applies. Therefore, Gf is a cutter. The remainingresult follows from Fact 2.27(ii).Remark 5.45. If the functions f defined in Theorems 5.41, 5.42 and 5.43 areconvex, then functions g are also convex, respectively. By [13, Example 2.1],we have Gf and Gg are cutters. Moreover, one more result [13, Proposition5.3] supports Fact 5.44.In Fact 5.44, lev0f 6= ? is required, as the following example shows.Example 5.46. (1). Let f : R! R be defined byf(x) = exp |x| 8 x 2 R.Then lev0f = ? andGf (x) =8><>:x 1 if x > 0,0 if x = 0,x+ 1 if x < 0.985.5. When is the subgradient projector Gf a cutter?In particular, this Gf is discontinuous at x = 0 and not a cutter. Moreover,Gf is not monotone.(2). Considerx 7! f(x) = exp(kxk2/2).We have lev0f = ? andGf (x) =(x xkxk2 if x 6= 00 if x = 0.In particular, Gf is not continuous at 0, so not a cutter.One might ask: If each function fi : X ! R has Gfi being a cutter,must the maximum g := max{f1, f2} has Gg being a cutter? The answer isnegative as the following example shows.Example 5.47. Let f1(x) = 1 + x and f2(x) = 1  x on R. Each Gfi is acutter by Fact 5.44. The function g(x) := max{f1(x), f2(x)} hasGg(x) :=8><>:1 if x > 0,0 if x = 0,1 if x < 0which is not continuous at x = 0, so Gg is not a cutter.Another question that may arise is if the convexity of the function isnecessary for Gf to be a cutter? Following examples illustrate that theconvexity of the function is independent of Gf to be a cutter.Example 5.48. If f is not convex, Gf need not be a cutter. Considerf : R! R defined by R 3 x 7! f(x) = 1 exp(x2). Then the subgradientprojector of f isGf (x) =(x ( 12x exp(x2)  12x) if x 6= 0,0 if x = 0(5.101)and FixGf = {0}. However Gf is not a cutter.995.5. When is the subgradient projector Gf a cutter?Indeed, when |x| > p2 we havef(x) +rf(x)(0 x) = 1 exp(x2) + (2x exp(x2))(0 x) (5.102)= 1 1 + 2x2exp(x2)(5.103)=exp(x2) (1 + 2x2)exp(x2)(5.104) 1 + x2 + x42  (1 + 2x2)exp(x2)(5.105)=x2(x2  2)2 exp(x2)> 0. (5.106)By Theorem 5.39, Gf is not a cutter.Example 5.49. Even when f is not convex, Gf may still be a cutter. Definef : R! R byf(x) =8>>>><>>>>:0 if x  0,x if 0  x  20/7,8(x 2.5) if 20/7  x  3,2(x 1) if x > 3.(5.107)Then f is not convex since f 0(x) is not monotone on [20/7,+1). However,its subgradient projectorGf (x) =8>>>>>>>>><>>>>>>>>>:x if x  0,0 if 0 < x < 20/7,207  207 1s(x) if x = 20/7, where s(x) 2 [1, 8],2.5 if 20/7 < x < 3,3 4s(x) if x = 3, where s(x) 2 [2, 8],1 if x > 3.(5.108)is a cutter.To see this, by Theorem 5.39, it suces to consider zero level sets oftangent planes. Indeed, Let f(u)  0, i.e., u  0. When x0 > 3,f(x0) + s(x0)(u x0) = 2(u 1)  0; (5.109)when x0 = 3,f(x0) + s(x0)(u x0) = 4 + s(3)(u 3)  4 + 2(u 3)  0; (5.110)1005.6. Characterization of subgradient projectors of convex functionswhere 2  s(3)  8; when 20/7 < x0 < 3,f(x0) + s(x0)(u x0) = 8(u 2.5)  0; (5.111)when x0 = 20/7,f(x0) + s(x0)(x x0) = 20/7 + s(20/7)(u 20/7)  u  0; (5.112)where 1  s(20/7)  8; when 0 < x0 < 20/7,f(x0) + s(x0)(u x0) = u  0. (5.113)Remark 5.50. Note that even if Gf is continuous, it does not mean that Gfis a cutter, e.g., see Example 5.12(ii). (See Corollary 6.14(ii) for an exampleon R2). In [20], Cegielski developed a systematic theory for cutters. Thetheory of cutters can be used to study the class of functions (Theorem 5.39)whose subgradient projectors are cutters.5.6 Characterization of subgradient projectors ofconvex functionsThe following result is of independent interest.Proposition 5.51. Assume that C ✓ X is closed and convex. The functionf : X ! R satisfies(i) f  0 on X \ C;(ii) f is convex on every convex subsets of X \ C;(iii) lim y!xy2X\C,x2Cf(y) = 0, i.e., limi!1 f(yi) = 0 whenever (yi)1i=1 is asequence in X \ C converging to a boundary point x of X \ C.Defineg(x) =(f(x) if x 62 C0 if x 2 C.Then g is convex on X.Proof. Let x, y 2 X, 0    1. We consider three cases.(i) If [x, y] ✓ X \ C, g = f is convex on [x, y] by assumption.1015.6. Characterization of subgradient projectors of convex functions(ii) If x+ (1 )y 2 C, theng(x+ (1 )y) = 0  g(x) + (1 )g(y)since g(x), g(y)  0.(iii) x+(1)y 62 C and [x, y]\C 6= ?. In particular, both x, y cannotboth be in C. We consider two subcases. (1). x 2 C and y 62 C. We needto showg(x+ (1 )y)  g(x) + (1 )g(y).As y 62 C, there exists z 2 bdry(C) such thatx+ (1 )y 2 [z, y] ⇢ X \ C and f(z) = 0.Asx+ (1 )y = ↵z + (1 ↵)y for some 0  ↵  1.and f is convex on [z, y], we havef(x+ (1 )y) = f(↵z + (1 ↵)y)  ↵f(z) + (1 ↵)f(y) = (1 ↵)f(y).(5.114)Nowz = x+ (1 )y for some 0    1,x+ (1 )y = ↵z + (1 ↵)y = ↵(x+ (1 )y) + (1 ↵)ygive  = ↵. Therefore, by (5.114), g(x) = 0 and g(y) = f(y)  0,g(x+ (1 )y) = f(x+ (1 )y) (5.115) (1 ↵)f(y) = (1 )g(y) + g(x). (5.116)(2). x 62 C and y 62 C. We need to showg(x+ (1 )y)  g(x) + (1 )g(y).There exists z 2 bdry(C) such thatx+ (1 )y 2 [z, y] or x+ (1 )y 2 [x, z],say x+ (1 )y 2 [z, y]. Thenx+ (1 )y = ↵z + (1 ↵)y for some 0  ↵  1.1025.6. Characterization of subgradient projectors of convex functionsAs f is convex on [z, y], f(z) = 0,g(x+ (1 )y) = f(↵z + (1 ↵)y)  ↵f(z) + (1 ↵)f(y) = (1 ↵)f(y).(5.117)Nowz = x+ (1 )y for some 0    1,x+ (1 )y = ↵z + (1 ↵)y = ↵(x+ (1 )y) + (1 ↵)ygive  = ↵. Then by (5.117), using g(x) = f(x)  0, g(y) = f(y)  0,g(x+ (1 )y)  (1 ↵)f(y) = (1 )f(y) (5.118) (1 )f(y) + f(x) (5.119)= (1 )g(y) + g(x). (5.120)Combining (i)–(iii), we conclude that g is convex on X.Theorem 5.52. Let T : X ! X andC := {x 2 X : Tx = x}.Then T is a subgradient projector of a convex function f : X ! R withlev0f = C if and only if there exists g : X ! [1,+1) such that g :X \ C ! R is locally Lipschitz, g(x) = 1 for every x 2 C, and(i) for every x 2 X \ C, xTxkxTxk2 2 @g(x);(ii) The function defined byf(x) :=(exp(g(x)) if x 62 C,0 if x 2 Cis convex.In this case, T = Gf .Proof. ): Assume that T is a subgradient projector, say T = Gf1 withf1 : X ! R being convex and lev0f1 = C. Then f = max{0, f1} is convexand Gf = Gf1 . Put g = ln f and C = lev0f . Since f is locally Lipschitz, g islocally Lipschitz on X \C. Note that @g(x) = (@f(x))/f(x) when f(x) > 0.Apply Theorem 5.31(i) to obtain (i).1035.6. Characterization of subgradient projectors of convex functions(: Assume that (i), (ii) hold. When x 62 C, (i) and (ii) givekx Txk = 1kc(x)k , @g(x) =@f(x)f(x)where c(x) 2 @g(x). Using (i) again, we haveTx = x kx Txk2c(x) = x c(x)kc(x)k2 = Gf (x) (5.121)by Theorem 5.31(ii). Moreover, when x 2 C, Tx = x = Gf (x). HenceT = Gf .Theorem 5.53. Let T : X ! X be di↵erentiable andC = {x 2 X : Tx = x}be closed and convex. Define T1 : X \ C ! X byx 7! T1(x) = x Txkx Txk2 .Then T is a subgradient projector of a convex function f : X ! R withlev0f = C and being di↵erentiable on X \ C if and only if(i) For every x 2 X \ C,T1(x)(T1(x))⇤ +rT1(x) ⌫ 0i.e., positive semidefinite.(ii) There exists a function g : X ! [1,+1] such that(8 x 2 X \ C) rg(x) = T1(x),(8 x 2 C) limy!xy2X\Cg(y) = 1,and(8 x 2 C) g(x) = 1.Proof. ): Assume that T = Gf with f being convex and lev0f = C. ByTheorem 5.31(i), we can put g = ln f to obtain (ii). Indeed, for x 2 X \ C,T1(x) =x Txkx Txk2 =xGf (x)kxGf (x)k2 = r(ln(f(x))) = rg(x). (5.122)1045.6. Characterization of subgradient projectors of convex functionsFor x 2 C, we have f(x)  0. Moreover, let (yn) ✓ X \C and yn ! x. Thenf(yn) > 0. Since f is continuous, thus we have f(yn)! f(x)  0. Thereforef(x) = 0 and ln(f(x)) = 1 for x 2 C. By Theorem 5.52, g = ln(f) islocally Lipschitz, thus we havelimy!xy2X\Cg(y) = limy!xy2X\Cln(f(y)) = 1. (5.123)Furthermore, when x 2 C, f(x)  0 and f(x) = exp(g(x))  0 as well. Thusf(x) = exp(g(x)) = 0 when x 2 C. Therefore g(x) = 1 when x 2 C. Asf = exp(g), for every x 62 C we haverf(x) = eg(x)rg(x) = eg(x)T1(x),r2f(x) = eg(x)T1(x)(T1(x))⇤+eg(x)rT1(x) = eg(x)T1(x)(T1(x))⇤+rT1(x).Since f is convex, r2f(x) ⌫ 0, and this is equivalent toT1(x)(T1(x))⇤ +rT1(x) ⌫ 0which is (i).(: Assume that (i) and (ii) hold. Put f = exp(g). Then lev0f = C.Indeed, let x 2 lev0f , we have f(x) = exp(g(x))  0. That is g(x) = 1.If x /2 C, by (ii), we have rg(x) = T1(x). This is not true when g(x) = 1.Thus x 2 C. So lev0f ✓ C. Conversely, let x 2 C, from (ii), we haveg(x) = 1. Hence f(x) = exp(g(x)) = 0. So x 2 lev0f . Then C ✓ lev0f .For x 2 X \ C,rf(x) = eg(x)rg(x) = eg(x)T1(x),r2f(x) = eg(x)T1(x)(T1(x))⇤+eg(x)rT1(x) = eg(x)T1(x)(T1(x))⇤+rT1(x).(i) and (ii) imply that f is di↵erentiable and convex on convex subsets ofX \ C, and f ⌘ 0 on C. By Proposition 5.51, f is convex on X.Moreover, when x 6= Tx we haveGf (x) = x✓f(x)krf(x)k◆2rf(x)f(x)= x T1(x)kT1(x)k2 (5.124)= xxTxkxTxk2✓1kxTxk◆2 = x (x Tx) = Tx. (5.125)Let’s reduce X = Rn to the real line R.1055.6. Characterization of subgradient projectors of convex functionsCorollary 5.54. Let T : R! R be di↵erentiable andC = {x 2 R : x = Tx}be a closed interval. Then T is a subgradient projector of a convex functionf : R! R with lev0f = C and being di↵erentiable on R \ C if and only if(i) T is monotonically increasing on convex subsets of R \ C;(ii) The functiong(x) =Z xa1s Tsdssatisfieslimx#sup(C)g(x) = 1for some a > sup(C); andlimx"inf(C)g(x) = 1for some a < inf(C).Proof. Define n : R ! R by x 7! n(x) = x  Tx. Then for every x 62 C,T1(x) =1n(x) . Theorem 5.53 (i) is equivalent to1n2(x) n0(x)n2(x) 0.This is the same as n0(x)  1, which implies that T 0(x)  0. Conclusions in(ii) are actually equivalent to Theorem 5.53(ii).Corollary 5.55. Let T : R! R be di↵erentiable andC = {x 2 R : x = Tx}be a closed interval. Define N : R! R byx 7! N(x) := x Tx.Suppose that(i) N is nonexpansive;1065.6. Characterization of subgradient projectors of convex functions(ii) The functiong(x) =Z xa1s Tsdssatisfies limx#sup(C) g(x) = 1 for some a > sup(C); andlimx"inf(C) g(x) = 1 for some a < inf(C).Then T is a subgradient projector of a convex function f : R ! R withlev0f = C and being di↵erentiable on R \ C. In particular, the assumption(i) holds when T is firmly nonexpansive.Proof. It suces to observe thatT = IdN.Since N is nonexpansive, T is monotone. Also note that T is firmly nonex-pansive if and only if N is. Moreover, N 0(x)  1 is equivalent to T 0(x)  0.Thus (i) is the same as Corollary 5.54 (i).We illustrate Corollary 5.54 with three examples. They demonstratethat both conditions (i) and (ii) in Corollary 5.54 are needed. More pre-cisely, (i) is for convexity of function f ; (ii) is for lev0f = C.In the first example, T fails to be monotone but verified condition (ii)in Corollary 5.54. Then T is not the subgradient projector of a convexfunction.Example 5.56. Define T : R! R byT (x) :=(xpx+pxe2px if x > 0,x if x  0.Then T is a subgradient projector of the nonconvex function f : R ! Rgiven byf(x) :=(e2px  1 if x > 0x if x  0In this case, T fails to be monotone nor to be a cutter, but T verifies con-dition (ii) of Corollary 5.54.Proof. When x > 0, f 0(x) = e2pxx1/2, so that f 00(x) =e2px⇣1 12px⌘x . Sincef 00(x) < 0 when 0 < x < 1/4, f is not convex on R. Now we show that (i)1075.6. Characterization of subgradient projectors of convex functionsT fails to be monotone. This is equivalent to verify that for some x we haveN 0(x) > 1 where N(x) = x Tx. Indeed,N 0(x) =✓pxpxe2px◆0=12e2px  1e2pxpx+ e2px.L’Hospital rule gives limx!0+ e2px1e2pxpx= 2, so limx!0+ N 0(x) = 2. Therefore,T is not monotone.(ii) T fails to be a cutter. From the definition of T , we have FixT = R.Let x = 14 and u = 0 2 FixT , thenhx Tx, u Txi ⇡ (14 (0.066))(0 (0.066)) ⇡ 0.020856 > 0.(iii) T verified condition (ii) of Corollary 5.54. For x > 0,N(x) =e2px  1e2pxx1/2.With a > 0, we haveg(x) =Z xa1N(s)ds =Z xae2pxx1/2e2px  1 dx = ln(e2px  1) ln(e2pa  1).Clearly, limx!0+ g(x) = 1. Hence (ii) holds.(a) Nonconvex function f . (b) The plot of T which is notmonotonically increasing.Figure 5.3: Illustration for Example 5.561085.6. Characterization of subgradient projectors of convex functionsIn the following example, both two T s are the subgradient projectors ofconvex functions but T s do not satisfy Corollary 5.54(ii).Example 5.57. (1) Define T : R! R byx 7! T (x) =8><>:x 1 if x > 0,0 if x = 0,x+ 1 if x < 0.Then T = Gf where the convex function f : R! R is defined byx 7! f(x) = e|x|.However, lev0f = ? but FixT = {0}. In this case, in Corollary 5.54 condi-tion (i) holds but condition (ii) fails.(a) curve of convex function f . (b) the plot of T which is notmonotonically increasing.Figure 5.4: Illustration for Example 5.57 (1)(2) Define T : R! R byx 7! T (x) =(x 12x if x 6= 00 if x = 0.Then T = Gf where f : R! R is given byx 7! f(x) = ex2 .1095.6. Characterization of subgradient projectors of convex functionsHowever, lev0f = ? but FixT = {0}. In this case, in Corollary 5.54 condi-tion (i) holds but condition (ii) fails.(a) convex function f . (b) the plot of T which is notmonotonically increasing.Figure 5.5: Illustration for Example 5.57 (2)Proof. (1) We haveN(x) = xGf (x) =8><>:1 if x > 00 if x = 01 if x < 0.Let a > 0. For x > 0 we haveg(x) =Z xa1N(s)ds =Z xa1ds = x a.Thus, limx!a+ g(x) = 0. Therefore, (ii) of Corollary 5.54 fails.(2) We haveN(x) = x T (x) = 12xand N 0(x) =  12x2 . Therefore, T is monotone on (0,+1) and (1, 0). Thissays that condition (i) of Corollary 5.54 holds.1105.6. Characterization of subgradient projectors of convex functionsHowever, when a > 0, for x > 0 we haveg(x) =Z xa1N(s)ds =Z xa2sds = x2  a2.Then limx!0+ g(x) = a2, so condition (ii) of Corollary 5.54 fails.Note that T defined in Example 5.57(1) is a subgradient projector ofa convex function, but is not a cutter. Actually, it is easy to see thatFixT = {0}. Let x = 12 > 0, we have Tx = x 1 = 12 . Let 0 = y 2 FixT ,we havehx Tx, y  Txi = h12 (12), 0 (12)i = 12> 0. (5.126)Then T is not a cutter.Remark 5.58. Example 5.57(1) gives us an example of a subgradient pro-jector of a convex function which is not a cutter. On the other hand, thereis an example. f(x) = 1. f is convex on X. By Proposition 5.8(i), we havethe subgradient projector of f is identity, which is a cutter. Thus we cannotconclude that the subgradient projector of a convex function is a cutter ifwe don’t have condition of lev0f 6= ?.Example 5.59. Define T : R! R byx 7! T (x) =8><>:xpx if x > 00 if x = 0xpx if x < 0(5.127)Then T = Gf where the nonconvex function f : R! R is given byx 7! f(x) =(e2px if x  0e2px if x < 0.(5.128)However, lev0f = ? but FixT = {0}. In this case, both conditions (i) and(ii) in Corollary 5.54 fail.Proof. f(x) = e2px is nonconvex on [0,+1), see Example 5.56. Gf = Tfollows by direct calculations.Condition (i) of Corollary 5.54 fails: T is not monotonically increasingon [0,+1) since T 0(x) = 1 12px< 0 when x > 0 is suciently near 0.1115.6. Characterization of subgradient projectors of convex functionsCondition (ii) of Corollary 5.54 fails. Indeed,N(x) = x T (x) = px (5.129)when x  0. When a > 0, for x > 0 we haveg(x) =Z xa1psds = 2px 2pa, (5.130)so that limx!0+ g(x) = 2pa.(a) Plot of nonconvex function f . (b) The plot of T which is notmonotonically increasing.Figure 5.6: Illustration for Example 5.59.This chapter presented basic properties of subgradient projectors of non-convex functions. We also investigated when a subgradient projector is acutter and when a mapping is a subgradient projector of a convex function.112Chapter 6Linear SubgradientProjectors6.1 OverviewIn this chapter, based on the manuscript [11], we mainly focus on theproperties of Gf when it is linear. We characterize symmetric linear sub-gradient projectors and obtain a closed form. It turns out that the set ofsubgradient projectors is not convex.6.2 Characterizations of Gf when Gf is linearProposition 6.1. Let T be a linear operator. Then T is a cutter if andonly if T is firmly nonexpansive.Proof. ): Assume that T is a cutter. Then for every x 2 X and u 2 FixT ,hx Tx, u Txi = hTx x, Tx ui  0. (6.1)Put u = 0. Since T is linear, thus T mapps 0 to 0. Then 0 2 FixT . SohTx x, Tx 0i  0 ) kTxk2  hx, Txi. (6.2)By (2.44), T is firmly nonexpansive.(: Assume that T is firmly nonexpansive. Let u 2 FixT . Then Tu = uandhTx x, Tx ui = hTx x, Tx Tui (6.3)= hTx Tu+ Tu x, Tx Tui (6.4)= kTx Tuk2 + hTu x, Tx Tui (6.5)= kTx Tuk2  hx u, Tx Tui  0. (6.6)Hence T is a cutter.1136.2. Characterizations of Gf when Gf is linearThe following example says that Proposition 6.1 fails if T is not linear.Example 6.2. Define a continuous nonlinear T : R! R byT (x) =8>>>><>>>>:x/2 if 2  x  2,3 x if 2  x  3,(3 + x) if 3  x  2,0 otherwise.(6.7)Then T is a cutter but not firmly nonexpansive as T is not monotone (re-ferring to Theorem 2.32).Proof. (i) It is easy to obtain that FixT = {0}. This means that T is acutter if and only if (T (x))2  xT (x).When 2  x  2, we have(T (x))2 =x24 xx2= xT (x); (6.8)When 2  x  3, we have(T (x))2 = (3 x)2 = (3 x)(3 x)  x(3 x); (6.9)when 3  x  2, we have(T (x))2 = [(x+ 3)]2 = [(x+ 3)][(x+ 3)]  x[(x+ 3)]; (6.10)when |x| > 3, we have(T (x))2 = 0 = xT (x). (6.11)Hence T is a cutter.(ii) T is not monotone. Actually, 8x, y 2 [2, 3], the following is true.hTx Ty, x yi = h(3 x) (3 y), x yi = (x y)2  0. (6.12)Therefore we conclude that T is not monotone, either firmly nonex-pansive.1146.2. Characterizations of Gf when Gf is linearFigure 6.1: Graph of T defined in Example 6.2.Observe that Example 6.2 is much simpler than [20, Example 2.2.8, page68].Theorem 6.3. Let a > 0 and B 6= 0 being an n⇥n symmetric and positivesemidefinite matrix. Consider the function(8x 2 X) f(x) = (x|Bx)1/(2a) (6.13)(i) We haveGf (x) =(x a x|BxkBxk2Bx if Bx 6= 0,x if Bx = 0.(6.14)(ii) Gf is linear if and only if B = PL where  > 0 and L ✓ X is asubspace. In this casekerB = L?, (6.15)f(x) = 1/(2a)dL?(x)1/aand (6.16)Gf = IdaPL = (1 a) Id+aPL? . (6.17)(iii) Assume that Gf is linear. Then Gf is a cutter if and only if 0 < a  1.Proof. (i) As B is positive semidefinite, i.e., (8x 2 X) x|Bx  0. f(x)  0if and only if f(x) = 0, which is equivalent to Bx = 0. Gf follows fromdirect calculations.1156.2. Characterizations of Gf when Gf is linear(ii) ): Assume that Gf is linear. The mappingx 7! T1(x) := a1xGf (x)=(x|BxkBxk2Bx if Bx 6= 00 if Bx = 0(6.18)is linear. Let i be eigenvalues of B, where i = 1, 2. Assume that 1,2 > 0.We want to show that 1 = 2. We prove it by contradiction. Suppose that1 6= 2. Take unit length eigenvector vi associated with i. Note thathv1, v2i = 0, Bvi 6= 0 and B(v1 + v2) = 1v1 + 2v2 6= 0. As T1 is linear, wehave T1(v1 + v2) = T1v1 + T1v2. NowT1(v1 + v2) =(v1 + v2)|B(v1 + v2)kB(v1 + v2)k2 B(v1 + v2) (6.19)=(v1 + v2)|(1v1 + 2v2)k1v1 + 2v2)k2 (1v1 + 2v2) (6.20)=1v|2v1 + 2v|2v2 + 1v|1v1 + 2v|1v221 + 22(1v1 + 2v2) (6.21)=1 + 221 + 22(1v1 + 2v2), (6.22)T1v1 + T1v2 =v|1Bv1kBv1k2Bv1 +v|2Bv2kBv2k2Bv2 (6.23)=1kv1k2k1v1k21v1 +2kv2k2k2v2k22v2 (6.24)= v1 + v2 (6.25)As {v1, v2} are linearly independent, the above gives 1 = 2 which con-tradicts 1 6= 2. Therefore, all positive eigenvalues of B have to be equal.Hence, we haveB = U|✓Id 00 0◆U (6.26)where U is an orthogonal matrix,  > 0, Id is an m⇥m identity matrix forsome m  0. The matrixU|✓Id 00 0◆U (6.27)is idempotent20 and symmetric, so it is a matrix associated with an or-thogonal projection on a closed subspace, say PL, [37, page 430, page 433].HenceB = PL (6.28)20We sayA is an idempotent operator if and only if AAx = Ax for x 2 X.1166.2. Characterizations of Gf when Gf is linearwhich implies that Bx = 0 if and only if PLx = 0, i.e., kerB = L?. Thenwhen PLx 6= 0,T1(x) =x|BxkBxk2Bx =x|PLxx|PLPLxPLx =x|PLx2x|PLxPLx = PLx; (6.29)when PLx = 0, T1x = 0 = PLx. Hence T1 = PL. It follows thatGf = IdaT1 = IdaPL = (1 a) Id+a(IdPL) = (1 a) Id+aPL? .(6.30)We proceed to find the expression of f(x):f(x) = (x|Bx)1/(2a) = (x|PLx)1/(2a) (6.31)= 1/(2a)(x|PLPLx)1/(2a) = 1/(2a)(kPLxk2)1/(2a) (6.32)= 1/(2a)(kx PL?xk2)1/(2a) = 1/(2a)(dL?(x)2)1/(2a) (6.33)= 1/(2a)dL?(x)1/a(6.34)(: Assume that B = PL for  > 0 and some subspace L ✓ X. Theassumption givesf(x) = 1/(2a)dL?(x)1/a. (6.35)By Fact 5.18,Gf = GdL?1/a . (6.36)By Theorem 5.27,GdL?1/a = (1 a) Id+aGdL? . (6.37)By Fact 6.5,GdL?1/a = (1 a) Id+aPL? . (6.38)Since PL? is linear as L? is a subspace. Hence Gf is linear.(iii) ): Assume that Gf is a linear cutter. By Fact 6.1, Gf is firmlynonexpansive, so is IdGf . By (ii), IdGf = aPL, aPL has to be nonex-pansive. Take 0 6= x 2 L. The nonexpansiveness requireskaPLx aPL0k = kaxk  kxk (6.39)so that a  1.(: Assume that 0 < a  1. Since x 7! (x|Bx)1/2 is convex, and thefunction[0,+1) 3 t 7! t1/a (6.40)1176.2. Characterizations of Gf when Gf is linearis convex and inceasing when 0 < a  1. By [41, Proposition 1.39], we havethatx 7! f(x) = (x|Bx)1/21/a (6.41)is convex. By Example 2.26, we have Gf is a cutter.We illustrate Theorem 6.3(iii) with the following example.Example 6.4. Let a > 1. ConsiderX 3 x 7! f(x) = (x|x)1/(2a) = kxk1/a. (6.42)Then f is not convex, andGf (x) = (1 a)x (6.43)for every x 2 X. Although Gf is linear, but it is not a cutter since it is notmonotone, see, e.g., Proposition 6.1.For a set C ⇢ X, its distance function dC : X ! [0,+1) is defined byX 3 x 7! dC(x) := inf{kx yk : y 2 C}. (6.44)Fact 6.5. ([5]) Let C ✓ X be closed and convex, and f = dC . ThenGf = PC .The following result completely characterizes symmetric and linear sub-gradient projectors.Theorem 6.6. Assume that T : X ! X is linear and symmetric. Then thefollowing are equivalent.(i) T is a subgradient projector of a convex function.(ii) T = Gf where f : X ! R is given byf(x) = K(x|PLx)1/(2) = KdL?(x)1/(6.45)where 0 <   1, K > 0 and L ✓ X is a subspace. In this case,Gf = (1 ) Id+PL? .Proof. (i))(ii): Assume that T = Gf for some convex function. Since Tis linear and a cutter, T is firmly nonexpansive by Proposition 6.1. ThenT1 = IdT is firmly nonexpansive. We consider two cases.Case 1: int lev0f 6= ?. Let x0 2 int lev0f , i.e., there is " > 0 such that1186.2. Characterizations of Gf when Gf is linearball(x0, ") ✓ lev0f . We now show that T1 ⌘ 0 on ball(x0, "). Actually,8b 2 X with kbk  ", we have x0 + b 2 ball(x0, ") ✓ lev0f , thus T (x0 +b) = x0 + b, so T1(x0 + b) = 0. Moreover, since T is linear, so is T1, thusT1(b) = T1(x0 + b)  T1(x0) = 0. Then T1 ⌘ 0 on ball(0, "). Next weshow T1 ⌘ 0 outside of ball(0, "). Indeed, let x 2 X \ ball(0, "). Thenxkxk" 2 ball(0, "), thus we have T1( xkxk") = "kxkT1(x) = 0. Therefore T1 ⌘ 0on X. Thus, T = Id on X. Then T = Gf with f ⌘ 0. This that means (ii)holds with L = {0},  = 1 and K > 0.Case 2: int lev0f = ?. As T1 is continuous, we only need to considerT1(x) =f(x)krf(x)k2rf(x) when f(x) > 0. (6.46)ThenkT1(x)k = f(x)krf(x)k (6.47)and rf(x)f(x)=T1(x)kT1(x)k2 . (6.48)Since T is symmetric, T1 is symmetric, so there exists an orthogonal matrixQ such that Q|T1Q = D where D is an diagonal matrix and Q| denotes thetranspose of Q. Put g = ln f and x = Qy. We have(rg)(Qy) = T1QykT1Qyk2 . (6.49)Multiplying both sides by Q| and using Q| being an isometry (i.e., kQ|zk =kzk for every z 2 X) giveQ|(rg)(Qy) = Q|T1QykT1Qyk2 =DykQ|T1Qyk2 =DykDyk2 . (6.50)If we put h = g Q, then rh(y) = Q|rg(Qy) for every y 2 X, sorh(y) = DykDyk2 . (6.51)WriteD =0BBB@1 0 · · · 00 2 · · · 0....... . ....0 0 · · · n1CCCA (6.52)1196.2. Characterizations of Gf when Gf is linearWhen 1 = · · · = n = 0, this is covered in Case 1. We can assume thatT1 6⌘ 0. As T1 is monotone, we can and do assume that 1, · · · ,m > 0 andm+1 = · · · = n = 0. Thenrh(y) =✓1y1Pmk=1 2ky2k, · · · , mymPmk=1 2ky2k, 0, · · · , 0◆. (6.53)Since h has continuous second order derivatives,@2h@yi@yj=@2h@yj@yi(6.54)which gives2j2i yiyjPmk=1 2ky2k=2i2jyiyjPmk=1 2ky2k(6.55)when 1  i, j  m, i 6= j. As int lev0f = ?, (6.55) holds on open subset ofX, so we have i = j . Because 1  i, j  m were arbitrary, we obtain that1 = · · · = m. HenceT1 = Q✓ Idm 00 0◆Q| = Q✓Idm 00 0◆Q| = PL (6.56)where L ✓ X is a linear subspace, see [37, page 430]. Namely, T1 is a positivemultiple of an orthogonal projector.Now T1 is firmly nonexpansive and T|1 = T1, this implies thatT1 + T|12 T |1 T1 = T1  T 21 = Q✓( 2) Id 00◆Q (6.57)is positive semidefinite, so 0    1. As T1 6= 0 in this case, 0 <   1.Therefore,r ln f(x) = T1xkT1xk2 =PLxkPLxk2 =1PLxkPLxk2 . (6.58)Note that P |L = PL, P2L = PL,r ln kPLxk = 1kPLxkrkPLxk =1kPLxkP|LPLxkPLxk =PLxkPLxk2 . (6.59)It follows thatr ln f(x) = 1r ln kPLxk = r ln kPLxk1/ (6.60)1206.2. Characterizations of Gf when Gf is linearwhich is equivalent toln f(x) = ln kPLxk1/ + c (6.61)for some constant c 2 R. Taking exp both sides givesf(x) = KkPLxk1/ = K(kPLxk2)1/(2) = K(x|PLx)1/(2) (6.62)where K = exp(c) > 0. As PL = IdPL? ,f(x) = Kkx P?xk1/ = K(dL?(x))1/. (6.63)Finally,T = Gf = IdT1 = IdPL = Id(IdPL?) = (1 ) Id+PL? .(6.64)Theorem 6.7. Subgradient projectors of convex functions are not closedunder convex combination.Proof. Let L := {0} ⇥ R ✓ R2 and M := {(x, y) 2 R2 : hxy,11i = 0}.Define two functions(8x 2 R2) f(x) = K1dL?(x)1/1 , g(x) = K2dM?(x)1/2 (6.65)where 0 < 1 6= 2 < 1. By Theorem 6.6, we haveGf = (1 1) Id+1PL? , Gg = (1 2) Id+2PM? . (6.66)Now consider 3Gf + (1 3)Gg where 0 < 3 < 1,3 6= 1,2. Then3Gf + (1 3)Gg (6.67)=3(1 1) Id+1PL?+ (1 3)(1 2) Id+2PM?(6.68)=(1 2 + 23  13) Id+13PL? + 2(1 3)PM? (6.69)If 3Gf + (1 3)Gg is a subgradient projector, by Theorem 6.6, thereare 0 <  < 1 and S which is a subspace of R2 such that3Gf + (1 3)Gg = (1 ) Id+PS? . (6.70)Therefore we have(1 2 + 23  13) Id+13PL? + 2(1 3)PM? = (1 ) Id+PS? .(6.71)1216.2. Characterizations of Gf when Gf is linearNatually, the set of fixed points of left-hand side is equal to the set of fixedpoints of right-hand side. Thus we haveFix ((1 ) Id+PS?) (6.72)=Fix ((1 2 + 23  13) Id+13PL? + 2(1 3)PM?) . (6.73)By [7, Proposition 4.34], we haveFix ((1 2 + 23  13) Id+13PL? + 2(1 3)PM?) = L? \M?.(6.74)Also,Fix ((1 ) Id+PS?) = S?. (6.75)Hence{(0, 0)} = L? \M? (6.76)= Fix ((1 2 + 23  13) Id+13PL? + 2(1 3)PM?)(6.77)= Fix ((1 ) Id+PS?) (6.78)= S? (6.79)Therefore S? = {(0, 0)}, it imples S = R2. Now we view a projectionoperator as a matrix. Then we havePL? =1 00 0, PM? =12121212, PS?0 00 0. (6.80)So we can see that PL? , PS? are diagonal matrices, but PM? is not. Hence,(6.71) is not true. This means 3Gf + (1  3)Gg is not a subgradientprojector.Theorem 6.8. Assume 0 < 1 < 1, 0 < 2 < 1, 0 <  < 1. Suppose thatL,M are linear subspaces of X satisfying L = M?,M = L? and L,M arenot {0} nor X. LetGf = (1 1) Id+1PL? , Gg = (1 2) Id+2PM? . (6.81)If 1 6= 21 , then (1)Gf +Gg is not a subgradient projector of a convexfunction.1226.2. Characterizations of Gf when Gf is linearProof. We have(1 )Gf + Gg (6.82)=(1 ) ((1 1) Id+1PL?) +  ((1 2) Id+2PM?) (6.83)= [(1 )(1 1) + (1 2)] Id+1(1 )PL? + 2PM? (6.84)= Id+(1 )(PL? + (1 )PM?) (6.85)where  = (1  )(1  1) + (1  2) and  = 1(1)1(1)(11)(12) . Wenext show that 0 <  < 1 and  6= 12 . To see this,0 <  = (1 )(1 1) + (1 2) < (1 ) +  = 1. (6.86)Also, =12(6.87)() 1(1 )1 (1 )(1 1) (1 2) =21 (1 )(1 1) (1 2)(6.88)()1(1 ) = 2 (6.89)()1 =21. (6.90)By the assumption: 1 6= 21 , so  6= 12 . Since Gf , Gg are linear andsymmetric, so is (1 )Gf + Gg.If (1  )Gf + Gg is a subgradient projector of a convex function, byTheorem 6.6, we have(1 )Gf + Gg = (1 ↵) Id+↵PS? (6.91)where 0 < ↵ < 1 and S is a subspace of X. Note that Gf , Gg are averagedmapping21, so is (1 )Gf + Gg. AsFixGf = L?, FixGg =M?. (6.92)By [7, Proposition 4.34], we obtainFix((1 )Gf + Gg) = FixGf\FixGg = L?\M? = {0}. (6.93)21(See [7, Definition 4.23]) Let t 2 ]0, 1[ and T : X ! X. Then T is averaged withconstant t if there exists a nonexpansive operator N such that T = (1 t) Id+tN .1236.2. Characterizations of Gf when Gf is linearAsFix((1 ↵) Id+↵PS?) = S?. (6.94)Thus we have S? = {0}. Therefore, refer to (6.91), we have(1 )Gf + Gg = (1 ↵) Id . (6.95)Equivalently, Id+(1 )(PL? + (1 )PM?) = (1 ↵) Id . (6.96)We proceed to analyze ↵,. Take x 2 M,x 6= 0. Then PM?x = 0, PL?x =PMx = x. (6.96) givesx+ (1 )(x) = (1 ↵)x. (6.97)It implies + (1 ) = 1 ↵. (6.98)Take x 2 L, x 6= 0. Then PL?x = 0, PM?x = PLx = x. (6.96) givesx+ (1 )(1 )x = (1 ↵)x. (6.99)It implies + (1 )(1 ) = 1 ↵. (6.100)Substracting (6.98) from (6.100), we have(1 )(1 2) = 0 (6.101)which implies  = 1 or  = 12 . This contradicts the choices of ,1,2.Question: Why must Gf be symmetric when Gf is linear and f isconvex? When f is not convex, we know that Gf can be nonsymmetric, seeExample 6.14 (2).Proposition 6.9. Let M : X ! X be linear, monotone and(8 x 2 X with Mx 6= 0) rh(x) = MxkMxk2 (6.102)where the function h : {x 2 X : Mx 6= 0} ! R. If dim ranM 6= 2, then Mis symmetric.1246.2. Characterizations of Gf when Gf is linearProof. If dim ranM = 0, then M = 0, so it is symmetric. Let us assumethat dim ranM > 0 and dim ranM 6= 2. Since h has continuous mixedsecond order derivatives at x whenever Mx 6= 0, the Hessian matrix r2h(x)is symmetric. Asr2h(x) = kMxk2M Mx(rkMxk2)|kMxk4 =kMxk2M  2Mxx|M|MkMxk4 ,(6.103)the symmetric property means thatkMxk2M  2Mxx|M|M = (kMxk2M  2Mxx|M|M)| (6.104)=M|kMxk2  2M|Mxx|M| (6.105)whenever Mx 6= 0. Put y =Mx. The above is simplified toM M|2=yy|kyk2M M| yy|kyk2 . (6.106)Denote the projection operator on the line spanned by {y}, span(y), byPy =yy|kyk2 . We have(8 y 2 ranM) M M|2= PyM M|Py. (6.107)Since M is monotone, ranM = ranM|, see, e,g, [14, Theorem 3.2]. Let{ei : i = 1, . . . ,m} be an orthonormal basis of ranM . ThenPranM =mXi=1eie|i . (6.108)Note thatM|PranM =M|PranM| =M|(PranM| + P(ranM|)?) =M| (6.109)becauseM|P(ranM|)? = 0. To see this, let y 2 (ranM|)?. For every z 2 X,hM|y, zi = hy,Mzi = 0 (6.110)because Mz 2 ranM = ranM|. As z 2 X was arbitrary, we must haveM|y = 0. SinceM M|2= PeiM M|Pei (6.111)1256.2. Characterizations of Gf when Gf is linearby (6.107), summing up (6.111) from from i = 1 to i = m, followed by using(6.108) and (6.109), we obtainm2(MM|) = (mXi=1Pei)MM|(mXi=1Pei) = PranMMM|PranM =MM|,(6.112)that is,(m2 1)(M M|) = 0. (6.113)Hence M M| = 0, and so M is symmetric.Proposition 6.9 fails when dim ranM = 2 as the following example shows.Example 6.10. When dim ranM = 2, although M : R2 ! R2 is linear,monotone and(8 Mx 6= 0) rh(x) = MxkMxk2 , (6.114)one cannot guarantee that M is symmetric.Let x = (x, y)| 2 R2.(1). DefineM =✓0 11 0◆. (6.115)Then M is linear, monotone, dim ranM = 2 andr arctan(y/x) =✓yx◆x2 + y2=MxkMxk2 (6.116)whenever x 6= 0. However, M is not symmetric.(2). DefineM =✓1/2 1/21/2 1/2◆. (6.117)Then M is linear and monotone andr✓ln(x2 + y2)2+ arctan(y/x)◆=✓x yy + x◆x2 + y2=MxkMxk2 (6.118)whenever x 6= 0. However, dim ranM = 2 and M is not symmetric.1266.3. A complete analysis of linear subgradient projectors on R2Conjecture: Let M : Rn ! Rn be linear, monotone and(8 x 2 Rn with Mx 6= 0) rh(x) = MxkMxk2 (6.119)where the function h : {x 2 Rn : Mx 6= 0} ! R. If dim ranM = 2 andexp(h) is convex on convex subsets of {x 2 Rn : Mx 6= 0}, then M issymmetric.Combining Theorem 6.6 and Proposition 6.9, we obtain the followingcharacterization of linear subgradient projectors.Theorem 6.11. Assume that T : X ! X is linear and dim ran(IdT ) 6= 2.Then the following are equivalent.(i) T is a subgradient projector of a convex function.(ii) T = Gf where f : X ! R is given byf(x) = K(x|PLx)1/(2) = KdL?(x)1/(6.120)where 0 <   1, K > 0 and L ✓ X is a subspace. In this case,Gf = (1 ) Id+PL? .Proof. (i))(ii): Assume that T = Gf for some convex function. Then Tis a cutter by Fact 5.44. As T is linear, in view of Proposition 6.1, T isfirmly nonexpansive, so M := IdT is firmly nonexpansive, in particular,monotone. By Theorem 5.31,rh(x) = MxkMxk2 (6.121)where h(x) = ln f(x), f(x) > 0. This is equivalent torh(x) = MxkMxk2 (6.122)whenMx 6= 0. Proposition 6.9 shows thatM is symmetric, so is T = IdM .It suces to apply Theorem 6.6 to obtain (ii).(ii))(i): Clear.6.3 A complete analysis of linear subgradientprojectors on R2We start with a simple result about essentially strictly di↵erentiablefunctions (for definition, see Definition 5.33).1276.3. A complete analysis of linear subgradient projectors on R2Lemma 6.12. Let O ✓ X be a nonempty open set and f : O ✓ X ! Rbe an essentially strictly di↵erentiable function. If there exists a continuousselection s : O ! X with s(x) 2 @f(x) for every x 2 O, then f is strictlydi↵erentiable on O.Proof. By [17, Theorem 2.4, Corollary 4.2], f has a minimal Clarke subd-i↵erential @cf , and @cf can be recovered by every dense selection of @cf .Since s(x) 2 @f(x) ⇢ @cf(x), and s is continuous on O, we have @cf(x) =@f(x) = s(x) for every x 2 O, which implies that f is strictly di↵erentiableat x. Hence f is strictly di↵erentiable on O.We consider linear operator in R2:T =✓1 a bc 1 d◆(6.123)where a2 + b2 + c2 + d2 6= 0. Note that when a = b = c = d = 0, T = Gfwith f ⌘ 0.Theorem 6.13. Let T be given by (6.123). Then T is a subgradient pro-jector of an essentially strictly di↵erentiable function on R2 \ FixT if andonly if one of the following holds(i) (1) a = b = c = 0, d 6= 0: T = Gf where f(x1, x2) = K|x2|1/d forsome K > 0;(2) b = c = d = 0, a 6= 0: T = Gf where f(x1, x2) = K|x1|1/a forsome K > 0;(3) b = c = 0, a = d 6= 0: T = Gf where f(x1, x2) = K(x21 + x22)12afor some K > 0.(ii) a 6= 0, b = c, d = c2/a: T = Gf where f(x1, x2) = K|ax1+cx2|a/(a2+c2)for some K > 0.(iii) a = d 6= 0, b = c 6= 0: T = Gf wheref(x1, x2) =8><>:K(x21 + x22)a2(a2+c2) exp⇣ ca2+c2arctan⇣x1x2⌘⌘if x2 6= 0,0 if (x1, x2) = (0, 0),K|x1|a(a2+c2) exp⇣ |c|a2+c2⇡2⌘if x1 6= 0, x2 = 0,(6.124)for some K > 0, and f is lower semicontinuous. In particular, whenc 6= 0, f is nonconvex since f is not continuous when x1 6= 0 andx2 = 0.1286.3. A complete analysis of linear subgradient projectors on R2Proof. Assume that T is a subgradient projector. By Theorem 5.31 andLemma 6.12, we can find a di↵erentiable function g : X ! R such thatfor every x 2 X, x Txkx Txk2 = rg(x).Asx Tx =✓a bc d◆✓x1x2◆=✓ax1 + bx2cx1 + dx2◆,we have@g@x1=ax1 + bx2(ax1 + bx2)2 + (cx1 + dx2)2,@g@x2=cx1 + dx2(ax1 + bx2)2 + (cx1 + dx2)2.Since@2@x1@x2g(x1, x2) (6.125)= (a2c+ c3)x21 + (cd2  b2c+ 2abd)x22 + 2(c2d+ a2d)x1x2((ax1 + bx2)2 + (cx1 + dx2)2)2, (6.126)@2@x2@x1g(x1, x2) (6.127)= (a2b bc2 + 2acd)x21 + (b3 + bd2)x22 + 2(ab2 + ad2)x1x2((ax1 + bx2)2 + (cx1 + dx2)2)2(6.128)on a nonempty open set of R2. Then we have@2@x1@x2g(x1, x2) =@2@x2@x1g(x1, x2).This leads to8<: a2b bc2 + 2acd = a2c+ c3, (1)b3 + bd2 = cd2  b2c+ 2abd, (2)ab2 + ad2 = c2d+ a2d. (3)Now a ⇤ (2) b ⇤ (3)) (ad bc)(ab+ cd) = 0. It follows that1296.3. A complete analysis of linear subgradient projectors on R2(A) If ad = bc. (1) implies (b  c)(a2 + c2) = 0. Then the following twocase could happen.i. b = c = 0. Then (3)) ad(a d) = 0. That meansa = b = c = 0, d 6= 0,orb = c = d = 0, a 6= 0,ora = d 6= 0, b = c = 0.ii. b = c 6= 0 and d = c2/a.(B) If ab+ cd = 0. (1) implies (b+ c)(a2 + c2) = 0. We only consider thecase b = c 6= 0. Then (2) and (3) implies a = d.Thus we get the following three cases.(i) a = b = c = 0, d 6= 0. Then we getg(x1, x2) =ln |x2|d+ C1, if x2 6= 0.Figure 6.2: Graph of function g.1306.3. A complete analysis of linear subgradient projectors on R2f(x1, x2) = K|x2|1/d for some K > 0.Figure 6.3: Graph of function f .Or b = c = d = 0, a 6= 0. Then we getg(x1, x2) =ln |x1|a+ C1, if x1 6= 0.Figure 6.4: Graph of function g.1316.3. A complete analysis of linear subgradient projectors on R2f(x1, x2) = K|x1|1/a for some K > 0.Figure 6.5: Graph of function f .Or a = d 6= 0, b = c = 0. Then we getg(x1, x2) =12aln (x21 + x22) + C1, if x1 6= 0 and x2 6= 0.Figure 6.6: Graph of function g1326.3. A complete analysis of linear subgradient projectors on R2f(x1, x2) = K(x21 + x22)12a for some K > 0.Figure 6.7: Graph of function f(ii) a 6= 0, b = c 6= 0, d = c2/a. Then we getg(x1, x2) =aa2 + c2ln |ax1 + cx2|+ C2, if ax1 + cx2 6= 0.Figure 6.8: Graph of function g1336.3. A complete analysis of linear subgradient projectors on R2f(x1, x2) = K|ax1 + cx2|a/(a2+c2) for some K > 0.Figure 6.9: Graph of function f(iii) a = d 6= 0, b = c 6= 0. Then we getg(x1, x2) =a2(a2 + c2)ln(x21+x22)ca2 + c2arctan✓x1x2◆+C3, (6.129)if x2 6= 0. Since g = ln f , we obtain f = exp(g) by using (i) and (ii).For (iii), we obtainf(x1, x2) = K(x21 + x22)a2(a2+c2) exp✓ ca2 + c2arctan✓x1x2◆◆if x2 6= 0(6.130)for some K > 0.1346.3. A complete analysis of linear subgradient projectors on R2(a) Graph of function g. (b) Graph of function f .Figure 6.10: Plots of g and f when a = 1, c = 1 and K = 1 in Theo-rem 6.13(iii).However, when c 6= 0, f is not continuous at (x¯1, 0) with x¯1 6= 0 since⇡2= limx1!x¯1,x2#0arctanx1x26= limx1!x¯1,x2"0arctanx1x2= ⇡2. (6.131)The function given by (6.124) is lower semicontinuous but not continuous atevery (x¯1, 0). Moreover, f is not convex on R2 since a finite-valued convexfunction is continuous.It is natural to ask for what selection s 2 @f , we have Gf = T on R2.On R2 \ {(x1, x2) : x2 6= 0}, one clearly chooses s = rf . It remains todetermine the subgradient of f at (x¯1, 0). Indeed, when x2 6= 0, f(x1, x2) =exp(g(x1, x2)), so that rf(x1, x2) = f(x1, x2)rg(x1, x2), i.e.rf(x1, x2) (6.132)=K(x21 + x22)a2(a2+c2) exp✓ ca2 + c2arctan✓x1x2◆◆1a2 + c2(ax1  cx2, ax2 + cx1)x21 + x22.(6.133)When (x1, x2)! (x¯1, 0), cx1/x2 > 0, we havef(x1, x2)! K|x¯1|a(a2+c2) exp✓ |c|a2 + c2⇡2◆= f(x¯1, 0) (6.134)1356.3. A complete analysis of linear subgradient projectors on R2andrf(x1, x2)! K|x¯1|a(a2+c2) exp✓ |c|a2 + c2⇡2◆1a2 + c2✓ax¯1,cx¯1◆. (6.135)Therefore, by the definition of limiting subdi↵erential (see Definition 5.1),K|x¯1|a(a2+c2) exp✓ |c|a2 + c2⇡2◆1a2 + c2✓ax¯1,cx¯1◆2 @f(x¯1, 0). (6.136)Hence we can choose s(x¯1, 0) to be the limiting subgradient given by (6.136).Corollary 6.14. (1). DefineT =✓0 11 0◆(6.137)The skew linear mapping T is not firmly nonexpansive, so not a cutter.However, T is a subgradient projector of a nonconvex, discontinuous butlower semicontinuous function f given byf(x, y) =8><>:(x2 + y2)1/4 ⇤ exp((1/2) ⇤ arctan(x/y)) if y 6= 0,0 if (x, y) = (0, 0),|x|1/2 exp(⇡/4) if x 6= 0, y = 0.(6.138)Figure 6.11: Graph of f defined in (6.138).1366.3. A complete analysis of linear subgradient projectors on R2(2). The linear mappingT :=✓1/2 1/21/2 1/2◆(6.139)is firmly nonexpansive and a cutter. However, T is a subgradient projector of anonconvex, discontinuous but lower semicontinuous function f given byf(x, y) =8><>:(x2 + y2)1/2 ⇤ exp( arctan(x/y)) if y 6= 0,0 if (x, y) = (0, 0),|x| exp(⇡/2) if x 6= 0, y = 0.(6.140)Figure 6.12: Graph of f defined in (6.140).By Theorem 5.35, there exists no continuous convex function f suchthat Gf = T in either case. Corollary 6.14 says that T = Gf being linearand firmly nonexpansive does not implies that f is convex. A key pointbelow is that if T = Gf is linear and f is convex, then T has to be firmlynonexpansive and symmetric.Corollary 6.15. Let T be given by (6.123). Then T is a subgradient pro-jector of a convex function if and only if one of the following holds1376.3. A complete analysis of linear subgradient projectors on R2(i) a = b = c = 0, d 6= 0, 0 < d  1: T = Gf where f(x1, x2) = K|x2|1/dfor some K > 0; or b = c = d = 0, a 6= 0, 0 < a  1: T = Gf wheref(x1, x2) = K|x1|1/a for some K > 0.(ii) a 6= 0, b = c, d = c2/a, a  a2 + c2: T = Gf where f(x1, x2) =K|ax1 + cx2|a/(a2+c2) for some K > 0.(iii) a = d, b = c = 0, 0 < a  1: T = Gf where f(x1, x2) = K(x21 + x22)12afor some K > 0.In this chapter, we not only characterized symmetric linear subgradientprojectors (see Theorem 6.6), but also provided a criterion when a linearmonotone operator is a subgradient projector (see Theorem 6.11).138Chapter 7Conclusion7.1 Key ResultsThis thesis has provided a complete study of charaterizations of sub-gradient projectors, as well as a new method for finding a fixed point of acutter. Let us list the key results and some crucial examples.Theorem 3.14 gives a finite convergence result provided that int(C \FixT ) 6= ? and P ⌘nrn = +1.Theorem 3.15 shows the finite convergence is also true if we have a lessrestrictive assumption on (FixT,C) but a more restrictive one on parameters(rn, ⌘n).Example 3.20 and Example 3.21 show that both the divergent-seriescondition and the nonempty-interior condition are necessary and cannot beomitted for Theorem 3.14 and Theorem 3.15.Theorem 4.17 lists the corresponding relationship between the continuityof subgradient projectors and Fre´chet di↵erentiability of the function f .Example 4.22 provides an example that subgradient projectors lack strong-to-weak continuity when the function f is only Gaˆteaux di↵erentiable.Example 4.24 links the subgradient projector to the accelerated mappingcorresponding to a linear operator when the function f is a power of aquadratic form.Proposition 4.43 exhibits surprisingly complicated properties of the sub-gradient projector of the function f(x1, x2) = |x1|p + |x2|p with di↵erentchoices of the parameter p  1.Theorem 4.44 shows that, on the real line, the Yamagishi-Yamada oper-ator is actually a subgradient projector of another convex function.Theorem 5.32 gives a criterion when a mapping is a subgradient projec-tor.Theorem 5.39 characterizes the class of functions f for which its subgra-dient projector is a cutter.Theorem 6.3 lists several interesting results of subgradient projectors offunctions defined by a symmetric, positive semidefinite matrix. We obtained1397.2. Future Work and Open Questionsthat its subgradient projector is linear if and only if the involved matrix canbe represented as a positive multiple of the projector onto a linear subspace.Theorem 6.6 completely characterizes symmetric linear subgradient pro-jectors.It is known that the set of cutters is convex. Even though subgradientprojectors are cutters, the set of subgradient projectors is not convex: seeTheorem 6.7 and Theorem 6.8.Proposition 6.9 provides a criterion when a linear monotone operator isa subgradient projector.7.2 Future Work and Open Questions7.2.1 Open Questions(1) Consider Theorem 6.6. Suppose the subgradient projector of a convexfunction is linear. Must it be symmetric?(2) Consider Theorem 6.11. Suppose that the rank of the displacementmatrix is equal to 2. What conditions are sucient to guarantee thatthe original matrix is a subgradient projector of a convex function?7.2.2 S-cutterIn this section, we consider a new operator: S-cutter.Definition 7.1. Let S be a nonempty subset of FixT . We call T : H! Ha S-cutter if for 8x 2 H, 8y 2 S,hy  Tx, x Txi  0. (7.1)Clearly, if S = FixT , then the definition of S-cutters matches the definitionof cutters.In [20], there is a systematic study of the properties of cutters. A nat-ural question is: which properties of cutters also hold for S-cutters? Forexample, [20, Theorem 2.1.39] shows the relationship between cutters andstrongly quasi-nonexpansive operators. It would be interesting to find con-nections between S-cutters and strongly quasi-nonexpansive operators, andto consider the following list of questions:(1) Is the set of S-cutters closed under convex combinations and compo-sitions?1407.2. Future Work and Open Questions(2) What desirable properties do the relaxations of S-cutters have?(3) In [20, Theorem 3.4.3], we see that a strongly quasi-nonexpansive op-erator with a fixed point is asymptotically regular. If there is somerelation between S-cutters and strongly quasi-nonexpansive operators,does a similar property hold for S-cutters? That is: are S-cutters ortheir relaxations asymptotically regular? If so, what is the consequenceof [20, Theorem 3.5.2] when applied to S-cutters?(4) Considering [20, Corollary 3.7.1, Corollary 3.7.3], what are the Opial-type theorems for S-cutters?(5) In [13], Theorems 3.1 and 3.2 hold if the interior of FixT is nonempty.If S ✓ FixT , we may ask: do Theorems 3.1 and 3.2 hold for S-cuttersunder some sucient conditions?7.2.3 Analysis of ParametersFrom the numerical experiments at the end of Chapter 3, we can seethat the selection of parameters is crucial and plays an important role in theperformance of algorithms. A question for future work is how to predefineparameters so that our algorithm will perform better.7.2.4 Generalized MonotonicitiesChapters 4–6 provide some properties and characterizations of subgradi-ent projectors. During the last twenty-five years, generalized monotonicitieshave been studied a lot. They play an important role in variational inequal-ity problems. Let C ✓ H be closed convex and F : H! H. The variationalinequality problem is to find x 2 C such thathFx, y  xi  0 (7.2)holds for all y 2 C. It would be interesting to investigate the generalizedmonotonicities of subgradient projectors in the context of solving the varia-tional inequality problem.7.2.5 Recognition Convexity of f from GfIn Section 6.3, we provide a complete analysis of linear subgradient pro-jectors on R2. Outside R2, is it possible to detect convexity of f from Gf?141Bibliography[1] H. H. Bauschke, Projection Algorithms and Monotone Operators,PhD thesis, Simon Fraser University, Burnaby, BC, Canada, August1996. ! pages[2] H. H. Bauschke and J. M. Borwein, On projection algorithms forsolving convex feasibility problems, SIAM Review, 38 (1996), pp. 367–426. ! pages[3] H. H. Bauschke, J. M. Borwein, and P. L. Combettes, Bregmanmonotone optimization algorithms, SIAM Journal on Control Optimiza-tion, 42 (2003), pp. 596–636. ! pages[4] H. H. Bauschke, J. Chen, and X. Wang, A projection methodfor approximating fixed points of quasi nonexpansive mappings withoutthe usual demiclosedness condition, Journal of Nonlinear and ConvexAnalysis, 15 (2014), pp. 129–135. ! pages[5] H. H. Bauschke, J. Chen, and X. Wang, A Bregman projectionmethod for approximating fixed points of quasi-Bregman nonexpansivemappings, Applicable Analysis, 94 (2015), pp. 75–84. ! pages[6] H. H. Bauschke and P. L. Combettes, A weak-to-strong conver-gence principle for Feje´r-monotone methods in Hilbert space, Mathe-matics of Operations Research, 26 (2001), pp. 248–264. ! pages[7] H. H. Bauschke and P. L. Combettes, Convex Analysis and Mono-tone Operator Theory in Hilbert Spaces, CMS Books in Mathematics/ Ouvrages de Mathe´matiques de la SMC, Springer, New York, 2011.With a foreword by He´dy Attouch. ! pages[8] H. H. Bauschke, F. Deutsch, H. Hundal, and S. H. Park, Accel-erating the convergence of the method of alternating projections, Trans-actions of the American Mathematical Society, 355 (2003), pp. 3433–3461. ! pages142Bibliography[9] H. H. Bauschke, S. M. Moffat, and X. Wang, Self-dual smoothapproximations of convex functions via the proximal average, Fixed-Point Algorithms for Inverse Problems in Science and Engineering(Ban↵ 2009), Springer, (2011), pp. 23–32. ! pages[10] H. H. Bauschke, C. Wang, X. Wang, and J. Xu, Characterizationsof subgradient projectors, in progress. ! pages[11] H. H. Bauschke, C. Wang, X. Wang, and J. Xu, The linearity ofsubgradient projectors, in progress. ! pages[12] H. H. Bauschke, C. Wang, X. Wang, and J. Xu, On subgradientprojectors, SIAM Journal on Optimization, 25 (2015), pp. 1064–1082.! pages[13] H. H. Bauschke, C. Wang, X. Wang, and J. Xu, On the finite con-vergence of a projected cutter method, Journal of Optimization Theoryand Applications, 165 (2015), pp. 901–916. ! pages[14] H. H. Bauschke, X. Wang, and L. Yao, Monotone linear relations:maximality and Fitzpatrick functions, Journal of Convex Analysis, 16(2009), pp. 673–686. ! pages[15] J. M. Borwein and M. Fabia´n, On convex functions having pointsof Gateaux di↵erentiability which are not points of Fre´chet di↵erentia-bility, Canadian Journal of Mathematics, 45 (1993), pp. 1121–1134. !pages[16] J. M. Borwein and A. S. Lewis, Convex Analysis and NonlinearOptimization, Springer-Verlag New York, 2 ed., 2006. ! pages[17] J. M. Borwein and W. B. Moors, Essentially smooth Lipschitzfunctions, Journal of Functional Analysis, 149 (1997), pp. 305–351. !pages[18] J. M. Borwein and J. D. Vanderwerff, Convex Functions, Cam-bridge University Press, 2010. ! pages[19] T. D. Capricelli, Projections Convexes Ge´ne´ralise´es et Applicationsen Imagerie Me´nicale, PhD thesis, LU´niversite´ Pierre et Marie CuriePairs VI, 2008. ! pages[20] A. Cegielski, Iterative Methods for Fixed Point Problems in HilbertSpaces, Springer, 2012. ! pages143Bibliography[21] Y. Censor, W. Chen, and H. Pajoohesh, Finite convergence of asubgradient projection method with expanding controls, Applied Math-ematics & Optimization, 64 (2011), pp. 273–285. ! pages[22] Y. Censor and A. Lent, Cyclic subgradient projections, Mathemat-ical Programming, 24 (1982), pp. 233–235. ! pages[23] Y. Censor and A. Segal, Sparse string-averaging and split com-mon fixed points, in ”Nonlinear Analysis and Optimization I”, A.Leizarowitz, B.S. Mordukhovich, I. Shafrir, and A.J. Zaslavski (edi-tors), Contemporary Mathematics, 513 (2010), pp. 125–142. ! pages[24] Y. Censor and S. A. Zenios, Parallel Optimization, Oxford Univer-sity Press, 1997. ! pages[25] F. H. Clarke, Optimization and Nonsmooth Analysis, Classics in Ap-plied Mathematics, Society for Industrial and Applied Mathematics(SIAM), Philadelphia, PA, second ed., 1990. ! pages[26] P. L. Combettes, The foundations of set theoretic estimation, Pro-ceedings of the IEEE, 81 (1993), pp. 182–208. ! pages[27] P. L. Combettes, Restauration ensembliste d’images par ite´rationsparalle`les extrapole´es de sous-gradients, Proceedings of the 15thGRETSI Symposium, (1995), pp. 447–450. ! pages[28] P. L. Combettes, The convex feasibility problem in image recovery,in Advances in Imaging and Electron Physics, P. Hawkes (editor), Aca-demic Press, 95 (1996). ! pages[29] P. L. Combettes, Convex set theoretic image recovery by extrapo-lated iterations of parallel subgradient projections, IEEE Transactionson Image Processing, 6 (1997), pp. 493–506. ! pages[30] P. L. Combettes and J. Luo, An adaptive level set method for non-di↵erentiable constrained image recovery, IEEE Transactions on ImageProcessing, 11 (2002), pp. 1295–1304. ! pages[31] G. Crombez, Finding common fixed points of a class of paracontrac-tions, Acta Mathematica Hungarica, 103 (2004), pp. 233–241. ! pages[32] K. R. Davidson and A. P. Donsig, Real Analysis and Applications,Springer, 2010. ! pages144Bibliography[33] A. L. Dontchev and R. T. Rockafellar, Implicit Functions andSolution Mappings, Springer Monographs in Mathematics, 2009. !pages[34] M. Fukushima, A finite convergent algorithm for convex inequalities,IEEE Transactions on Automatic Control, 27 (1982), pp. 1126–1127.! pages[35] M. C. Gemignani, Elementary Topology, Dover Publications Inc, sec-ond ed., 1972. ! pages[36] E. Kreyszig, Introductory Functional Analysis with Applications, Wi-ley, 1989. ! pages[37] C. Meyer, Matrix Analysis and Applied Linear Algebra, Society forIndustrial and Applied Mathematics (SIAM), Philadelphia, PA, 2000.! pages[38] J. Milnor, Dynamics in One Complex Variable, Princeton UniversityPress, third ed., 2006. ! pages[39] B. S. Mordukhovich, Generalized di↵erential calculus for nonsmoothand set-valued mappings, Journal of Mathematical Analysis and Appli-cations, (1994), pp. 250–288. ! pages[40] B. S. Mordukhovich, Variational Analysis and Generalized Di↵er-entiation I, Springer-Verlag, 2006. ! pages[41] B. S. Mordukhovich and N. M. Nam, An Easy Path to ConvexAnalysis and Applications, Morgan and Claypool Publishers, 2014. !pages[42] B. S. Mordukhovich and Y. Shao, Nonsmooth sequential analysisin asplund spaces, Transactions of the American Mathematical Society,348 (1996), pp. 1235–1280. ! pages[43] N. Ogura and I. Yamada, A deep outer approximating half space ofthe level set of certain quadratic functions, Journal of Nonlinear andConvex Analysis, 6 (2005), pp. 187–201. ! pages[44] B. Pauwels, Subgradient projection operators, Master’s thesis, Univer-site Pierre et Marie Curie, France, September 2012. ! pages[45] J. P. Penot, Calculus Without Derivatives, Springer, 2013. ! pages145Bibliography[46] R. R. Phelps, Convex functions, Monotone Operators and Di↵eren-tiability, no. 1364 in Lecture Notes in Mathematics, Springer-Verlag,Berlin, second ed., 1993. ! pages[47] A. R. D. Pierro and A. N. Iusem, A finitely convergent ”row-action”method for the convex feasibility problem, Applied Mathematics andOptimization, 17 (1988), pp. 225–235. ! pages[48] B. T. Polyak, Minimization of unsmooth functionals, U.S.S.R. Com-putational Mathematics and Mathematical Physics, 9 (1969), pp. 14–29. ! pages[49] B. T. Polyak, Introduction to Optimization, Optimization Software,1987. ! pages[50] B. T. Polyak, Random algorithms for solving convex inequalities. InD. Butnariu, Y. Censor, S. Reich(eds.): Inherently Parallel Algorithmsin Feasibility and Optimization and their Applications, Elsevier SciencePublishers, (2001), pp. 409–422. ! pages[51] E. Raik, A class of iterative methods with Feje´r-monotone sequences,Eesti NSV Teaduste Akadeemia Toimetised. Fu¨u¨sika-Matemaatika, 18(1969), pp. 22–26. ! pages[52] L. F. Richardson, Measure and Integration : a Concise Introductionto Real Analysis, Wiley, 2009. ! pages[53] R. T. Rockafellar and J. B. Wets, Variational Analysis, Springer,third ed., 2009. ! pages[54] K. A. Ross, Elementary Analysis, Springer, second ed., 2013. ! pages[55] K. Slavakis and I. Yamada, The adaptive projected subgradientmethod constrained by families of quasi-nonexpansive mappings andits application to online learning, SIAM Journal on Optimization, 23(2013), pp. 126–152. ! pages[56] S. Theodoridis, K. Slavakis, and I. Yamada, Adaptive learningin a world of projections, IEEE Signal Processing Magazine, 97 (2011),pp. 97–123. ! pages[57] S. X. Wang, Fine and Pathological Properties of Subdi↵erentials, PhDthesis, Simon Fraser University, August 1999. ! pages146Bibliography[58] I. Yamada and N. Ogura, Adaptive projected subgradient method forasymptotic minimization of sequence of nonnegative convex functions,Numerical Functional Analysis and Optimization, 25 (2004), pp. 593–617. ! pages[59] I. Yamada and N. Ogura, Hybrid steepest descent method for vari-ational inequality problem over the fixed point set of certain quasi-nonexpansive mappings, Numerical Functional Analysis and Optimiza-tion, 25 (2004), pp. 619–655. ! pages[60] I. Yamada, K. Slavakis, and K. Yamada, An ecient robust adap-tive filtering algorithm based on parallel subgradient projection tech-niques, IEEE Transactions on Signal Processing, 50 (2002), pp. 1091–1101. ! pages[61] M. Yamagishi and I. Yamada, A deep monotone approximation op-erator based on the best quadratic lower bound of convex functions,IEICE Transactions on Fundamentals of Electronics, Communicationsand Computer Sciences, E91 (2008), pp. 1858–1866. ! pages147Appendix148Appendix AMaple CodeA.1 Code to verify Remark 3.24 whenf(x) = x2  1f := x 7! x2  1df := D (f)Gf := x 7! x 1/2 x2  1xr1 := n 7! 1pn+ 1r2 := n 7! (n+ 1)1⌘1 := 1⌘2 := 2✏1 := n 7! 1pn+ 1✏2 := n 7! (n+ 1)1U [1] := (x, r) 7! piecewise(f(x)  0, x, f(x) > 0, x + ⌘1 ⇤ (r + |G[f ](x) x)| ⇤ (G[f ](x) x)|G[f ](x) x| )U [2] := (x, r) 7! piecewise(f(x)  0, x, f(x) > 0, x + ⌘2 ⇤ (r + |G[f ](x) x)| ⇤ (G[f ](x) x)|G[f ](x) x| )T [1] := y 7! piecewise(f(y)  0, y, f(y) > 0, y  f(y) + ✏1(n)df(y))T [2] := y 7! piecewise(f(y)  0, y, f(y) > 0, y  f(y) + ✏2(n)df(y))149with(RandomTools)with(Statistics)with(LinearAlgebra)M := 100h := V ector(M)g := V ector(M)k := V ector(M)l := V ector(M)u := V ector(M)p := V ector(M)SetState(state = 12345)for v from 1 by 1 to M dox[0] := Generate(float(range = 1 .. 1000000));y[0]:= x[0]; z[0]:= x[0]; w[0] := x[0]; a[0] := x[0]; b[0] := x[0];n := 0; m := 0; i := 0; j := 0; s := 0; q := 0; ;while f(x[n]) > 106 do x[n+ 1] := evalf(U [1](x[n], r[1](n))); n := n+ 1; end dowhile f(y[m]) > 106 do y[m+ 1] := evalf(U [1](y[m], r[2](m))); m := m+ 1; end dowhile f(z[i]) > 106 do z[i+ 1] := evalf(U [2](z[i], r[1](i))); i := i+ 1; end dowhile f(w[j]) > 106 do w[j + 1] := evalf(U [2](w[j], r[2](j))); j := j + 1; end dowhile f(a[s]) > 106 do a[s+ 1] := evalf(T [1](a[s])); s := s+ 1; end do150while f(b[q]) > 106 do b[q + 1] := evalf(T [2](b[q])); q := q + 1; end doh[v] := n; g[v] := m; k[v] := i; l[v] := j; u[v] := s; p[v] := qmeanU1(x,r1) := Mean (h)10.8300000000000mn1 := evalfmeanU1(x,r1), 410.83meanU1(x,r2) := Mean (g)11.4900000000000mn2 := evalfmeanU1(x,r2), 411.49meanU2(x,r1) := Mean (k)2.0mn3 := evalfmeanU2(x,r1), 42.0meanU2(x,r2) := Mean (l)2.0mn4 := evalfmeanU2(x,r2), 42.0meanT1 := Mean (u)11.8100000000000mn5 := evalf (meanT1 , 4)11.81meanT2 := Mean (p)12.1900000000000mn6 := evalf (meanT2 , 4)12.19medianU1(x,r1) := Median (h)12.0d1 := evalfmedianU1(x,r1), 412.0medianU1(x,r2) := Median (g)13.0d2 := evalfmedianU1(x,r2), 415113.0medianU2(x,r1) := Median (k)2.0d3 := evalfmedianU2(x,r1), 42.0medianU2(x,r2) := Median (l)2.0d4 := evalfmedianU2(x,r2), 42.0medianT1 := Median (u)13.0d5 := evalf (medianT1 , 4)13.0medianT2 := Median (p)13.0d6 := evalf (medianT2 , 4)13.0N :=Matrix(6, 2, [mn[2], d[2],mn[4], d[4],mn[1], d[1],mn[3], d[3],mn[5], d[5],mn[6], d[6]])N :=266666666666411.49 13.02.0 2.010.83 12.02.0 2.011.81 13.012.19 13.03777777777775152A.2 Code to verify Remark 3.24 whenf(x) = 100x2  1f := x 7! 100 ⇤ x2  1df := D (f)Gf := x 7! x 100 ⇤ x2  1200 ⇤ xr1 := n 7! 1pn+ 1r2 := n 7! (n+ 1)1⌘1 := 1⌘2 := 2✏1 := n 7! 1pn+ 1✏2 := n 7! (n+ 1)1U [1] := (x, r) 7! piecewise(f(x)  0, x, f(x) > 0, x + ⌘1 ⇤ (r + |G[f ](x) x)| ⇤ (G[f ](x) x)|G[f ](x) x| )U [2] := (x, r) 7! piecewise(f(x)  0, x, f(x) > 0, x + ⌘2 ⇤ (r + |G[f ](x) x)| ⇤ (G[f ](x) x)|G[f ](x) x| )T [1] := y 7! piecewise(f(y)  0, y, f(y) > 0, y  f(y) + ✏1(n)df(y))T [2] := y 7! piecewise(f(y)  0, y, f(y) > 0, y  f(y) + ✏2(n)df(y))with(RandomTools)with(Statistics)with(LinearAlgebra)M := 100153h := V ector(M)g := V ector(M)k := V ector(M)l := V ector(M)u := V ector(M)p := V ector(M)SetState(state = 12345)for v from 1 by 1 to M dox[0] := Generate(float(range = 1 .. 1000000));y[0]:= x[0]; z[0]:= x[0]; w[0] := x[0]; a[0] := x[0]; b[0] := x[0];n := 0; m := 0; i := 0; j := 0; s := 0; q := 0; ;while f(x[n]) > 106 do x[n+ 1] := evalf(U [1](x[n], r[1](n))); n := n+ 1; end dowhile f(y[m]) > 106 do y[m+ 1] := evalf(U [1](y[m], r[2](m))); m := m+ 1; end dowhile f(z[i]) > 106 do z[i+ 1] := evalf(U [2](z[i], r[1](i))); i := i+ 1; end dowhile f(w[j]) > 106 do w[j + 1] := evalf(U [2](w[j], r[2](j))); j := j + 1; end dowhile f(a[s]) > 106 do a[s+ 1] := evalf(T [1](a[s])); s := s+ 1; end dowhile f(b[q]) > 106 do b[q + 1] := evalf(T [2](b[q])); q := q + 1; end doh[v] := n; g[v] := m; k[v] := i; l[v] := j; u[v] := s; p[v] := qmeanU1(x,r1) := Mean (h)17.5200000000000mn1 := evalfmeanU1(x,r1), 415417.52meanU1(x,r2) := Mean (g)13.2900000000000mn2 := evalfmeanU1(x,r2), 413.29meanU2(x,r1) := Mean (k)105.0mn3 := evalfmeanU2(x,r1), 4105.0meanU2(x,r2) := Mean (l)12.0mn4 := evalfmeanU2(x,r2), 412.0meanT1 := Mean (u)15.2700000000000mn5 := evalf (meanT1 , 4)15.27meanT2 := Mean (p)15.7600000000000mn6 := evalf (meanT2 , 4)15.76medianU1(x,r1) := Median (h)19.0d1 := evalfmedianU1(x,r1), 419.0medianU1(x,r2) := Median (g)14.0d2 := evalfmedianU1(x,r2), 414.0medianU2(x,r1) := Median (k)105.0d3 := evalfmedianU2(x,r1), 4105.0medianU2(x,r2) := Median (l)12.0155d4 := evalfmedianU2(x,r2), 412.0medianT1 := Median (u)16.0d5 := evalf (medianT1 , 4)16.0medianT2 := Median (p)17.0d6 := evalf (medianT2 , 4)17.0N := Matrix(6, 2, [mn[2], d[2],mn[4], d[4],mn[1], d[1],mn[3], d[3],mn[5], d[5],mn[6], d[6]])N :=266666666666413.29 14.012.0 12.017.52 19.0105.0 105.015.27 16.015.76 17.03777777777775156

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0300354/manifest

Comment

Related Items