Parametrically Prox-Regular Functions by Chayne Planiden B.Sc. Hons., Universidad Aut´onoma de Baja California, 2009 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The College of Graduate Studies (Mathematics) THE UNIVERSITY OF BRITISH COLUMBIA (Okanagan) April 2013 c Chayne Planiden 2013 Abstract Prox-regularity is a generalization of convexity that includes all lower-C 2 functions. Therefore, the study of prox-regular functions provides insight on a broad spectrum of important functions. Parametrically prox-regular (para-prox-regular) functions are a further extension of this family, produced by adding a parameter. Such functions have been shown to play a key role in understanding the stability of minimizers in optimization problems. This thesis discusses para-prox-regular functions in Rn . We begin with some basic examples of para-prox-regular functions, and move on to the more complex examples of the convex and nonconvex proximal averages. We develop an alternate representation of para-prox-regular functions, related to the monotonicity of an f -attentive -localization as was done for prox-regular functions [25]. Levy in [18] provided proof of one implication of this relationship; we provide a characterization. We analyze two common forms of parametrized functions that appear in optimization: finite parametrized sum of functions, and finite parametrized max of functions. The example of strongly amenable functions by Poliquin and Rockafellar [27] is given, and a relaxation of its necessary conditions is presented. Some open questions and directions of further research are stated. ii Statement of Co-Authorship This thesis has been adapted from a manuscript co-authored with Dr. Warren Hare at the University of British Columbia, Okanagan campus. The manuscript has been submitted for publication to the Journal of Convex Analysis [12]. iii Table of Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Statement of Co-Authorship . . . . . . . . . . . . . . . . . . . iii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . iv List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . vi Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . 1 Chapter 2: Preliminary Definitions . . . . . . . . . . . . . . . 6 2.1 Basic Notation and Definitions . . . . . . . . . . . . 6 2.2 Subgradients . . . . . . . . . . . . . . . . . . . . . . . 9 2.3 Prox-Regularity . . . . . . . . . . . . . . . . . . . . . 14 Chapter 3: Basic Examples, Proximal Average . . . . . . 18 3.1 Basic Examples . . . . . . . . . . . . . . . . . . . . . 18 3.2 Proximal Average . . . . . . . . . . . . . . . . . . . . 22 Chapter 4: Relationship to Monotonicity . . . . . . . . . . 26 4.1 Localization of the Subdifferential . . . . . . . . . . 26 4.2 Characterization . . . . . . . . . . . . . . . . . . . . 28 Chapter 5: Constructing Para-prox-regular Functions . . 5.1 Naming Parameters Explicitly . . . . . . . . . . . . 5.2 Scalar Multiplication and Finite Sum . . . . . . . . 5.3 Finite Maximum . . . . . . . . . . . . . . . . . . . . 34 34 37 42 Chapter 6: Conclusion . . . . . . . . . . . . . . . . . . . . . . . 45 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 Appendix A: Glossary of Notation . . . . . . . . . . . . 50 Appendix B: Index . . . . . . . . . . . . . . . . . . . . . 51 iv List of Figures 1.1 1.2 1.3 Three convex functions. . . . . . . . . . . . . . . . . . Line segments on or above the curve. . . . . . . . . . . A non-convex function. . . . . . . . . . . . . . . . . . . 2.1 2.2 2.3 2.4 2.5 2.6 The function x2 and some tangent lines. . . . . . . . . The norm function has multiple subgradients at the origin. The epigraph of sin x. . . . . . . . . . . . . . . . . . . . The projection of (0, −1) onto epi−|x| . . . . . . . . . . . The proximal normal cone to epi|x| at the origin. . . . . 1 The graph of f (x) = x 3 , and its normal cone at x = 0 in red. . . . . . . . . . . . . . . . . . . . . . . . . . . . Left: a prox-regular function. Right: a function that is not prox-regular at x = 0. . . . . . . . . . . . . . . . . 2.7 3 3 4 11 11 12 12 13 14 15 v Acknowledgements Thank you to Dr. Warren Hare, my supervisor and co-author. You have brought out a love for research in me, and I look forward to continuing down this path you have helped me to discover. Thank you to the committee members for agreeing to be a part of this very important moment in my career. Thank you to NSERC and UBC for funding this research. vi Dedication To my wife: As is the case with every other positive aspect of my life, this could not have happened without your love and support. Thank you so much for all that you do and all that you are for me. I love you. vii Chapter 1 Introduction Optimization is a branch of mathematics that is of great practical importance in the world today. The very basic objective of an optimizer is to find the minimizers of a given function that is subject to a set of constraints. So that the reader may visualize this concept clearly, let us consider the following example. Imagine that you are the CEO of a company that makes soft drinks. You want to design a plastic bottle that holds a certain volume of your product, and you want to use the minimum quantity of material possible in manufacturing the bottle. This is a logical objective, since you will be mass-producing the bottles and any savings in the amount of material used will affect the bottom line of the company. So the initial question is, what shape should the bottle be? This addresses the first part of the optimizer’s objective: finding the minimizer. In the language of a mathematician, the goal is to find the shape that minimizes the surface area of a container while maintaining its volume unchanged. There are well-known formulas for the volume and surface area of any basic three-dimensional shape we would like to consider: a cylinder, a cube, etc. Calculus can be used to determine that the most efficient shape for our purposes is a sphere. If we want to contain a fixed volume of liquid in a container that uses a minimum of material, the container should be the shape of a ball. Problem solved? Well, not really. How do you drink out of a spherical bottle? How do you keep it from rolling off the table? How do you stack them on the shelves in the grocery stores? We have solved one problem and created a multitude of others. So this is the moment to employ the other component to the optimizer’s basic objective: the set of constraints. We will reconsider the same problem, but first we make a list of restrictions on the bottle, such as the following. − It must stand up on its own on a flat surface, − it must be a shape that can be held comfortably with one hand, 1 − it must be convenient to drink out of, − it must fit in a standard cupholder of a vehicle, and so on. One may list as many constraints as one wishes, and then ask the original question again. Now, subject to this set of constraints, what shape will use a minimum of material? So we continue with our calculations. The next-best shape after a sphere is a cylinder with hemispheres on both ends. This one can be held comfortably with one hand, but it still does not meet the other criteria. So we continue the process until we have the shape in use today. Think about the plastic soda bottles we have in the stores. They are, in fact, cylinders with hemispheres on the top, and a spout added for drinking. The bottom is not a hemisphere of course, that would not stand up on its own, but it is not flat either. A flat bottom would be a waste of material, so the feet are little hemispheres. That is a more efficient use of the material, and that is why it is the shape we see everywhere soft drinks are sold. This example is a great illustration of the importance of optimization. Every business is concerned with issues such as maximizing production, minimizing overhead expenses, maximizing efficiency, minimizing waste, maximizing profit, minimizing risk. So aspects of the research we do in this field are applicable to anyone and everyone in industry. This thesis deals with a particular family of functions seen in optimization problems, called parametrically prox-regular (para-prox-regular) functions. In chapters that follow, we shall see the definition and particular uses of these functions, some examples, and a few methods of constructing them. For now, we will talk about the origin of these functions and why we are interested in learning about them. We begin with an important ancestor, the convex function. In the late nineteenth century, mathematicians such as Minkowski [22], Brunn [8], and many others began to work with convex functions, as their desirable properties became apparent. Research on convexity continues today; convex analysis is a prominent area of research for optimizers. So what is a convex function? Graphically speaking, it is exactly what we might intuitively expect: a function that has a convex shape. Figure 1.1 illustrates three such functions. Figure 1.1: Three convex functions. We will see the formal definition of convex functions in the next chapter, but for now let us just consider the graphs. If we want to verify that a function is convex, we can use a line-segment test. Every convex function has the property that, if you choose any two points that are on or above the graph of the function, the line segment that joins those two points is also entirely on or above the graph. For example, all three graphs in Figure 1.1 have that property. If we look more closely at one of those graphs, we can see more clearly that this is the case. Figure 1.2: Line segments on or above the curve. This gives us a way to prove graphically that a function is not convex. If we have the graph of a function and we can find two points that lie on or above the graph, but the line segment that joins them does not, then we know that we have a non-convex function. It has failed the line-segment test. For example, Figure 1.3 shows a nonconvex function. Here we have chosen two points that are above the graph, but a section of the line segment that joins them falls below the graph. So this function is not convex. Figure 1.3: A non-convex function. There are many benefits to working with convex functions, from an optimization point of view. However, there are other classes of functions that are not necessarily convex, but that also have desirable properties. Most notably among those is the class of differentiable functions, or so-called ‘smooth’ functions. Again considering them from a graphical standpoint, differentiable functions are functions whose graphs are smooth curves, with no gaps or sharp points in them. Taking a second look at Figure 1.1, we see that the graph on the left is that of a smooth function, while the other two are not, since they both have a kink in them. Figure 1.3 also represents a smooth function. So we can see that there is some overlap of the two classes, and that there are some differences between them. There exist functions that are both convex and differentiable, convex functions that are not differentiable, differentiable functions that are not convex, and functions that are neither. It would be beneficial to have a class of functions that covers both these types of functions, so that any theory developed in this new class would be applicable to both convex functions and smooth functions. This has already been done, and we call this new class prox-regular functions. Prox-regularity was first introduced in the mid-nineties [10, 24, 25], so this branch of mathematics is less than twenty years old. The definition is quite technical and is better left to the next chapter, but it is a very useful property, as we shall see shortly. Many familiar classes of functions fall into this category, including convex functions and differentiable functions. Since its conception, a number of articles have been published on the topic [4, 6, 7, 11, 13–16, 20, 23, 26], furthering the theory and establishing practical, real-world applications for such functions. Para-prox-regularity, which is the focus of this thesis, was first in- troduced by Levy, Poliquin and Rockafellar in 2000 [19]. There, it was used to analyze the stability of solutions to parametrized optimization problems. Para-prox-regular functions share the characteristics of proxregular functions, but an extra level of complexity is added. The formal definition of parametric prox-regularity is presented in Definition 2.3.3. Little research has been done on para-prox-regular functions to date; the purpose of the present work is to take a step in that direction. We provide some examples of such functions, and develop an alternative characterization. We then examine several styles of constructing some commonly-used functions that are potentially para-prox-regular, and prove that indeed they are para-prox-regular under certain conditions. The organization of this thesis is as follows. Chapter 2 contains basic definitions we will need, and the definitions of parametrized functions, prox-regular functions and para-prox-regular functions. It also provides some preliminary results for use in subsequent sections. Chapter 3 presents some examples of prox-regular and para-prox-regular functions. These include the proximal average [2, 3] and the NCproximal average [14], which are parametrized functions designed to transform one function into another in a continuous manner. Chapter 4 contains an alternate definition of para-prox-regular functions. This result is similar to Poliquin’s and Rockafellar’s alternate definition of prox-regular functions presented in [25]. Chapter 5 presents three methods of constructing parametrized functions, and discusses when each method results in a para-prox-regular function. As a corollary, we answer an open question posed in [14]. The methods presented are: scalar multiplication of a function by a positive parameter (Lemma 5.2.3), the parametrized sum of a finite collection of functions (Theorem 5.2.5), and the finite parametrized max of continuous functions (Theorem 5.2.7). An expansion of [27, Theorem 3.2] is also seen in Chapter 5, extending the required two initial prox-regular functions to any finite number of prox-regular functions. Chapter 6 summarizes the work we have done and poses some suggestions for future research. Chapter 2 Preliminary Definitions This chapter includes the definitions of prox-regular and para-proxregular functions. We will start with some more basic terms that we need in order to make the main definitions. 2.1 Basic Notation and Definitions Throughout this thesis, we will use the following notation. dom f The domain of function f, i.e. all values of x ∈ Rn that deliver a real-valued f (x). Example: if f (x) := 2x + 1 for x ∈ R, then dom f = R. Proper function: A function f is said to be proper if f (x) > −∞ for all x ∈ Rn , and there exists at least one x ∈ dom f such that f (x) < +∞. C 0 The set of all continuous functions. C 1 The set of all continuous, differentiable functions whose gradient is continuous. C 2 The set of all twice continuously differentiable functions whose gradient and Hessian are continuous. For a vector x = (x1 , x2 , . . . , xn ) ∈ Rn , the Euclidean norm of x is denoted by |x| : |x| := x21 + x22 + · · · + x2n . For two vectors x, y ∈ Rn , the inner product of x and y is denoted by x, y : x, y := x1 y1 + x2 y2 + · · · + xn yn . A set U is called open if for any x ∈ U there exists V := {y : |y − x| < } ⊆ U. > 0 such that 6 A set U is called closed if fo any sequence (xn ) ⊆ U that tends to some x¯, x¯ ∈ U. An open ball centered at x¯ with radius δ, denoted Bδ (x), is defined as Bδ (¯ x) := {x : |x − x¯| < δ}. ¯δ (x), and is defined as A closed ball of the same type is denoted B ¯δ (x) := {x : |x − x¯| ≤ δ}. B Definition 2.1.1. (Derivative) A function f : Rn → R is said to be Fr´echet differentiable (henceforth differentiable) if there exists a function DT (x) such that lim 0= y →0 f (x + y) − f (x) − DT (x)y = 0. y The function DT (x) is called the Fr´echet derivative (henceforth derivative) of f at x, and DT (x) defines the gradient of f at x, denoted ∇f (x) : DT (x)y = y, ∇f (x) . For a set of real numbers U, the supremum of U, denoted sup U, is the least upper bound of U. That is, sup U is the smallest real number that is greater than or equal to every element of U. Similarly, the infimum of U, denoted inf U, is the greatest lower bound of U ; that is, the biggest real number that is less than or equal to every element of U. The supremum of f : Rn → R ∪ {+∞}, denoted sup f, is the smallest number that is greater than or equal to f (x) for all x ∈ dom f. In other words, it is the smallest upper bound of f. If no such number exists, the supremum is +∞. If the supremum is equal to f (x) for some x ∈ dom f, then it is the maximum value of f and can also be denoted max f. Similarly, the infimum of f, denoted inf f, is the greatest number that is less than or equal to f (x) for all x ∈ dom f. If no such number exists, the infimum is −∞. If the infimum is equal to f (x) for some x ∈ dom f, then it is the minimum value of f and can also be denoted min f. If max f exists, then the set of all maximizers of f is denoted argmax f : argmax f := {x ∈ Rn such that f (x) = max f }. If min f exists, then the set of all minimizers of f is denoted argmin f : argmin f := {x ∈ Rn such that f (x) = min f }. Definition 2.1.2. (Lower Semicontinuity) A function f : Rn → R ∪ {+∞} is said to be lower semicontinuous (lsc) at x¯ if for every > 0 there exists a neighborhood U of x¯ such that f (x) ≥ f (¯ x − ) for all x ∈ U. If f is lsc at x for all x such that |x − x¯| < δ for some δ > 0, then f is said to be locally lsc around x¯. Definition 2.1.3. (Lipschitz Continuity) A function f : D ⊆ Rm → Rn is called Lipschitz continuous if K := sup x1 ,x2 ∈D |f (x1 ) − f (x2 )| |x2 − x1 | is a real number. If so, then K is called the Lipschitz constant of f, denoted Lip f. If, rather than considering all x1 , x2 ∈ D, we find that K is real for all x1 , x2 such that |x1 − x¯| < and |x2 − x¯| < for some x¯ ∈ D and some < 0, then f is called locally Lipschitz continuous at x¯, with local Lipschitz constant K denoted Lip(f, x¯). Definition 2.1.4. (Epigraph) The epigraph of a function f : Rn → R is defined as epif := {(x, α) : α ≥ f (x)}. It is the set of all points that lie on or above the curve of f. Definition 2.1.5. (Convex Set) A set S ∈ Rn is convex if for every x1 , x2 ∈ S, the line segment αx1 + (1 − α)x2 for α ∈ [0, 1] is a subset of S. Definition 2.1.6. (Convex Function) A function f : Rn → R is a convex function if epif is a convex set. Definition 2.1.7. (Convex Hull) For any set S ∈ Rn , the convex hull of S, denoted conv S, is the smallest convex set that contains S. Definition 2.1.8. (Little-o) For two real functions f and g, we say (x) f (x) = o(g(x)) at x0 if and only if lim fg(x) = 0, with x = x0 in limit x→x0 and g(x) = 0 near x0 . 2.2 Subgradients The first definition we will see in detail is that of subgradients. For a continuously differentiable function, we know from calculus that its gradient exists everywhere on its domain, and that the gradient is calculated using the partial derivatives of the function [28, p. 47]. The gradient gives important information, such as the direction of steepest ascent of the function at any point. However, many of the functions we routinely use in the optimization field are not so well-behaved. One example is the norm function, f (x) := |x|. This function has a point of non-differentiability at the origin. For functions such as these, the definition of gradient fails us and we cannot apply the associated rules of calculus. So we would like to define a similar concept to the gradient that applies to non-smooth functions. That similar concept is what we call the subgradient. There are several types of subgradients; we first state their definitions, and follow with some examples and graphical interpretations. Definition 2.2.1. [28, Definition 8.3] Let f : Rn → R ∪ {+∞} be proper, x¯ ∈ dom f . A vector v¯ ∈ Rn is ˆ (¯ a) a regular subgradient of f at x¯, written v¯ ∈ ∂f x), if for all x ∈ dom f f (x) ≥ f (¯ x) + v¯, x − x¯ + o(|x − x¯|). (2.2.1) b) a (general) subgradient of f at x¯, written v¯ ∈ ∂f (¯ x), if there exist k ∞ k k ˆ sequences {x }k=1 ⊆ dom f and v ∈ ∂f (x ) for all k ∈ N, such that xk → x¯, f (xk ) → f (¯ x), and v k → v¯. x), if the same c) a horizon subgradient of f at x¯, written v ∈ ∂ ∞ f (¯ holds as in b) except that instead of v k → v one has λk v k → v for some sequence λk 0. d) a proximal subgradient of f at x¯ if, when the error term o(|x− x¯|) of inequality (2.2.1) is 2r |x− x¯|2 , the inequality holds for all x such that |x − x¯| < δ for some δ > 0. ˆ (¯ The set of all regular subgradients of f at x¯ is denoted ∂f x) and is called the regular subdifferential of f at x¯. Similarly, the set of subgradients is denoted ∂f (¯ x) and is called the subdifferential and the set of horizon subgradients is denoted ∂ ∞ f (¯ x) and called the horizon m n subdifferential. For a function f : R × R → R ∪ {+∞}, f (x, λ), we ¯ to denote the subdifferential with respect to x. shall also use ∂x f (·, λ) To understand what subgradients are, we begin with the case of convex functions. Recall that a continuously differentiable function f is convex on an open set U ⊆ dom f if and only if f (x) ≥ f (¯ x) + ∇f (¯ x), x − x¯ (2.2.2) for all x, x¯ ∈ U. For a convex non-differentiable function then, we would like to have a similar definition of convexity that employs subgradients at points of non-differentiability. Note that if we choose the zero function for the last term in inequality (2.2.1), we are left with f (x) ≥ f (¯ x) + v¯, x − x¯ (2.2.3) for all x, x¯ ∈ U. Any vector v¯ that satisfies inequality (2.2.3) is referred to as a convex subgradient. In the case of a continuously differentiable function f , inequality (2.2.3) reverts to inequality (2.2.2), since the only subgradient of f (¯ x) is ∇f (¯ x) [28, Proposition 8.5], as the next paragraph illustrates. Graphically speaking, a differentiable function f is convex if and only if the function lies entirely above the linear function f (¯ x) + ∇f (¯ x), x − x¯ of inequality (2.2.2). Here the gradient ∇f (¯ x) describes the slopes of said linear functions at the point (¯ x, f (¯ x)). For example, the differentiable function f : R → R, f (x) = x2 is convex, since any tangent line to the function at any point lies completely below the graph of the function (see Figure 2.1). Now we can see why the gradient is the only subgradient of a continuously differentiable function, because a tangent line that contains f (¯ x) and has any slope other than ∇f (x) would not lie entirely below the graph of f [28, Exercise 8.8]. Figure 2.1: The function x2 and some tangent lines. For non-smooth convex functions, the convex subgradient has the same graphical interpretation. That is, a convex subgradient of f at x¯ is any vector v¯ such that the function f (¯ x) + v¯, x − x¯ lies entirely below the graph of the function f. One of the differences here is that a subgradient at a point may not be unique. There is only one gradient at each point of a C 1 function, but for a non-C 1 function there might be multiple subgradients at the same point. If we look at the graph of the norm function (Figure 2.2), which is convex and non-smooth, then we see there are many lines that pass through the origin and lie below the graph of the function. Each of those lines is defined by a distinct subgradient. Figure 2.2: The norm function has multiple subgradients at the origin. Now we move on to the non-convex case. On our way to visualizing (non-convex) subgradients, we define the epigraph, projection, and normal cone. Definition 2.2.2. The epigraph of a function f : Rn → R ∪ {+∞}, denoted epi f , is the set of all points that lie on or above the graph of f. That is, epi f := {(x, α) ∈ Rn × R : α ≥ f (x)}. Figure 2.3: The epigraph of sin x. Definition 2.2.3. Let S = epi f. The projection of a point x onto S, denoted PS (x), is the set of all points in S that are closest to x. That is, PS (x) = {¯ x ∈ S : |x − x¯| ≤ |x − y| for all y ∈ S}. Figure 2.4: The projection of (0, −1) onto epi−|x| . Definition 2.2.4. For a set S ∈ Rn , the proximal normal cone to S at a point y ∈ S is the cone NSP (y) = {λ(x − y) : λ ≥ 0, y ∈ PS (x)}. In other words, the proximal normal cone at y consists of all vectors, called proximal normals, that point along the directions of projection back onto y. Figure 2.5: The proximal normal cone to epi|x| at the origin. With this in mind, we may now give a graphical interpretation of proximal subgradients. Proposition 2.2.5. [28, Proposition 8.46] A vector v is a proximal subgradient to f at x¯ if and only if (v, −1) is a proximal normal to epi f at (¯ x, f (¯ x)). For example, referring to Figure 2.5, we can see that the vector (x, −1) lies in the red area if x ∈ [−1, 1], and it does not otherwise. The interval [−1, 1] then, is the set of proximal subgradients to |x| at x = 0. As we saw in Definition 2.2.1 b), if there exist sequences {xk } and {v k } such that f (xk ) → f (¯ x), each v k is a proximal subgradient of the corresponding f (xk ), and v k → v¯, then v¯ is a subgradient of f at x¯. Finally, the horizon subgradients describe what happens at the outer edges of the normal cone. Consider the example of 1 f : R → R ∪ {+∞}, f (x) = x 3 . 1 Figure 2.6: The graph of f (x) = x 3 , and its normal cone at x = 0 in red. In this case the proximal subdifferential is empty, since there does P not exist any x ∈ R such that (x, −1) ∈ Nepi f (0, 0). However, the horizon subdifferential is not empty. To see this, let v ∈ R+ = [0, +∞). Consider any sequence (xk ) ⊆ R such that xk → 0 and xk = 0. Then the corresponding sequence of proximal subgradients is (vk ) such that −2 2 vk = ∇f (xk ) = 13 xk 3 . By using the sequence of scalars λk = 3vxk3 , we 2 see that λk ≥ 0 for all k, and that λk −2 3vxk 13 xk 3 2 3 0 since v is fixed and xk3 0. Then we have that λk vk = = v for all k, so that λk vk → v. Hence, R+ ⊆ ∂ ∞ f (0). Now considering any v < 0, since vk ≥ 0 for all k, and since we must have λk ≥ 0 for all k, then there do not exist sequences (vk ) and (λk ) such that λk vk → v. Therefore, ∂ ∞ f (0) = R+ . 2.3 Prox-Regularity Now we are ready for the definition of prox-regular functions. Definition 2.3.1. [28, Definition 13.27] A proper function f : Rn → R ∪ {+∞} that is finite at x¯ is prox-regular at x¯ for v¯ with parameters > 0, r ≥ 0, where v¯ ∈ ∂f (¯ x), if and only if f is locally lsc at x¯ and r f (x ) ≥ f (x) + v, x − x − |x − x|2 2 (2.3.1) whenever |x − x¯| < , |x − x¯| < , |f (x) − f (¯ x)| < , v ∈ ∂f (x) and |v − v¯| < . If this holds true for all v¯ ∈ ∂f (¯ x), then f is said to be prox-regular at x¯ (without reference to any subgradient). If f is proxregular for all x¯ ∈ dom f , then it is said to be a prox-regular function (without reference to any point). Prox-regularity implies for all x near enough to x¯, v near enough to v¯, and f (x) near enough to f (¯ x), that v is a proximal subgradient of f at x, and that the radius or curvature 1r of the quadratic term − 2r |x − x¯|2 is constant [28, Definition 8.45]. Let us consider a graphical interpretation of prox-regular functions. This is not a formal definition, but a visual aid so that the reader may better visualize what is happening. Graphically speaking, due to the last term on the right-hand side of (2.3.1), we can think of proxregular functions as those functions which are locally bounded below by a tangent concave quadratic function with radius of curvature − 1r . Figure 2.7: Left: a prox-regular function. Right: a function that is not prox-regular at x = 0. In Figure 2.7, we have on the left a prox-regular function. At each point there exists a parabola that bounds the function below in a neighborhood of that point; three such parabolas are illustrated in red for three different points on the curve. Notice that for different points (¯ x, f (¯ x)), the values of and r may be different. Prox-regularity is a local property, and as such the -neighborhoods and the radii of curvature of the bounding parabolas are not necessarily the same for every point. Instead it only holds locally; for every point x¯ at which f is prox-regular, there exist > 0 and a tangent bounding parabola that is also tangent and bounding at every point in an -neighborhood of x¯. The function on the right of Figure 2.7, however, is not prox-regular at the origin. There is an interior sharp point there, hence there does not exist a parabola tangent to the curve at that point that lies entirely below the function. An alternate representation of prox-regularity, which will be used in Chapter 5, is described in the following theorem. Theorem 2.3.2. [25, Theorem 3.2] A proper function f : Rn → R ∪ {+∞} is prox-regular at x¯ for v¯ with parameters > 0, r ≥ 0 if and only if f is locally lsc at x¯, v¯ ∈ ∂f (¯ x) and v − v, x − x ≥ −r|x − x|2 whenever |x − x¯| < , |x − x¯| < , |f (x ) − f (¯ x)| < , |f (x) − f (¯ x)| < , v ∈ ∂f (x ), v ∈ ∂f (x), |v − v¯| < , and |v − v¯| < . Theorem 2.3.2 gives us an expression that defines prox-regularity in terms of local pre-monotonicity of the subdifferential operator. This form is the inspiration for our main result in Chapter 4. Para-prox-regularity is a natural extension of prox-regularity that allows for a parametrized function. The definition is the following. Definition 2.3.3. [19, Definition 2.1] The proper function f : Rn × Rm → R ∪ {+∞} is para-prox-regular in x at x¯ for v¯ with compatible ¯ (also said to be para-prox-regular in x at parametrization by λ at λ ¯ for v¯), with parameters > 0, r ≥ 0, if and only if f is locally (¯ x, λ) ¯ and lsc at x¯, v¯ ∈ ∂x f (¯ x, λ) r f (x , λ) ≥ f (x, λ) + v, x − x − |x − x|2 (2.3.2) 2 ¯ < , v ∈ ∂x f (x, λ), whenever |x − x¯| < , |x − x¯| < , |f (x, λ) − f (¯ x, λ)| ¯ < . The function f is continuously para-prox|v − v¯| < , and |λ − λ| ¯ regular in x at (¯ x, λ) for v¯ if, in addition, f (x, λ) is continuous as a ¯ It is para-prox-regular in x function of (x, v, λ) ∈ gph ∂x f at (¯ x, v¯, λ). ¯ ¯ at (¯ x, λ) (with no mention of v¯) if it is para-prox-regular in x at (¯ x, λ) ¯ and it is a para-prox-regular function in x (with for all v¯ ∈ ∂x f (¯ x, λ), ¯ if it is para-prox-regular in x at (¯ ¯ for all no mention of x¯ or λ) x, λ) ¯ ∈ dom f . (¯ x, λ) A para-prox-regular function f of (x, λ) is a function that is prox¯ and continues to be so at f (x, λ) regular as a function of x at (¯ x, λ), ¯ respectively. for all x and for all λ in -neighborhoods of x¯ and λ, ¯ That is, if we do not wander too far away from (¯ x, λ) in x or in λ, the prox-regularity property of f and its corresponding prox-regularity parameters are preserved. Para-prox-regular functions were first presented by Levy, Poliquin and Rockafellar in [19], where they were used to analyze the stability of locally optimal solutions to minimization problems involving parametrized functions. They considered problems of the type ¯ x ∈ Rn minimize f (x, λ), and compared the solutions with those obtained by perturbing the ¯ to a nearby λ. If a small change in λ function slightly by changing λ results in only a small change in the solutions, the function is considered stable. Function stability is of vital importance in many real-world situations; consider the following example as an illustration. Imagine it is your job to design a sport utility vehicle for retail sale. Such vehicles have a higher tendency to roll over compared to other vehicles, due to their higher center of gravity. So one of the concerns in designing an SUV is, given a constant highway speed of 80 kph, the vehicle must be able to navigate curves in the road of a certain radius, the curves that one typically finds on the highways. Let us suppose that you are successful in designing your vehicle to handle such curves at that speed. However, how many drivers drive at or below the posted speed limit? Since we know the rules of the road are not followed rigorously by everyone, it should be of great interest for you to know what will happen if a driver does not respect the 80 kph road sign and tries to take a curve at a higher rate of speed. Will the vehicle roll over at 81, 82, 85 kph? Can one drive as fast as 90 without danger? We would hope that a small increase in speed would result in only a small increase in the level of risk to the people in the vehicle. In attemping to answer these types of questions, one may think of the problem as an analysis of the stability of the function ‘level of risk’ ¯ = 80 kph. This examples in terms of the parameter ‘speed,’ close to λ serves to show that para-prox-regular functions, since they are used in such analyses, have a very important (sometimes even life-or-death important) role to play in the world. In the next chapter, we shall see some examples of specific functions that are para-prox-regular. Chapter 3 Basic Examples, Proximal Average 3.1 Basic Examples We begin with a few basic examples of para-prox-regular functions. The first example is the result of multiplying the function |x| by the parameter λ. Example 3.1.1. For x ∈ Rn , λ ∈ R, define f (x, λ) := λ|x|. ¯ for any λ ¯ > 0, and it is not paraThen f is para-prox-regular at (0, λ) ¯ ¯ ¯ prox-regular at (0, λ) at any λ ≤ 0, for any v¯ ∈ ∂x f (¯ x, λ). ¯ > 0. Then Proof: Let x¯ = 0 and λ ¯ ≥ f (x, λ) ¯ + v, x − x f (x , λ) ¯ as f is convex in x and λ ¯ > 0. for any x , x and for any v ∈ ∂x f (x, λ), ¯ with any λ > 0, for the same reasons. The same is true if we replace λ ¯ ¯ < implies λ > 0) and r = 0 Thus, for any ∈ (0, λ) (so that |λ − λ| we have r f (x , λ) ≥ f (x, λ) + v, x − x − |x − x|2 2 ¯ < , v ∈ ∂x f (x, λ), for |x − x¯| < , |x − x¯| < , |f (x, λ) − f (¯ x, λ)| ¯ ¯ |v − v¯| < , and |λ − λ| < . That is, f is para-prox-regular at (¯ x, λ). ¯ < 0, f ceases to be a prox-regular function at x¯ = 0, since For λ ¯ ∂x f (¯ x, λ) = ∅. So the conditions of para-prox-regularity immediately ¯ = 0 para-prox-regularity fails as well, since we require f to fail. For λ ¯ but any neighborhood of λ ¯=0 be prox-regular in a neighborhood of λ, necessarily contains some λ < 0. Notice that for x¯ = 0, f is para-prox-regular no matter the value of 18 ¯ This is because the function is continuously differentiable there, so λ. all subgradients are gradients, and inequality (2.3.2) becomes trivially true. This example hints at the next result: if a function is convex in terms of x no matter the value of λ, then it is para-prox-regular. This result will be used in Section 3.2 to prove that the proximal average is para-prox-regular. ¯ ∈ dom f, where f : Rn × Rm → R ∪ {+∞} is Lemma 3.1.2. Let (¯ x, λ) ¯ < , where > 0. convex as a function of x for all λ such that |λ − λ| ¯ Then f is para-prox-regular in x at (¯ x, λ), with parameters r = 0 and . ¯ is convex in terms of x, for any x ∈ dom f (·, λ) ¯ Proof: Since f (x, λ) ¯ and any v ∈ ∂x f (x, λ) we have that ¯ ≥ f (x, λ) ¯ + v, x − x f (x , λ) (3.1.1) ¯ < , since f for all x ∈ dom f (·, λ). For every λ such that |λ − λ| remains convex, for any x ∈ dom f (·, λ) and any vλ ∈ ∂fx (x, λ) we have that f (x , λ) ≥ f (x, λ) + vλ , x − x (3.1.2) for all x ∈ dom f (·, λ). In particular, inequality (3.1.2) is true for all ¯ < x and x such that |x − x¯| < , |x − x¯| < and |f (x, λ) − f (¯ x, λ)| for any x¯ fixed, and for all vλ ∈ ∂x f (x, λ) such that |vλ − v| < . This demonstrates para-prox-regularity with parameters r = 0 and . The next two examples demonstrate that we can take a prox-regular function f and use the parameter λ in the argument of f to create para¯ values. Example 3.1.3 examines prox-regular functions at particular λ a linear shift of the argument by λ and Example 3.1.5 examines a scalar multiple of the argument by λ. Example 3.1.3. Let f˜ : Rn → R ∪ {+∞} be prox-regular at x¯ for v¯ ∈ ∂ f˜(¯ x), and let λ ∈ Rn . Define f (x, λ) := f˜(x − λ). Then f is para-prox-regular at (¯ x, 0). Proof: By assumption, there exist ˜ > 0 and r˜ ≥ 0 such that r˜ f˜(˜ x ) ≥ f˜(˜ x) + v, x˜ − x˜ − |˜ x − x˜|2 2 (3.1.3) whenever x˜ = x˜, |˜ x − x¯| < ˜, |˜ x − x¯| < ˜, |f˜(˜ x) − f˜(¯ x)| < ˜, v ∈ ∂ f˜(˜ x), ¯ |v − v¯| < ˜. We need to show that when λ = 0, there exist > 0 and r ≥ 0 such that r f (x , λ) ≥ f (x, λ) + v, x − x − |x − x|2 2 (3.1.4) ¯ < , |f (x, λ)−f (¯ ¯ < , when x = x, |x − x¯| < , |x− x¯| < , |λ− λ| x, λ)| ˜ v ∈ ∂x f (x, λ) and |v − v¯| < . Let = 2 , r = r˜. Since < ˜, inequality (3.1.3) holds when we replace r˜ with r and ˜ with everywhere. Select ¯ < , and set x˜ = x − λ and x˜ = x − λ. Then any λ such that |λ − λ| |˜ x − x¯| = |x − x¯ − λ| ≤ |x − x¯| + |λ| < + = ˜, so we have that |˜ x − x¯| < ˜. Similarly, |˜ x − x¯| < ˜. If in addition we restrict our function values using instead of ˜, then we have (remem¯ = 0) that bering λ ¯ < ⇔ |f (x, λ) − f (¯ ¯ < . |f˜(˜ x) − f˜(¯ x)| < ⇔ |f˜(x − λ) − f˜(¯ x − λ)| x, λ)| Finally we observe that v ∈ ∂ f˜(˜ x) if and only if v ∈ ∂x f (x, λ), so we may rewrite inequality (3.1.3) as r f˜(x − λ) ≥ f˜(x − λ) + v, (x − λ) − (x − λ) − |(x − λ) − (x − λ)|2 2 ¯ < , |f (x, λ)−f (¯ ¯ < , when x = x, |x − x¯| < , |x− x¯| < , |λ− λ| x, λ)| v ∈ ∂x f (x, λ), |v − v¯| < . That is, inequality (3.1.4) holds under the required conditions. For the next example, we first need a lemma that describes the subgradient chain rule. Lemma 3.1.4. [28, Example 10.7] Suppose f (x) = g(F (x)) for a proper, lsc function g : Rm → R ∪ {+∞} and a smooth mapping F : Rn → Rm , and let x¯ be a point where f is finite and the gradient ∇F (¯ x) has rank m. Then ∂f (¯ x) = ∇F (¯ x)∗ ∂g(F (¯ x)). Example 3.1.5. Let f˜ : Rn → R ∪ {+∞} be prox-regular at x¯ for v¯, and let λ ∈ R. Define f (x, λ) := f˜(λx). Then f is para-prox-regular at (¯ x, 1) for v¯. Proof: By assumption, there exist prox-regularity parameters ˜ > 0 and r˜ ≥ 0 such that r˜ f˜(˜ x ) ≥ f˜(˜ x) + v, x˜ − x˜ − |˜ x − x˜|2 2 (3.1.5) for |˜ x − x¯| < ˜, |˜ x − x¯| < ˜, |f˜(˜ x)− f˜(¯ x)| < ˜, v ∈ ∂ f˜(˜ x), and |v − v¯| < ˜. Substituting x˜ = λx and x˜ = λx, inequality (3.1.5) becomes r˜ f˜(λx ) ≥ f˜(λx) + v, λx − λx − |λx − λx|2 2 (3.1.6) for |λx − x¯| < ˜, |λx − x¯| < ˜, |f˜(λx) − f˜(¯ x)| < ˜, v ∈ ∂ f˜(λx), and |v − v¯| < ˜. The subgradient chain rule (Lemma 3.1.4 above) gives us λv ∈ ∂x f (x, λ), so it is convenient to rewrite inequality (3.1.6) as f (x , λ) ≥ f (x, λ) + λv, x − x − r˜λ2 |x − x|2 2 (3.1.7) ¯ < ˜, λv ∈ ∂x f (x, λ), for |λx − x¯| < ˜, |λx − x¯| < ˜, |f (x, λ) − f (¯ x, λ)| and |v − v¯| < ˜. If we restrict our neighborhood of x¯ further, say |x − x¯| < 2˜ and |x − x¯| < 2˜ , then there exists ˆ > 0 such that for ¯ < ˆ and if |x − x¯| < ˜ and |x − x¯| < ˜ , then all λ with |λ − λ| 2 2 we necessarily have that |λx − x¯| < ˜ and |λx − x¯| < ˜. Hence by taking = min{ 2˜ , ˆ}, inequality (3.1.7) remains true for |x − x¯| < , ¯ < , λv ∈ ∂x f (x, λ), |v − v¯| < , and |x − x¯| < , |f (x, λ) − f (¯ x, λ)| ¯ < . Since |λ − λ| ¯ < , we may take r = max{˜ ¯ + |2 }. |λ − λ| r, r˜|λ Renaming λv as v˜ we have that r f (x , λ) ≥ f (x, λ) + v˜, x − x − |x − x|2 2 (3.1.8) ¯ < , |f (x, λ) − f (¯ ¯ < , for |x − x¯| < , |x − x¯| < , |λ − λ| x, λ)| v˜ ∈ ∂x f (x, λ), and |˜ v − v¯| < . 3.2 Proximal Average The next examples are those of the proximal average and NCproximal average, for which we will need some background information. Recall that for a proper function f , the Fenchel conjugate f ∗ is defined as f ∗ (y) := sup{ x, y − f (y)}. y For two convex functions f0 and f1 , using the Fenchel conjugate, the proximal average is defined as follows. Definition 3.2.1. [2, Definition 4.1] The proximal average P Af0 ,f1 of two proper convex functions f0 and f1 is defined as 1 1 1 P Af0 ,f1 (x, λ) := ((1 − λ)(f0 (x) + |x|2 )∗ + λ(f1 (x) + |x|2 )∗ )∗ − |x|2 2 2 2 for λ ∈ [0, 1]. The proximal average has been shown to be an effective method of transforming one convex function into another in a continuous manner [2]. It has also been shown to be a convex function [3, Theorem 6.1], and as such it is prox-regular. In fact, it is para-prox-regular, as the following proposition demonstrates. Proposition 3.2.2. The proximal average P Af0 ,f1 of two proper con¯ ∈ (0, 1). vex functions f0 , f1 is para-prox-regular for all λ Proof: Bauschke, Matouˇskov´a and Reich [3, Theorem 6.1] proved that this function is convex in x for all λ ∈ (0, 1), therefore it satisfies the conditions of Lemma 3.1.2. In [14], an alternate representation of the proximal average function was given using Moreau envelopes. The convexity of the proximal average is further studied in [17]. Recall that for a function f the Moreau envelope with parameter r ≥ 0 is defined as r er f (x) := inf {f (y) + |y − x|2 }, y 2 and its associated proximal point mapping is defined as r Pr f (x) := argmin{f (y) + |y − x|2 }. 2 y If f is proper and convex, then er f is proper for all r > 0. Properties of the Moreau envelope give rise to the equivalent definition of proximal average [14, equation (4)], P Af0 ,f1 (x, λ) = −e1 (−(1 − λ)e1 f0 − λe1 f1 )(x). However, this formulation is only useful if f0 and f1 are convex functions. This is because the double envelope of a function is the original function only when it is a convex function. For two not necessarily convex functions f0 and f1 , the NC-proximal average P Ar is similarly defined (see Definition 3.2.4 below). We need one more definition first, that of prox-boundedness. Definition 3.2.3. [28, Definition 1.23] A function f is prox-bounded if there exists r > 0 and a point x¯ such that er f (¯ x) > −∞. The infimum of the set of all such r is called the threshold of prox-boundedness. Definition 3.2.4. [14, equation (5)] Let f0 and f1 be proper, proxbounded functions, and let r > 0 be greater than the threshold of proxboundedness for both f0 and f1 . The NC-proximal average of f0 and f1 is defined as P Ar (x, λ) := −er+λ(1−λ) (−(1 − λ)er f0 − λer f1 )(x). Showing that the NC-proximal average is para-prox-regular requires a technical condition from [14, Theorem 4.6], specifically that the Lipschitz constant of λPr f0 + (1 − λ)Pr f1 − Id is bounded above by 1. This condition ensures that the NC-proximal average is well-behaved. For the proof of the following example, we recall the definition of a lower-C 2 function. Definition 3.2.5. [28, Definition 10.29](Lower-C 2 ) A function f : Rn → R is said to be lower-C 2 on Rn if on some neighborhood V of each x¯ ∈ Rn there is a representation f (x) = max ft (x) t∈T in which the functions ft are of class C 2 on V and the index set T is a compact space such that ft (x) and ∇ft (x) depend continuously and jointly on (t, x) ∈ T × V. Proposition 3.2.6. Any lower-C 2 function is locally Lipschitz continuous. Proof:[28, Theorem 10.31 Proof] For each t ∈ T, the function ft has Lipschitz constant Lip ft (x) = |∇ft (x)| by [28, Theorem 9.7]. Hence, Lip f (x) ≤ sup |∇ft (x)| by [28, Proposition 9.10]. This supremum is t∈T finite, since T is compact and ∇f is continuous in t. Therefore, f is Lipschitz continuous on V. Example 3.2.7. Consider two proper, prox-bounded, prox-regular functions f0 and f1 . If the Lipschitz constant of λPr f0 + (1 − λ)Pr f1 − I is bounded above by 1, then for r sufficiently large, the NC-proximal ¯ ∈ (0, 1). average is para-prox-regular for all λ ¯ ∈ (0, 1). Let v¯ ∈ ∂x P Ar (¯ ¯ Enlarging r if necProof: Fix λ x, λ). essary, we know that P Ar (x, λ) is locally Lipschitz continuous in λ [14, Theorem 4.6], and that it is a lower-C 2 function in x [14, Proposition 2.5]. Since all lower-C 2 functions are prox-regular [25], we have that P Ar (x, λ) is prox-regular as a function of x at x¯ for v¯, say with parameters 1 > 0 and ρ ≥ 0. Therefore ¯ ≥ P Ar (x, λ) ¯ + v, x − x − ρ |x − x|2 P Ar (x , λ) 2 (3.2.1) ¯ − P Ar (¯ ¯ < 1, v ∈ when |x − x¯| < 1 , |x − x¯| < 1 , |P Ar (x, λ) x, λ)| ¯ ∂x P Ar (x, λ), and |v − v¯| < 1 . Notice that if x = x, then inequality (3.2.1) is trivially true, and if x = x, then we can replace ρ with ρ + δ for δ > 0 so that we may change the inequality to a strict one. We will therefore (writing ρ to mean the new ρ + δ so as not to change notation) concern ourselves with the non-trivial case x = x and use ¯ > P Ar (x, λ) ¯ + v, x − x − ρ |x − x|2 P Ar (x , λ) 2 (3.2.2) instead of inequality (3.2.1). We have pointed out above that it is known that P Ar is continuous in λ, and therefore there exists 2 > 0 ¯ is replaced by any λ such that inequality (3.2.2) remains true when λ ¯ ¯ < 2 such that |λ− λ| < 2 . We can reduce 2 if necessary, so that |λ− λ| implies λ ∈ (0, 1). Taking 3 = min{ 1 , 2 }, we have ρ P Ar (x , λ) > P Ar (x, λ) + v, x − x − |x − x|2 2 (3.2.3) ¯ < 3, v ∈ when x = x , |x −¯ x| < 3 , |x−¯ x| < 3 , |P Ar (x, λ)−P Ar (¯ x, λ)| ¯ |v − v¯| < 3 , and |λ − λ| ¯ < 3 . If we use 3 -neighborhoods ∂x P Ar (x, λ), 2 instead of 3 -neighborhoods, then inequality (3.2.3) certainly remains true. Since P Ar is a C 1 function with respect to x, we know that ¯ = {∇x P Ar (x, λ)}, ¯ ∂x P Ar (x, λ) and by [14, Theorem 4.6], we know ¯ that ∇x P Ar (x, λ) changes in a continuous manner with respect to λ. Hence there exists 4 > 0 such that inequality (3.2.3) remains valid ¯ is replaced by vˆ ∈ ∂x P Ar (x, λ) for |ˆ when v ∈ ∂x P Ar (x, λ) v − v| < 4 ¯ and |λ − λ| < 4 . Shrinking 4 if necessary, we can ensure that 4 < 23 . v − v¯| < Thus we have |ˆ v − v| < 23 and |v − v¯| < 23 , which implies that |ˆ 3 3 + = . Therefore, choosing = min{ , } we have that 3 3 4 2 2 ρ P Ar (x , λ) > P Ar (x, λ) + vˆ, x − x − |x − x|2 2 (3.2.4) ¯ < , vˆ ∈ when |x − x¯| < , |x − x¯| < , |P Ar (x, λ) − P Ar (¯ x, λ)| ¯ < . ∂x P Ar (x, λ), |ˆ v − v¯| < , and |λ − λ| Chapter 4 Relationship to Monotonicity In this chapter we give a characterization of para-prox-regular functions in terms of monotonicity. As we shall see, it is possible to express the para-prox-regularity of a function f in terms of monotonicity of T + rI, where T is an f -attentive localization of ∂f as defined in Definition 4.1.1. This was done in [25] for prox-regular functions, and the same procedure works here in the para-prox-regular case. We shall present Theorem 4.2.1, which is the para-prox-regular analog of [25, Theorem 3.2], and Corollary 4.2.2, which is the characterization we seek. 4.1 Localization of the Subdifferential We begin with the definition of an f -attentive localization of the ¯ v¯) for a subdifferential. An f -attentive localization of ∂x f around (¯ x, λ, fixed λ is a set-valued mapping Tλ : Rn ⇒ Rn whose graph in Rn × Rn is the intersection of gph ∂x f with an f -attentive neighborhood of x¯ and an ordinary neighborhood of v¯. (This contrasts with an ordinary localization, in which the f -attentive neighborhood of x¯ is relaxed to an ordinary neighborhood.) The following definition is adapted from [25, Definition 3.1]. Definition 4.1.1. For an > 0 and a fixed λ, the f -attentive ¯ v¯) is localization of ∂fx around (¯ x, λ, Tλ (x) = {v ∈ ∂x f (x, λ) : |v − v¯| < }, ∅, ¯ < |x − x ¯| < , |f (x, λ) − f (¯ x, λ)| otherwise. (4.1.1) In order to prove Theorem 4.2.1, we need the following well-known definition, propositions and lemma. Definition 4.1.2. Let f = f1 + · · · + fm for proper, lsc functions fi : Rn → R ∪ {+∞}, and let x ¯ ∈ dom f . We say that f satisfies the non-degeneracy 26 condition (NDC) at x ¯ if the only combination of vectors vi ∈ ∂ ∞ fi (¯ x) with v1 + · · · + vm = 0 is v1 = · · · = vm = 0. Non-degeneracy combined with prox-regularity will allow us to conclude that the subdifferential of a sum of functions is equal to the sum of subdifferentials. Proposition 4.1.3. [28, Theorem 9.13] If f : Rn → R ∪ {+∞} is C 1 at x ¯, ∞ then ∂ f (¯ x) = {0}. Proof: We know that ∇f (¯ x) is the only subgradient of f at x ¯, and since f is continuously differentiable, the subgradients nearby x ¯ are gradients as well. So for a sequence (xk ) that tends to x ¯, the tail of the corresponding sequence (vk ) as in Definition 2.2.1 c) will be (∇f (xk )) which tends to ∇f (¯ x). Then for any sequence λk that tends to zero, λk v k necessarily tends to zero. Therefore, zero is the only horizon subgradient. Definition 4.1.2, coupled with Proposition 4.1.3, enables us to prove the following lemma. The lemma takes a para-prox-regular function at x ¯ for v¯ and forms a new function that is para-prox-regular at 0 for 0, in effect shifting the original function. This is an extension of the prox-regular version found in [28, Exercise 13.35], and ir will allow us to make some assumptions in Theorem 4.2.1 without loss of generality in order to simplify the proof. ¯ for v¯. Define Lemma 4.1.4. Let f be para-prox-regular at (¯ x, λ) f˜(x, λ) := f (x + x ¯, λ) − v¯, x − x ¯. ¯ for 0. Then f˜ is para-prox-regular at (0, λ) > 0 and r ≥ 0 such that r f (x , λ) ≥ f (x, λ) + v, x − x − |x − x ¯|2 (4.1.2) 2 ¯ < , v ∈ ∂x f (x, λ), whenever x = x, |x −¯ x| < , |x−¯ x| < , |f (x, λ)−f (¯ x, λ)| ¯ |v−¯ v | < , and|λ− λ| < . For ease of discussion let f1 (x, λ) = f (x+¯ x, λ) and Proof: By assumption, there exist f2 (x, λ) = − v¯, x−¯ x . Notice that v ∈ ∂x f (x, λ) implies v−¯ v ∈ ∂x f (x, λ)−¯ v. ∞ Since f2 is a continuously differentiable function, ∂x f2 (x, λ) = {0} for all (x, λ). Hence, the only solution to v1 +v2 = 0 for vi ∈ ∂x∞ fi (¯ x, λ) is v1 = v2 = 0, and NDC holds. Since prox-regularity implies local regularity, Proposition ?? gives us that ∂x f (x, λ) − v¯ = ∂x [f (x + x ¯, λ) − v¯, x − x ¯ ] = ∂x f˜(x, λ). ˜ Therefore v − v¯ ∈ ∂x f (x, λ). Translating inequality (4.1.2) by x ¯, we may rewrite it as r f (x + x ¯, λ) ≥ f (x+ x ¯, λ)+ v, (x − x ¯)−(x− x ¯) − |(x − x ¯)−(x− x ¯)|2 (4.1.3) 2 ¯ < , now with the conditions x = x, |x | < , |x| < , |f˜(x, λ) − f˜(0, λ)| ¯ < . The x v − v¯ ∈ ∂x f˜(x, λ), |v − v¯| < , and |λ − λ| ¯ term is cancelled in two places, leaving r f (x + x ¯, λ) ≥ f (x + x ¯, λ) + v, x − x − |x − x|2 . (4.1.4) 2 Subtracting v¯, x − x ¯ and v¯, x − x ¯ from both sides of inequality (4.1.4), we get f (x + x ¯, λ) − v¯, x − x ¯ − v¯, x − x ¯ ≥ r f (x + x ¯, λ) − v¯, x − x ¯ − v¯, x − x ¯ + v, x − x − |x − x|2 . 2 Now the first two terms on each side of the above inequality are how we defined f˜, so we can rewrite r f˜(x , λ) − v¯, x − x ¯ ≥ f˜(x, λ) − v¯, x − x ¯ + v, x − x − |x − x|2 2 r ⇒ f˜(x , λ) ≥ f˜(x, λ) + v, x − x − v¯, x − x ¯−x+x ¯ − |x − x|2 2 r ˜ ˜ ⇒ f (x , λ) ≥ f (x, λ) + v, x − x − v¯, x − x − |x − x|2 2 r 2 ˜ ˜ ⇒ f (x , λ) ≥ f (x, λ) + v − v¯, x − x − |x − x| . 2 ¯ for 0. Therefore f˜ is para-prox-regular at (0, λ) 4.2 Characterization Now we are ready to present the main result of this chapter: the relationship between para-prox-regularity and monotonicity. Theorem 4.2.1. Let f : Rn × Rm → R ∪ {+∞} be a proper function that ¯ Then the following are equivalent. is locally lsc near (¯ x, λ). ¯ for v¯. a) The function f is para-prox-regular at (¯ x, λ) ¯ b) The vector v¯ is a proximal subgradient of f with respect to x at (¯ x, λ), and there exist > 0 and r ≥ 0 such that the f -attentive -localization ¯ v¯) has Tλ +rIx monotone for all λ such that |λ− λ| ¯ < Tλ of ∂x f at (¯ x, λ, . That is, there is a constant r such that v1 − v0 , x1 − x0 ≥ −r|x1 − x0 |2 (4.2.1) ¯ < . Here Ix is the identity when vi ∈ Tλ (xi ), i ∈ {0, 1}, and |λ − λ| mapping with respect to x. c) There exist > 0, r ≥ 0 such that r ¯ ≥ f (¯ ¯ + v¯, x − x f (x, λ) x, λ) ¯ − |x − x ¯|2 2 (4.2.2) when |x − x ¯| < , and the mapping (∂x f + rIx )−1 has the following single-valuedness property near z¯ = v¯ + r¯ x: if |z − z¯| < , then for i ∈ {0, 1} one has (xi , λ) ∈ (∂x f + rIx )−1 (z) ⇒ x0 = x1 (4.2.3) ¯ − f (¯ ¯ < , and |λ − λ| ¯ < . when |xi − x ¯| < , |f (xi , λ) x, λ)| Proof: ¯ for v¯, so there exist a)⇒b): We have that f is para-prox-regular at (¯ x, λ) > 0 and r ≥ 0 such that f (x1 , λ) ≥ f (x0 , λ) + v0 , x1 − x0 − 2r |x1 − x0 |2 f (x0 , λ) ≥ f (x1 , λ) + v1 , x0 − x1 − 2r |x0 − x1 |2 (4.2.4) ¯ < , vi ∈ ∂x f (xi , λ), whenever i ∈ {0, 1}, |xi − x ¯| < , |f (xi , λ) − f (¯ x, λ)| ¯ |vi − v¯| < , and |λ − λ| < . Adding the two inequalities in (4.2.4) together, we get f (x1 , λ) + f (x0 , λ) ≥ f (x0 , λ) + f (x1 , λ) + v0 − v1 , x1 − x0 − r|x1 − x0 |2 , which simplifies to inequality (4.2.1). Finally, since f is para-prox-regular at ¯ for v¯, it is prox-regular in x at x (¯ x, λ) ¯ as well, giving us that v¯ is a proximal ¯ subgradient of f with respect to x at (¯ x, λ). b)⇒c): For this section of the proof and the next, without loss of generality ¯ = 0, and v¯ = 0 (by Lemma 4.1.4). Let ¯ > 0 we assume that x ¯ = 0, f (0, λ) and r¯ ≥ 0 be the parameters assumed by condition b). Since v¯ is a proximal ¯ we have prox-regularity of f subgradient of f with respect to x at (¯ x, λ), ¯ with respect to x at (¯ x, λ) and we can use Definition 2.3.1 to write r ¯ ≥ f (¯ ¯ + v¯, x − x f (x, λ) x, λ) ¯ − |x − x ¯|2 2 when |x − x ¯| < for some (4.2.5) ∈ (0, ¯) and some r ∈ (¯ r, +∞). ¯ < . By decreasing To see the second statement in c), let |λ − λ| ¯ and increasing r if necessary, we can arrange that < 2r with r ≥ 1, ¯ hence < 2 . Suppose that for z such that |z − (¯ v + r¯ x)| < we have (xi , λ) ∈ (∂x f + rIx )−1 (z), i ∈ {0, 1}. By our assumptions that x ¯ = 0, ¯ = 0, and v¯ = 0, this simplifies to |z| < . For vi = z − rxi we have f (0, λ) vi ∈ ∂x f (xi , λ), and by the triangle inequality |vi | ≤ |z| + r|xi | < 2¯ + 2¯ = ¯. Also |xi | < ¯ and |f (xi , λ)| < ¯, thus, vi ∈ Tλ (xi ). So we have that −¯ r|x1 −x0 |2 ≤ v1 −v0 , x1 −x0 = (z−rx1 )−(z−rx0 ), x1 −x0 = −r|x1 −x0 |2 . Since r > r¯, we conclude that x1 = x0 . c)⇒a): Remember that without loss of generality we assume here that x ¯ = 0, ¯ = 0, and v¯ = 0 (by Lemma 4.1.4). Let > 0 and r ≥ 0 be the f (0, λ) parameters assumed by condition c). It will suffice to show that there exists ¯ ∈ (0, ) such that para-prox-regularity of f is satisfied by ¯ and r at x ¯ = 0, ¯ f (0, λ) = 0 for v¯ = 0. That is, we seek to show that whenever v0 ∈ ∂x f (x0 , λ), |v0 | < ¯, |f (x0 , λ)| < ¯, (4.2.6) ¯ < ¯, we have and |x| < ¯, |λ − λ| r f (x, λ) ≥ f (x0 , λ) + v0 , x − x0 − |x − x0 |2 . 2 (4.2.7) In place of equation (4.2.7), we can aim at guaranteeing the stronger condition r f (x, λ) > f (x0 , λ) + v0 , x − x0 − |x − x0 |2 (4.2.8) 2 ¯ ≤ . Notice that for x = x0 , |x| ≤ , and |λ − λ| r r r − |x − x0 |2 = rx0 , x − x0 − |x|2 + |x0 |2 , 2 2 2 so we can rewrite inequality (4.2.8) as r r f (x, λ) > f (x0 , λ) + v0 , x − x0 + rx0 , x − x0 − |x|2 + |x0 |2 2 2 r r = f (x0 , λ) + v0 + rx0 , x − v0 + rx0 , x0 − |x|2 + |x0 |2 . 2 2 Rearranging, we get r r f (x, λ) − v0 + rx0 , x + |x|2 > f (x0 , λ) − v0 + rx0 , x0 + |x0 |2 . 2 2 ¯ ≤ ¯, Thus, we seek to show that for every fixed λ such that |λ − λ| r argmin{f (x, λ) + |x|2 − z0 , x } = {(x0 , λ)} 2 |x|≤¯ (4.2.9) where z0 = v0 +rx0 . In summary, we need only demonstrate that conditions (4.2.6) imply equation (4.2.9) when ¯ is small enough. Our assumption that ¯ with f (0, λ) ¯ = 0 ensures that f is lsc within a f is locally lsc at (0, λ) compact set of the form ¯ ≤ ˆ, f (x, λ) ≤ 2ˆ} C = {(x, λ) : |x| ≤ ˆ, |λ − λ| ¯ itself, and since for some ˆ > 0. In particular, f must be lsc at (0, λ) ¯ f (0, λ) = 0, we have lim inf f (x, λ) = 0. ¯ x,|λ−λ|→0 Shrinking if necessary, we can arrange that ¯ ≤ ⇒ |x| ≤ ˆ, |λ − λ| ¯ ≤ ˆ, and f (x, λ) > −2ˆ. |x| ≤ , |λ − λ| (4.2.10) Enlarging r if necessary, we can arrange that 2 6ˆ ⇒ |x| < , |x| < , and f (x, λ) > − . r 2ˆ 2 |x|2 < Define (4.2.11) r {f (x, λ) + |x|2 − z, x }, ¯ 2 |x|,|λ−λ|≤ r G(z) := argmin {f (x, λ) + |x|2 − z, x }. 2 ¯ |x|,|λ−λ|≤ g(z) := inf Since g is the pointwise infimum of a collection of affine functions of z, it ¯ = 0. These assumptions tell is concave. Recall: x ¯ = 0, v¯ = 0, and f (¯ x, λ) us that g(0) = 0 and G(0) = {0}, whereas g(z) ≤ 0 in general. We see this ¯ = 0 and hence f (0, λ) ¯ + r |0|2 − z, 0 = 0, and since by noting that f (0, λ) 2 g(z) is the infimum of all such expressions with |x| < , then 0 is an upper bound for g. We claim that under these circumstances, |z| < ˆ ⇒ G(z) = ∅ and |f (x, λ)| < 2ˆ for all (x, λ) ∈ G(z). (4.2.12) To see this, observe that because g ≤ 0, the minimization in the definition of g(z) is unaffected if the attention is restricted to the points (x, λ) satisfying ¯ < , but also f (x, λ) + r |x|2 − z, x ≤ ˆ, in not only |x| < and |λ − λ| 2 which case f (x, λ) ≤ ˆ + |z|. Therefore, as long as |z| < ˆ, attention can be restricted to points (x, λ) satisfying f (x, λ) < 2ˆ. Recalling conditions (4.2.10) and the choice of the set C, we deduce that when |z| ≤ ˆ, g(z) = inf {f (x, λ) + 2r |x|2 − z, x }, (x,λ)∈C G(z) = argmin{f (x, λ) + 2r |x|2 − z, x }. (4.2.13) (x,λ)∈C Since f is lsc relative to C, which is compact, the infimum is sure to be attained. Thus, implication (4.2.12) is correct. Also we observe that g is finite on a neighborhood of z = 0, and hence continuous (by concavity) [28, Theorem 2.35]. Choose ˜ ∈ (0, ) small enough that ˜(1 + r) < ˆ and g(z) > − 2 when |z| < ˜(1 + r). Under this choice we are ready to consider elements satisfying conditions (4.2.6) and show that we get equation (4.2.9). The vector z0 = v0 + rx0 has |z0 | ≤ |v0 | + r|x0 | < ˜(1 + r), hence |z0 | < ˆ and g(z0 ) > − 2 . In particular, G(z0 ) must be nonempty. Consider any (x1 , λ) ∈ G(z0 ). We have to show on the basis of the single-valuedness property that x1 = x0 . We have that (x0 , λ) ∈ (∂x f + rI)−1 (z0 ), due to the fact that v0 ∈ ∂x f (x0 , λ) and hence v0 + rx0 ∈ (∂x f + rIx )(x0 , λ). If we can establish that |x1 | < , |λ| < , |f (x1 , λ)| < and (x1 , λ) ∈ (∂f + rI)−1 (z0 ), (4.2.14) then the single-valuedness property can be invoked and we will get x1 = x0 as required. Because |z0 | < ˆ, we know from implication (4.2.12) that |f (x1 , λ)| < 2ˆ. Then from 2r |x1 |2 = z0 , x1 − f (x1 , λ) + g(z0 ), where g(z0 ) ≤ 0, we know that 2r |x1 |2 ≤ |z0 ||x1 | + |f (x1 , λ)| < ˆ + 2ˆ = 3ˆ, so |x1 |2 < 6ˆ r . Through 2 (4.2.11) this implies that |x1 | < and also that |x1 | < 2ˆ and f (x1 , λ) > − . Then from f (x1 , λ) = z0 , x1 − 2r |x1 |2 +g(z0 ) ≤ |z0 ||x1 |+|g(z0 )| and the fact 2 ¯ < , that g(z0 ) < 2 we obtain f (x1 ) < ˆ 2ˆ + 2 = . Hence |x1 | < , |λ − λ| and |f (x1 , λ)| < . The fact that the minimum for g(z0 ) is attained at x1 gives us r r f (x, λ) + |x|2 − z0 , x ≥ f (x1 , λ) + |x1 |2 − z0 , x1 (4.2.15) 2 2 ¯ < . Hence, for all |x| < and |λ − λ| ¯ < , when |x| < and |λ − λ| r r f (x, λ) ≥ f (x1 , λ) + z0 , x − z0 , x1 − |x|2 + |x1 |2 2 2 r r = f (x1 , λ) + z0 , x − x1 + r x1 , x1 − x, x − x1 , x1 2 2 r = f (x1 , λ) + z0 − rx1 , x − x1 − |x − x1 |2 . 2 Since |x1 | < , this implies that z0 − rx1 ∈ ∂x f (x1 , λ), so z0 ∈ (∂x f + rIx )(x1 , λ). The subgradient relation in (4.2.14) is therefore correct as well, and our single-valuedness assumption implies x0 = x1 . Thus equation (4.2.9) holds and we are done. This result allows us to obtain the following characterization of paraprox-regularity. A similar result was shown in [25, Corollary 3.4]. Corollary 4.2.2 below is stronger than [25, Corollary 3.4] as the function defined here is para-prox-regular as opposed to prox-regular. Corollary 4.2.2. Let f be a proper, lsc function. Then f is para-prox¯ for v¯, with parameters > 0, r > 0, if and only if v¯ is regular in x at (¯ x, λ) ¯ and a proximal subgradient of f with respect to x at (¯ x, λ) v1 − v0 , x1 − x0 ≥ −r|x1 − x0 |2 ¯ < , vi ∈ ∂x f (xi , λ), when i ∈ {0, 1}, |xi − x ¯| < , |f (xi , λ) − f (¯ x, λ)| ¯ < . |vi − v¯| < , and |λ − λ| This result is an extension of [18, Proposition 3.1], which gives the proof of a) ⇒ c) but with a stronger condition and stronger result. He assumes the NDC in his conditions, and arrives at the conclusion that (∂x f + rIx )−1 is not only single-valued, but also monotone and Lipschitz continuous. Chapter 5 Constructing Para-prox-regular Functions In this chapter we examine some common situations where parameters arise in optimization problems, and we explore when para-prox-regularity is present. In all cases we focus on identifying the prox-regularity parameters. In general we begin with prox-regular functions and build upon them. We already know that many important classes of functions are prox-regular, such as proper, lsc, convex functions, lower-C 2 functions (C 2 functions included), strongly amenable functions, and primal-lower-nice functions. Our goal here is to begin to analyze these functions to see what conditions are necessary for them to be para-prox-regular as well, or how we can combine or alter them to get para-prox-regular functions. In particular, we show that the scalar multiplication of a prox-regular function by a positive parameter results in a function that remains prox-regular. This leads to the construction of para-prox-regular functions via the weighted sum and weighted max of prox-regular functions. 5.1 Naming Parameters Explicitly The following proposition and theorem are results first provided in [27], showing that strongly amenable functions, that is functions which can be locally represented as the composition of a C 2 function with a proper, lsc, convex function, are prox-regular. In that work, Poliquin and Rockafellar simply conclude the function is prox-regular and they do not single out the prox-regularity parameters and r in the statement of the theorem. In the present work, the theorem has been restated with a proposition beforehand, in order to pull out the prox-regularity parameters from the proof into the statement of the theorem. In later results we require the ability to identify such parameters instead of merely stating that they exist. Other than clarifying the prox-regularity parameters, Proposition 5.1.1 and Theorem 5.1.2 and their proofs are essentially the same as [27, Theorem 3.1]. 34 Proposition 5.1.1. [27, Theorem 3.1] Let f (x) = g(F (x)), where F : Rn → ¯ is lsc and proper, and suppose there is no Rm is of class C 2 , g : Rm → R ∞ vector y = 0 in ∂ g(F (¯ x)) with ∇F (¯ x)∗ y = 0, where x ¯ ∈ dom f . Then a) there exists 0 > 0 such that ∂f (x) ⊂ ∇F (x)∗ ∂g(F (x)) when |x − x ¯| < 0 and |f (x) − f (¯ x)| < 0. b) S : (x, v) → {y ∈ ∂g(F (x)) : ∇F (x)∗ y = v} has closed graph and is locally bounded at (¯ x, v¯) for v¯ ∈ ∂f (¯ x), within the same neighborhoods B 0 (¯ x) and B 0 (f (¯ x)). In particular, S(¯ x, v¯) is a compact set. c) If v¯ ∈ ∂f (¯ x) is such that for every y ∈ ∂g(F (¯ x)) with ∇F (¯ x)∗ y = v¯ the function g is prox-regular at F (¯ x) with parameters y > 0, ry ≥ 0, then for the parameters = min{ 0 , min{ y }} and some r¯ ≥ 0 we have y y1 − y0 , u1 − u0 ≥ −¯ r|u1 − u0 |2 (5.1.1) whenever yi ∈ ∂g(ui ), |ui − F (¯ x)| < ¯, |g(ui ) − g(F (¯ x))| < ¯, and dist(yi , S(¯ x, v¯)) < ¯. Proof: See [27, Theorem 3.1] and the first half of its proof. The purpose of pulling out Proposition 5.1.1 from [27, Theorem 3.1] is that now we have an explicit expression for the that we will use in Theorem 5.1.2. In Theorem 5.1.2 we must provide greater detail, in order to fully describe the parameter r. Theorem 5.1.2. [27, Theorem 3.1] Let the conditions of Proposition 5.1.1 hold. Assume further that v¯ ∈ ∂f (¯ x) is a vector such that for every y ∈ ∗ ∂g(F (¯ x)) such that ∇F (¯ x) y = v¯, the function g is prox-regular at F (¯ x) for y with parameters y > 0, ry ≥ 0. Then f is prox-regular at x ¯ for v¯ with parameters > 0 as in Proposition 5.1.1 c), and r ≥ 0 as described by equations (5.1.2), (5.1.4), (5.1.6), and (5.1.8) below. Proof: By our assumptions, g is prox-regular at F (¯ x) for each y ∈ S(¯ x, v¯), with constants y and ry . Since S(¯ x, v¯) is compact, we know that there is a finite open cover of S using a finite number of the y , say { 1 , . . . , k }. Then using the corresponding {ri , . . . , rk } we can define r¯ := max {ri } ≥ 0. i∈{1,...,k} (5.1.2) We then have > 0, r¯ ≥ 0, S a compact set, and yi ∈ ∂g(F (xi )) ∩ S with ∇F (xi )∗ yi = vi such that y1 − y0 , F (x1 ) − F (x0 ) ≥ −¯ r|F (x1 ) − F (x0 )|2 when |xi − x ¯| < , |f (xi ) − f (¯ x)| < , vi ∈ ∂f (xi ), and |vi − v¯| < . Note that when vi = ∇F (xi )∗ yi then v1 − v0 , x1 − x0 = ∇F (x1 )∗ y1 − ∇F (x0 )∗ y1 + ∇F (x0 )∗ y1 − ∇F (x0 )∗ y0 , x1 − x0 , which gives v1 − v0 , x1 − x0 = [∇F (x1 )∗ − ∇F (x0 )∗ ]y1 , x1 − x0 + ∇F (x0 )∗ (y1 − y0 ), x1 − x0 . (5.1.3) Because S is compact, we know that there exists L such that |y| ≤ L for all y ∈ S. Because F ∈ C 2 , we know that for |x0 − x ¯| < and |x1 − x ¯| < there exists K such that |∇F (x1 ) − ∇F (x0 )| ≤ K|x1 − x0 |. Hence, we can find |[∇F (x1 )∗ − ∇F (x0 )∗ ]y| ≤ r1 |x1 − x0 |. (5.1.4) Thus by the Cauchy-Schwarz Inequality, we have that for all y ∈ S, r1 > 0 such that for all y ∈ S, [∇F (x1 )∗ − ∇F (x0 )∗ ]y, x1 − x0 ≥ −r1 |x1 − x0 |2 . (5.1.5) The final term in equation (5.1.3) can be written as ∇φ(x0 ), x1 − x0 for the C 2 function φ(x) = y1 − y0 , F (x) . Let r2 be an upper bound for the eigenvalues of the Hessian of x → η, F (x) , (5.1.6) where x ranges over B (¯ x) and η ranges over the compact set S − S. Then in particular, by the Taylor expansion of φ about x0 , we have φ(x) ≤ φ(x0 ) + ∇φ(x0 ), x − x0 + r2 |x − x0 |2 when |x − x ¯| ≤ . It follows that ∇φ(x0 ), x1 − x0 ≥ −r2 |x1 − x0 |2 + φ(x1 ) − φ(x0 ) = −r2 |x1 − x0 |2 + y1 − y0 , F (x1 ) − F (x0 ) ≥ −r2 |x1 − x0 |2 − r¯|F (x1 ) − F (x0 )|2 . (5.1.7) But since F ∈ C 2 , there is also a constant k such that |F (x1 ) − F (x0 )| ≤ k|x1 − x0 | when |xi − x ¯| < . Using this fact with Inequalities (5.1.5) and (5.1.7) for the two terms at the end of equation (5.1.3) we obtain v1 − v0 , x1 − x0 ≥ −r1 |x1 − x0 |2 − r2 |x1 − x0 |2 − r¯k 2 |x1 − x0 |2 . Thus for r := r1 + r2 + r¯k 2 (5.1.8) we have that v1 − v0 , x1 − x0 ≥ −r|x1 − x0 |2 whenever |xi − x ¯| < , |f (xi ) − f (¯ x)| < , vi ∈ ∂f (xi ), and |vi − v¯| < . 5.2 Scalar Multiplication and Finite Sum Now we are ready to begin with parametrized functions. Lemma 5.2.3 demonstrates that multiplying a prox-regular function by a positive parameter does not affect its prox-regularity. We shall use this property in the theorems that follow. First we require a pair of basic lemmas on computing subgradients after scaling a function. These are well-known results, but we provide short proofs for completeness. Lemma 5.2.1. [28, Proposition 10.19] Let f : Rn → R and λ > 0. Then λ∂f = ∂λf. Proof: Let x ¯ ∈ dom f, v¯ ∈ ∂f (¯ x). Then there exists a sequence {xk } ⊆ ˆ (xk ) and dom f that tends to x ¯, and a sequence {v k } such that v k ∈ ∂f k k k v → v¯. So for each (x , v ) we have f (x) ≥ f (xk ) + v k , x − xk + o(|x − xk |). (5.2.1) Multiplying inequality (5.2.1) by λ yields λf (x) ≥ λf (xk ) + λv k , x − xk + o(|x − xk |) (5.2.2) for each (xk , v k ). Note that λ does not affect the little-o term. This gives us that each λv k is a regular subgradient of λf (xk ). So we have a sequence ˆ (xk ) and xk → x λv k → λ¯ v such that λv k ∈ ∂λf ¯, which says that λ¯ v ∈ ∂λf (¯ x). Hence, λ∂f ⊆ ∂λf. Now by starting with λ¯ v ∈ ∂λf (¯ x) and following the same procedure as above in reverse order, we see that ∂λf ⊆ λ∂f. Therefore, λ∂f = ∂λf. Lemma 5.2.2. [28, Proposition 10.19] Let f : Rn → R and λ > 0. Then ∂ ∞ λf = ∂ ∞ f. Proof: Let x ¯ ∈ dom f , v¯ ∈ ∂ ∞ f (¯ x). Then there exist sequences {xk }, ˆ (xk ), and γ k v k → v¯. {γ k } and {v k } such that xk → x ¯, γ k 0, v k ∈ ∂f 1 k Since λ > 0, then the sequence λ γ 0, and we know from Lemma 5.2.1 1 k k k ˆ that λv ∈ ∂λf (x ). So we see that λ γ λv k = γ k v k → v¯, and we have the conditions needed to conclude that v¯ ∈ ∂ ∞ λf (¯ x). Hence, ∂ ∞ f (¯ x) ⊆ ∞ ∂ λf (¯ x). By following the same procedure in reverse order, we see that ∂ ∞ λf (¯ x) ⊆ ∂ ∞ f (¯ x), and we arrive at the conclusion that the two horizon subdifferentials are equal. Lemmas 5.2.1 and 5.2.2 will allow us to present our first minor result of this section, Lemma 5.2.3 below. Lemma 5.2.3. Let f : Rn → R be prox-regular at x ¯ for v¯ with parameters ˜ > 0 and r˜ ≥ 0, and let λ ∈ R, λ > 0. Then λf (x) is prox-regular at x ¯ for λ¯ v with parameters = min{˜, λ˜} and r = λ˜ r. Proof: By prox-regularity of f , ˜ and r˜ are such that r˜ f (x ) ≥ f (x) + v, x − x − |x − x|2 2 whenever |x − x ¯| < ˜, |x − x ¯| < ˜, |f (x) − f (¯ x)| < ˜, v ∈ ∂f (x), and |v − v¯| < ˜. Multiplying by λ > 0 yields λf (x ) ≥ λf (x) + λv, x − x − λ˜ r |x − x|2 2 whenever |x − x ¯| < ˜, |x − x ¯| < ˜, |f (x) − f (¯ x)| < ˜, v ∈ ∂f (x), and |v − v¯| < ˜. Since v ∈ ∂f (x), we know by Lemma 5.2.1 that λv ∈ ∂λf (x). When |v − v¯| < ˜ and |f (x) − f (¯ x)| < ˜ we have |λv − λ¯ v | < λ˜ and |λf (x) − λf (¯ x)| < λ˜. Let = min{˜, λ˜}, and let r = λ˜ r. We conclude that r λf (x ) ≥ λf (x) + λv, x − x − |x − x|2 2 whenever |x − x ¯| < , |x − x ¯| < , |λf (x) − λf (¯ x)| < , λv ∈ ∂f (x), and |λv − λ¯ v| < . A direct result of Lemma 5.2.3 is that the scalar multiplication of a proxregular function with a positive parameter results in a para-prox-regular function. Corollary 5.2.4. Let f : Rn → R ∪ {+∞} be prox-regular at x ¯ for v¯ with ¯ ¯ parameters ˜ > 0 and r˜ ≥ 0, and let λ ∈ R, λ > 0. Then g(x, λ) := λf (x) ¯ λ ¯ ¯ for λ¯ ¯ v , with parameters = min{˜, λ˜ is para-prox-regular at (¯ x, λ) 2 , 2 } and ¯r r = 3λ˜ 2 . ¯ ¯ − δ, λ ¯ + δ). Then for each such λ, Proof: Let δ = λ2 and consider λ ∈ (λ using ¯ = min{˜, δ˜} we have that g(x , λ) ≥ g(x, λ) + λv, x − x − λ˜ r |x − x|2 2 ¯ − g(¯ ¯ < ¯, λv ∈ ∂x g(x, λ), whenever |x − x ¯| < ¯, |x − x ¯| < ¯, |g(x, λ) x, λ)| ¯r ¯ v | < ¯. Let = min{δ, ¯} and r = sup{λ˜ and |λv − λ¯ r} = 3λ˜ 2 . Then we have λ r g(x , λ) ≥ g(x, λ) + λv, x − x − |x − x|2 2 ¯ < , λv ∈ ∂x g(x, λ), when |x − x ¯| < , |x − x ¯| < , |g(x, λ) − g(¯ x, λ)| ¯ > , and |λ − λ| ¯ < . |λv − λv| Next we have a minor extension of [27, Theorem 3.2], where Poliquin and Rockafellar showed that, under certain conditions, the sum of two proxregular functions is again prox-regular. We extend this to the sum of any finite number of prox-regular functions. The result also extracts the proxregularity parameters from the proof, thus allowing their use in later results. Theorem 5.2.5. For i ∈ {1, 2, . . . , m}, let fi : Rn → R. Let x ¯∈ m dom fi i=1 m fi (x) and let v¯ ∈ ∂f (¯ x). and assume that NDC holds. Define f (x) := i=1 Suppose there exist m i > 0 and ri ≥ 0 such that for any vi ∈ ∂fi (¯ x) with vi = v¯ we have that fi is prox-regular at x ¯ for vi with parameters i i=1 and ri . Then f is prox-regular at x ¯ for v¯ with parameters r = m maxi {ri }. = mini { i } and m Proof: Define g(u1 , u2 , . . . , um ) := fi (ui ) and F (x) := (x, x, . . . , x). i=1 Then g is lsc and proper, F ∈ C 2 , and f (x) = g(F (x)). Notice that ∇F (x) = (I, I, . . . , I) , so the only possible y ∈ ∂ ∞ g(F (¯ x)) with ∇F (¯ x)∗ y = 0 is y = 0. Therefore the constraint qualification on y in Theorem 5.1.2 holds. We have that g is prox-regular at F (¯ x) for every y ∈ ∂g(F (¯ x)) with ∇F (¯ x)∗ y = v¯, so all the conditions of Theorem 5.1.2 are satisfied. Let and r be as described in the proof of that theorem. In particular, r1 = 0 (equation (5.1.4)) in this case since ∇F (x1 ) − ∇F (x0 ) = 0. Also r2 = 0 (equation (5.1.6)), since η, F (x) = η1 , x + · · · + ηn , x , which implies ∇ η, F (x) = η1 + · · · + ηn and ∇2 η, F (x) = 0. Finally, |F (x1 ) − F (x0 )| = |(x1 − x0 , x1 − x0 , . . . , x1 − x0 )| = = which gives k = √ √ m|x1 − x0 |2 m|x1 − x0 |, m (equation (5.1.8)). Hence r = m¯ r = m max{ri } (equai tion (5.1.2)). We conclude that f is prox-regular at x ¯ for v¯ with these parameters. Corollary 5.2.6. For i ∈ {1, 2, . . . , m}, let fi : Rn → R be prox-regular at x ¯ for vi ∈ ∂fi (¯ x) with parameters i > 0 and ri ≥ 0, and let NDC hold. Let m ¯ = (λ ¯1, λ ¯2, . . . , λ ¯ m ) > 0 (i.e. λ ¯ i > 0 for all i). Define λ ∈ R , and fix λ m f (x, λ) := λi fi (x). i=1 ¯ is prox-regular in x at x ¯ v with parameters = min{ i , λ ¯i i} Then f (·, λ) ¯ for λ¯ ¯ i ri }, where v¯ = and r = m max{λ i m i ¯ i vi ∈ ∂x f (¯ ¯ λ x, λ). i=1 Proof: Apply Lemma 5.2.3 to each fi in Theorem 5.2.5. We now examine parametrized functions by multiplying each of the proxregular functions fi in Theorem 5.2.5 by a parameter λi , and we show that the weighted sum of prox-regular functions is a para-prox-regular function. m Theorem 5.2.7. For i ∈ {1, 2, . . . , m}, let fi : Rn → R. Let x ¯∈ ¯ = (λ ¯1, λ ¯2, . . . , λ ¯ m ) > 0. Assume that NDC holds. Define and λ dom fi i=1 m f (x, λ) := λi fi (x) i=1 ¯ If fi is prox-regular at x where λ = (λ1 , λ2 , . . . , λm ), and let v¯ ∈ ∂x f (¯ x, λ). ¯ m ¯ i v¯i = v¯, with parameters i and ri , then for all v¯i ∈ ∂fi (¯ x) such that λ i=1 ¯ for v¯, with parameters f is para-prox-regular at (¯ x, λ) ¯ and r = m max{ 32λi ri }. i ¯ ¯ = min{ λ2i , i , λ2i i } i ¯ i > 0 and fi is prox-regular at x Proof: Since λ ¯ for v¯i for all i, Lemma 5.2.3 ¯ ¯ i v¯i with paramapplies and we have that each λi fi is prox-regular at x ¯ for λ ¯ ¯ ¯ ¯ i fi = ∂ ∞ fi eters ˜i = min{ i , λi i } and r˜i = λi ri . Since λi > 0, we have ∂ ∞ λ by Lemma 5.2.2, so NDC holds. Thus by Corollary 5.2.6, we have that ¯ is prox-regular in x at x f (x, λ) ¯ for v¯ with parameters ¯ = min{˜i } and i ¯ r¯ = m max{˜ ri }. (Note that here we are viewing f as a function of x, with λ i fixed.) Define δ := ¯ i }, and B := {λ : λi ∈ [λ ¯ i − δ, λ ¯ i + δ] for all i}. min{λ 1 2 i Since λ > 0 for all λ ∈ B, by similar arguments as above we have that f (x, λ) is prox-regular in x at x ¯ for v¯ for every fixed λ ∈ B, with parameters λ = min{ i , λi i } and rλ = m max{λi ri }. But min{λi } = δ and i i λ∈B ¯ i } + δ, so for all λ ∈ B we have that f (x, λ) is prox-regular max{λi } = max{λ i λ∈B ¯ i + δ)ri }. in x at x ¯ for v¯ with parameters ˆ = min{ i , δ i } and rˆ = m max{(λ ¯ i +δ ≤ Since λ ¯i 3λ 2 , we may overestimate and set r = i ¯i 3λ m max{ 2 ri }. i Therefore ¯ we may conclude that f is para-prox-regular at (¯ x, λ) for v¯ with parameters ¯i ¯i ¯ λ λ = min{δ, ˆ} = min{ 2 , i , 2 i } and r = m max{ 32λi ri }. i i lower-C 2 In the next example, for f1 and f2 functions this result has already been proved [14]. The lower-C 2 requirement is generalized to prox-regular by using Theorem 5.2.7. Example 5.2.8. Let f1 : Rn → R ∪ {+∞} and f2 : Rn → R ∪ {+∞} be 2 prox-regular at x ¯∈ dom fi . Assume NDC holds. Define i=1 f (x, λ) := λf1 (x) + (1 − λ)f2 (x). ¯ such that v¯ = λv1 + (1 − λ)v2 with v1 ∈ ∂f1 (¯ Let v¯ ∈ ∂x f (¯ x, λ) x) and ¯ v2 ∈ ∂f2 (¯ x). Then f is para-prox-regular at (¯ x, λ) for v¯ = λv1 + (1 − λ)v2 , ¯ ∈ (0, 1). for any λ 5.3 Finite Maximum The following proposition is a restatement of a result from an earlier work. We shall use it and the corollary that follows in the proof of Theorem 5.3.3, para-prox-regularity of a weighted finite max of C 0 functions. Proposition 5.3.1. [19, Proposition 2.2] Suppose that f : Rn × Rm → R ∪ ¯ in the sense that on some neigh{+∞} is strongly amenable in x at (¯ x, λ), ¯ there is a composite representation f (x, λ) = g(F (x, λ)) in borhood of (¯ x, λ) which F : Rn × Rd → Rm is a C 2 mapping and g : Rm → R is a convex, ¯ ∈ D := dom g and proper, lsc function for which F (¯ x, λ) ¯ ∇x F (¯ ¯ ∗ z = 0 ⇒ z = 0. z ∈ ND (F (¯ x, λ)), x, λ) ¯ one has f continuously para-prox-regular in Then as long as v¯ ∈ ∂x f (¯ x, λ), ¯ x at (¯ x, λ) for v¯. This allows us to state Corollary 5.3.2 below, which extends [25, Exercise 2.9], the similar result for the prox-regular case. ¯ = Corollary 5.3.2. For i ∈ {1, 2, . . . , m}, let fi : Rn → R be C 2 . Let λ m ¯1, λ ¯2, . . . , λ ¯ m ) > 0. Let x dom fi and define (λ ¯∈ i=1 f (x, λ) := max{λi fi (x)} i where λ = (λ1 , λ2 , . . . , λm ). Then f is continuously para-prox-regular in x ¯ for v¯ ∈ ∂x f (¯ ¯ at (¯ x, λ) x, λ). Proof: Let F (x, λ) = (λ1 f1 (x), λ2 f2 (x), . . . , λm fm (x)) and g(u1 , u2 , . . . , um ) = max{u1 , u2 , . . . , um }. Then F is C 2 , since each component is a product of two C 2 functions, g is proper lsc convex, and f (x, λ) = g(F (x)). ¯ = Therefore f is strongly amenable. Since D = dom g is Rm , ND (F (¯ x, λ)) {0}, so the constraint qualification of Proposition 5.3.1 holds. We have all the requirements of Proposition 5.3.1, and its conclusion is our desired result. Our next result is a relaxation of the C 2 condition of Corollary 5.3.2. It is sufficient that the fi be C 0 , prox-regular functions. In relaxing the result, we provide a direct proof and also extract the prox-regularity parameters from the proof. ¯ = Theorem 5.3.3. For i ∈ {1, 2, . . . , m}, let fi : Rn → R be C 0 . Let λ m ¯1, λ ¯2, . . . , λ ¯ m ) > 0. Let x (λ ¯∈ dom fi . Define i=1 f (x, λ) := max{λi fi (x)} i ¯ If the fi are prox-regular where λ = (λ1 , λ2 . . . , λm ), and let v¯ ∈ ∂x f (¯ x, λ). at x ¯ for each vi ∈ ∂fi (¯ x) with parameters i > 0 and ri ≥ 0, and if f is Clarke regular at x ¯ (i.e. every normal vector to epi f at x ¯ is a regular ¯ for v¯ with parameters normal vector), then f is para-prox-regular at (¯ x, λ) ¯ ¯ = min{ i , λi2 i } and r = max{ 3λ2i ri }. i i ¯ = {i : f (¯ ¯ = λ ¯ i fi (¯ Proof: Let A(¯ x, λ) x, λ) x)} be the active set of in¯ = conv {λ ¯ i ∂fi (¯ dices. Then ∂x f (¯ x, λ) x)} [9, Theorem 2.8.2], so every ¯ i∈A(¯ x,λ) ¯ has the form v¯ ∈ ∂x f (¯ x, λ) ¯ i vi for some vi ∈ ∂fi (¯ θi λ x) and some set ¯ i∈A(¯ x,λ) of non-negative {θi }i∈A(¯x,λ) ¯ such that θi = 1. Corollary 5.2.4 tells ¯ i∈A(¯ x,λ) ¯ i ) for all λ ¯ i vi ∈ ∂ λ ¯ i fi (¯ us that the λi fi are para-prox-regular at (¯ x, λ x). In ¯i i λ ¯i ¯ λ particular, for each i we can take ˆi = min{ i , 2 , 2 } and rˆi = 3λ2i ri to get λi fi (x ) ≥ λi fi (x) + λi vˆi , x − x − rˆi |x − x|2 2 (5.3.1) ¯ i f (¯ for |x − x ¯| < ˆi , |x − x ¯| < ˆi , |λi fi (x) − λ x)| < ˆi , λi vˆi ∈ ∂λi fi (x), ¯ ¯ |λi vˆi − λi vi | < ˆi , and |λi − λi | < ˆi . This is a system of m inequalities, so using = min{ˆi } and r = max{ˆ ri } we can say that for all i ∈ {1, 2, . . . , m} i i r λi fi (x ) ≥ λi fi (x) + λi vˆi , x − x − |x − x|2 2 (5.3.2) ¯ i | < , λi vˆi ∈ ∂λi fi (x), |λi vˆi − λ ¯ i vi | < , for all |x − x ¯| < , |x − x ¯| < , |λi − λ ¯ i fi (¯ |λi fi (x) − λ x)| < (with r and no longer dependent on i). Since f (x , λ) ≥ λi fi (x ) for all i, we can replace the left-hand side of inequality (5.3.2) with f (x , λ) to get r f (x , λ) ≥ λi fi (x) + λi vˆi , x − x − |x − x|2 2 (5.3.3) ¯ i | < , λi vˆi ∈ ∂λi fi (x), |λi vˆi − λ ¯ i vi | < , for all |x − x ¯| < , |x − x ¯| < , |λi − λ ¯ i fi (¯ |λi fi (x) − λ x)| < . Let x , x, λ, v be such that |x − x ¯| < , |x − x ¯| < , v ∈ ∂x f (x, λ), |v − ¯ ¯ v¯| < , |λ − λ| < , |f (x, λ) − f (¯ x, λ)| < . Let A(x, λ) = {i : f (x, λ) = λi fi (x)}. Then ∂x f (x, λ) = conv {λi ∂fi (x)}. So v ∈ ∂x f (x, λ) has the i∈A(x,λ) φi λi vˆi for some vˆi ∈ ∂fi (x) and some non-negative {φi }i∈A(x,λ) form i∈A(x,λ) such that φi = 1. We can also substitute f (x, λ) on the right-hand i∈A(x,λ) side of inequality (5.3.3) for those i ∈ A(x, λ) to get r f (x , λ) ≥ f (x, λ) + λi vˆi , x − x − |x − x|2 . 2 (5.3.4) Multiplying inequality (5.3.4) by φi and adding over i ∈ A(x, λ) yields φi f (x , λ) ≥ i∈A(x,λ) i∈A(x,λ) φi λi vˆi , x −x − φi f (x, λ)+ i∈A(x,λ) i∈A(x,λ) r φi |x −x|2 2 which simplifies to r f (x , λ) ≥ f (x, λ) + v, x − x − |x − x|2 . 2 (5.3.5) ¯ for v¯, with parameters and r. Therefore f is para-prox-regular at (¯ x, λ) Chapter 6 Conclusion The goal of this thesis was to present para-prox-regular functions as one of the next logical areas of exploration in the study of prox-regularity, and to take a few steps in that direction. We understand the importance of proxregular functions, as they cover convex functions and C 2 functions among others, and we have seen that para-prox-regular functions are a very natural extension of prox-regular functions. We know that para-prox-regular functions have their place and their importance in the study of the field, since parametrized functions are commonly studied in optimization, in particular with problems that involve stability of minimizers [19, 26]. We have discussed and analyzed several examples of functions that are para-proxregular, and seen how we can create such functions by manipulating a proxregular function or a set of them. In each of those examples, we have defined explicitly the prox-regularity parameters that result. We have also seen an example of where para-prox-regularity of a prox-regular function can fail, ¯ = 0. namely f (x, λ) = λ|x| at λ So we have made significant progress in further understanding how to create para-prox-regular functions, and in getting a feel for what they look like and how they behave. However, the results seen here are limited in at least two key aspects. First, our results concerning weighted parametrized sums and weighted parametrized max functions rely on starting with a finite number of prox-regular functions. A logical direction in which to continue with the development of these theorems would be to explore the case where the family of functions is possibly infinite. One of the difficulties here would be to be able to name, or even prove the existence of, the prox-regularity parameters. For a finite number of prox-regular functions fi with respective parameters i and ri , we are guaranteed the existence of min{ i } and max{ri }, which we used in Theorem 5.2.7 and Theorem 5.3.3. In general for an infinite number of such functions, we have no such guarantee. Second, all the results found in this paper are for functions in Rn , finitedimensional space. A natural path to take in extending these results would be to reexamine them in the setting of a Hilbert or Banach space. This would be a particularly interesting challenge, as up to now we have relied 45 on properties of finite-dimensional space in order to produce the results we have. For example, the proofs of Proposition 5.1.1 and Theorem 5.1.2 use the fact that a closed, bounded set is compact, a property which fails to be preserved when we make the change to infinite dimensions. Since 2005, there have been a number of papers published that extend particular properties of prox-regular functions from finite-dimensional spaces to Hilbert spaces [1, 5, 6, 21, 29], so it is likely the techniques and results seen in those works would be of use in progressing in this direction for para-prox-regular functions. 46 Bibliography [1] M. Baˇc´ ak, J. M. Borwein, A. Eberhard, and B. S. Mordukhovich. Infimal convolutions and Lipschitzian properties of subdifferentials for prox-regular functions in Hilbert spaces. J. Convex Anal., 17(3-4):737– 763, 2010. → pages 46 [2] H.H. Bauschke, Y. Lucet, and M. Trienis. How to transform one convex function continuously into another. SIAM Rev., 50(1):115–132, 2008. → pages 5, 22 [3] H.H. Bauschke, E. Matouˇskov´a, and S. Reich. Projection and proximal point methods: convergence results and counterexamples. Nonlinear Anal., 56(5):715–738, 2004. → pages 5, 22 [4] F. Bernard and L. Thibault. Prox-regularity of functions and sets in Banach spaces. Set-Valued Anal., 12(1-2):25–47, 2004. → pages 4 [5] F. Bernard and L. Thibault. Prox-regular functions in Hilbert spaces. J. Math. Anal. Appl., 303(1):1–14, 2005. → pages 46 [6] S. Boralugoda and R. A. Poliquin. Local integration of prox-regular functions in Hilbert spaces. J. Convex Anal., 13(1):27–36, 2006. → pages 4, 46 [7] M. Bounkhel and A. Jofr´e. Subdifferential stability of the distance function and its applications to nonconvex economies and equilibrium. J. Nonlinear Convex Anal., 5(3):331–347, 2004. → pages 4 ¨ [8] H. Brunn. Uber Kerneigebiete. Math. Ann., 73(3):436–440, 1913. → pages 2 [9] F. H. Clarke. Optimization and nonsmooth analysis, volume 5 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, second edition, 1990. → pages 43 47 [10] F. H. Clarke, R. J. Stern, and P. R. Wolenski. Proximal smoothness and the lower-C 2 property. J. Convex Anal., 2(1-2):117–144, 1995. → pages 4 [11] A. Eberhard and R. Wenczel. A study of tilt-stable optimality and sufficient conditions. Nonlinear Anal., 75(3):1260–1281, 2012. → pages 4 [12] W. Hare and C. Planiden. Parametrically prox-regular functions, 2012. submitted to J. Convex Anal. → pages iii [13] W. Hare and C. Sagastiz´abal. A redistributed proximal bundle method for nonconvex optimization. SIAM J. Optim., 20(5):2442–2473, 2010. → pages 4 [14] W. L. Hare. A proximal average for nonconvex functions: a proximal stability perspective. SIAM J. Optim., 20(2):650–666, 2009. → pages 5, 22, 23, 24, 25, 41 [15] W. L. Hare. A proximal method for identifying active manifolds. Comput. Optim. Appl., 43(2):295–306, 2009. → pages [16] W. L. Hare and R. A. Poliquin. Prox-regularity and stability of the proximal mapping. J. Convex Anal., 14(3):589–606, 2007. → pages 4 [17] J. Johnstone, V. Koch, and Y. Lucet. Convexity of the proximal average. J. Optim. Theory Appl., 148(1):107–124, 2011. → pages 22 [18] A. B. Levy. Calm minima in parameterized finite-dimensional optimization. SIAM J. Optim., 11(1):160–178, 2000. → pages ii, 33 [19] A. B. Levy, R. A. Poliquin, and R. T. Rockafellar. Stability of locally optimal solutions. SIAM J. Optim., 10(2):580–604, 2000. → pages 5, 16, 41, 45 [20] A. S. Lewis, D. R. Luke, and J. Malick. Local linear convergence for alternating and averaged nonconvex projections. Found. Comput. Math., 9(4):485–513, 2009. → pages 4 [21] D. R. Luke. Finding best approximation pairs relative to a convex and prox-regular set in a Hilbert space. SIAM J. Optim., 19(2):714–739, 2008. → pages 46 [22] H. Minkowski. Diophantische Approximationen. Eine Einf¨ uhrung in die Zahlentheorie. Chelsea Publishing Co., New York, 1957. → pages 2 48 [23] A. Moudafi. An algorithmic approach to prox-regular variational inequalities. Appl. Math. Comput., 155(3):845–852, 2004. → pages 4 [24] R. A. Poliquin and R. T. Rockafellar. Generalized Hessian properties of regularized nonsmooth functions. SIAM J. Optim., 6(4):1121–1137, 1996. → pages 4 [25] R. A. Poliquin and R. T. Rockafellar. Prox-regular functions in variational analysis. Trans. Amer. Math. Soc., 348(5):1805–1838, 1996. → pages ii, 4, 5, 15, 24, 26, 33, 42 [26] R. A. Poliquin and R. T. Rockafellar. Tilt stability of a local minimum. SIAM J. Optim., 8(2):287–299, 1998. → pages 4, 45 [27] R. A. Poliquin and R. T. Rockafellar. A calculus of prox-regularity. J. Convex Anal., 17(1):203–210, 2010. → pages ii, 5, 34, 35, 39 [28] R.T. Rockafellar and R.J.B. Wets. Variational analysis, volume 317 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1998. → pages 9, 10, 13, 14, 15, 20, 23, 24, 27, 32, 37, 38 [29] M. Sebbah and L. Thibault. Metric projection and compatibly parameterized families of prox-regular sets in Hilbert space. Nonlinear Anal., 75(3):1547–1562, 2012. → pages 46 49 Appendices Appendix A: Glossary of notation R Rn Σ |·| ·, · × → ∇ argmin inf sup C0 C1 1+ C C2 dom := ∂ˆ ∂ ∞ ∂ f∗ P Af0 ,f1 er Pr P Ar Set of all real numbers. Set of all n-dimensional real vectors. Summation. Euclidean norm. Vector inner product. Set product. Maps to. Function gradient. Set of minimizers of a function. Infimum of a function. Supremum of a function. Set of continuous functions. Set of continuously differentiable functions. Set of strictly continuously differentiable functions. Set of twice continuously differentiable functions. Domain of a function. Function definition. Set of regular subgradients. Set of subgradients. Set of horizon subgradients. Fenchel conjugate of function f. Proximal average of functions f0 and f1 . Moreau envelope. Proximal mapping. NC proximal average. 50 Appendix B: Index C 0, 6 C 1, 6 C 2, 6 f -attentive localization, 26 Argmax, 7 Argmin, 8 Ball, 7 Clarke regularity, 43 Closed set, 7 Convex function, 8 Convex hull, 8 Convex set, 8 Derivative, 7 Differentiable function, 7 Domain, 6 Epigraph, 8 Fenchel conjugate, 22 Gradient, 7 Horizon subdifferential, 10 Horizon subgradient, 9 Infimum, 7 Inner product, 6 Lower semicontinuous, 8 Lower-C 2 , 23 lsc, 8 Monotone, 29 Moreau envelope, 22 NC-proximal average, 23 NDC, 27 Non-degeneracy condition, 27 Norm, 6 Open set, 6 Para-prox-regular, 16 Parametrically prox-regular, 16 Projection, 12 Proper function, 6 Prox-bounded, 23 Prox-regular, 14 Proximal average, 22 Proximal normal cone, 12 Proximal subgradient, 9 Regular subdifferential, 9 Regular subgradient, 9 Strongly amenable function, 34 Subdifferential, 9 Subgradient, 9 Supremum, 7 Lipschitz constant, 8 Lipschitz continuity, 8 Little-o, 8 51
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Parametrically prox-regular functions
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Parametrically prox-regular functions Planiden, Chayne 2013
pdf
Page Metadata
Item Metadata
Title | Parametrically prox-regular functions |
Creator |
Planiden, Chayne |
Publisher | University of British Columbia |
Date Issued | 2013 |
Description | Prox-regularity is a generalization of convexity that includes all lower-C² functions. Therefore, the study of prox-regular functions provides insight on a broad spectrum of important functions. Parametrically prox-regular (para-prox-regular) functions are a further extension of this family, produced by adding a parameter. Such functions have been shown to play a key role in understanding the stability of minimizers in optimization problems. This thesis discusses para-prox-regular functions in ℝn. We begin with some basic examples of para-prox-regular functions, and move on to the more complex examples of the convex and non-convex proximal averages. We develop an alternate representation of para-prox-regular functions, related to the monotonicity of an f-attentive ε-localization as was done for prox-regular functions [25]. Levy in [18] provided proof of one implication of this relationship; we provide a characterization. We analyze two common forms of parametrized functions that appear in optimization: finite parametrized sum of functions, and finite parametrized max of functions. The example of strongly amenable functions by Poliquin and Rockafellar [27] is given, and a relaxation of its necessary conditions is presented. Some open questions and directions of further research are stated. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2013-04-12 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0073783 |
URI | http://hdl.handle.net/2429/44178 |
Degree |
Master of Science - MSc |
Program |
Mathematics |
Affiliation |
Irving K. Barber School of Arts and Sciences (Okanagan) Mathematics, Department of (Okanagan) |
Degree Grantor | University of British Columbia |
Graduation Date | 2013-05 |
Campus |
UBCO |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
Aggregated Source Repository | DSpace |
Download
- Media
- 24-ubc_2013_spring_planiden_chayne.pdf [ 471.8kB ]
- Metadata
- JSON: 24-1.0073783.json
- JSON-LD: 24-1.0073783-ld.json
- RDF/XML (Pretty): 24-1.0073783-rdf.xml
- RDF/JSON: 24-1.0073783-rdf.json
- Turtle: 24-1.0073783-turtle.txt
- N-Triples: 24-1.0073783-rdf-ntriples.txt
- Original Record: 24-1.0073783-source.json
- Full Text
- 24-1.0073783-fulltext.txt
- Citation
- 24-1.0073783.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0073783/manifest