Approximate Subdifferential ofPiecewise Linear-Quadratic FunctionsbyAnuj BajajB.Sc. Hons., The University of Delhi, 2012M.Sc., The University of Delhi, 2014A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF SCIENCEinTHE COLLEGE OF GRADUATE STUDIES(Mathematics)THE UNIVERSITY OF BRITISH COLUMBIA(Okanagan)April 2016c© Anuj Bajaj, 2016The undersigned certify that they have read, and recommend to the College of Gradu-ate Studies for acceptance, a thesis entitled:Approximate Subdifferential of Piecewise Linear-Quadratic Functionssubmitted by Anuj Bajaj in partial fulfillment of the requirements ofthe degree of Master of Science .Dr. Warren Hare, Irving K. Barber School of Arts and Sciences, Unit 5, MathematicsSupervisor, Professor (please print name and faculty/school above the line)Dr. Yves Lucet, Irving K. Barber School of Arts and Sciences, Unit 5, Computer ScienceSupervisory Committee Member, Professor (please print name and faculty/school above the line)Dr. Heinz Bauschke, Irving K. Barber School of Arts and Sciences, Unit 5, MathematicsSupervisory Committee Member, Professor (please print name and faculty/school above the line)Dr. Julian Cheng, Faculty of Applied Science, School of EngineeringUniversity Examiner, Professor (please print name and faculty/school above the line)June 16, 2016(Date submitted to Grad Studies)iiAbstractOptimization is a branch of mathematics dealing with the selection of the best ele-ment(s) (based on some criteria) from a set of available options. In the study of optimiza-tion, we often come across problems involving functions that are not differentiable, thus,one cannot employ the tool of differentiation to solve or analyze such functions. Such anissue can be resolved with the help of subdifferentials.Subgradients generalize derivatives to nondifferentiable functions, which makes themone of the most useful and powerful instruments in nonsmooth optimization. However, theymay come with a few limitations. Some limitations may be overcome by using approximatesubdifferentials, which are a certain relaxation of true subdifferentials. In this thesis, wepresent a general algorithm to compute approximate subdifferential of any proper convexfunction. We then implement the algorithm for the family of convex univariate piecewiselinear-quadratic functions - an important model class of functions for nonlinear systems.We provide many numerical examples. Finally, some directions and insights for futurework are detailed.iiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiChapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1Chapter 2: Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72.1 Basic Notation and Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 7Chapter 3: Approximate Subdifferential Calculus . . . . . . . . . . . . . . . 103.1 Approximate Subdifferential of Convex Quadratic Functions . . . . . . . . . 103.2 Approximate Subdifferential of Maximum of Two Quadratics . . . . . . . . 16Chapter 4: Algorithmic Computation of -subdifferential . . . . . . . . . . . 214.1 The Outline of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 214.2 An Implementation of Univariate PLQ Functions . . . . . . . . . . . . . . . 254.3 Numerical Illustrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37Chapter 5: Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . 50Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52iiiList of TablesTable 4.1 Functions in the CCA Toolbox relevant to Algorithm 1 . . . . . . . . 28Table 4.2 Notations in Algorithm Complexity . . . . . . . . . . . . . . . . . . . 36Table 4.3 Core subroutines in Algorithm 2 and their complexity . . . . . . . . 37ivList of FiguresFigure 1.1 A typical example of a profit function of a company . . . . . . . . . 1Figure 1.2 Convex functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2Figure 1.3 Line-segment test to check convexity . . . . . . . . . . . . . . . . . 2Figure 1.4 A non-convex function . . . . . . . . . . . . . . . . . . . . . . . . . 3Figure 1.5 An illustration of a simple Cutting Planes Method . . . . . . . . . . 4Figure 3.1 PLQ function may not be a maximum of finitely many quadratics. . 20Figure 3.2 PLQ function may not be a maximum of finitely many quadratics. . 20Figure 4.1 Visualizations for ∂f with f(x) = |x| (x ∈ R) . . . . . . . . . . . . 23Figure 4.2 Visualizations for ∂f with f(x) =|x|55 (x ∈ R), x¯ = 0.3 and = 0.1 24Figure 4.3 Visualizations for ∂f with f(x) = −ln(x) (x > 0) . . . . . . . . . . 25Figure 4.4 A discontinuous piecewise linear-quadratic function . . . . . . . . . 27Figure 4.5 Visualizations for ∂f with f(x) = |x| (x ∈ R) . . . . . . . . . . . . 38Figure 4.6 Visualizations for ∂f with f(x) =x24+ |x| (x ∈ R) . . . . . . . . . 39Figure 4.7 Visualizations for ∂f with f(x) = max{x2,x2+ 5}(x ∈ R) . . . . 40Figure 4.8 Visualizations for ∂f with f(x) as defined in Example 4.21 . . . . . 40Figure 4.9 Visualizations for ∂f with f(x) as defined in Example 4.22 . . . . . 41Figure 4.10 Visualizations for ∂f with f(x) as defined in Example 4.22 . . . . . 41Figure 4.11 Visualizations for ∂f with f(x) as defined in Example 4.23 . . . . . 42Figure 4.12 Visualizations for ∂f with f(x) = ι{x¯} + 0 . . . . . . . . . . . . . . 42Figure 4.13 Visualizations for ∂f with f(x) = 2x (x ∈ R) . . . . . . . . . . . . 43Figure 4.14 Visualizations for ∂f with f(x) = |x| (x ∈ R) . . . . . . . . . . . . 44Figure 4.15 Visualizations for ∂f with f(x) = |x| (x ∈ R) . . . . . . . . . . . . 45Figure 4.16 Animated version of ∂f(x) as a function of x . . . . . . . . . . . . 46Figure 4.17 Animated version of ∂f(x) as a function of x . . . . . . . . . . . . 47Figure 4.18 Animated version of ∂f(x) as a function of x . . . . . . . . . . . . 48Figure 4.19 Animated version of ∂f(x) as a function of > 0 . . . . . . . . . . 49Figure 5.1 Minimum of two quadratics may not always be PLQ representable. 50vAcknowledgementsI take this opportunity to acknowledge and express my gratitude towards my supervisorsDr. Warren Hare and Dr. Yves Lucet for entrusting upon me confidence and providing mewith such a wonderful opportunity.I am thankful for their constant encouragement and support throughout the course ofthe thesis, especially for the expertise, patience and useful suggestions given during thecourse of the research. It added considerably to my graduate experience.I would like to thank my committee members Dr. Heinz Bauschke, Dr. Warren Hare,and Dr. Yves Lucet for the assistance they provided at all levels of the research, it wastruly overwhelming.I am also thankful to University of British Columbia, Okanagan and Natural Sciencesand Engineering Research Council of Canada (NSERC) for providing financial assistance,without which this research would seem like a distant dream.Finally, I take this opportunity to extend my deep appreciation to my family and friendsfor all that they meant to me during the crucial times of the completion of my research.viDedicationTo my parents and sister for their constant love and support.“It’s all about loving your family”viiChapter 1IntroductionFor a long time, the study of nondifferentiable functions (also referred as nonsmoothfunctions) has been of interest to many researchers. One of the reasons being the frequentoccurrence of such a family of functions in real-world applications.For instance, suppose Figure 1.1 depicts the profit function of a company for a givenfiscal year for a given commodity. Since one of the key objectives for any company is tomaximize profit, finding an optimal solution of the profit function is a matter of importance.However, in this hypothetical example, one cannot employ the tool of differentiation, whichis learned in first year calculus, to solve the concerned problem.Figure 1.1: A typical example of a profit function of a company.Studying the analytical properties of functions that are not differentiable is not new.A concept introduced by Rockafellar in early 1960s, is the subdifferential. Subdifferentials(and subgradients) generalize the derivatives to nonsmooth functions, which makes themone of the most useful and powerful instruments in optimization.In this thesis, we focus on the subdifferential and subgradients of nonsmooth convexfunctions, therefore we now shed some light upon the notion of convexity. Figure 1.2 showstwo examples of convex functions.1Chapter 1. IntroductionFigure 1.2: Convex functionsTo check convexity of a function, we typically follow a simple line-segment test. Everyconvex function satisfies the property that the line segment joining any two points on orabove the graph of the function is never below the graph. For instance, the two functionsdepicted in Figure 1.2 enjoy this property. A closer look at one of these functions givesa better insight (Figure 1.3). On the contrary, if we are able to find any two points onor above the graph of the function such that the line-segment test fails, then we say thefunction is non-convex. Figure 1.4 illustrates a non-convex function.Figure 1.3: Line-segment test to check convexity.2Chapter 1. IntroductionFigure 1.4: A non-convex function.After having familiarized ourselves with the notion of convexity we now present theformal definition of a convex function. Prior to which, we define terms seen in the definitionnamely, domain of a function and a convex set.Definition 1.1. The domain of a function f : Rn → R ∪ {−∞,+∞} denoted by dom(f),is the set of all x ∈ Rn such that f(x) < +∞.Definition 1.2. A set C ⊆ Rn is convex if given any points x ∈ C, y ∈ C the line segmentjoining those points is in C, i.e, {z ∈ Rn : z = θx+ (1− θ)y, θ ∈ [0, 1]} ⊆ C.Definition 1.3. A function f : Rn → R ∪ {−∞,+∞} is said to be convex if dom(f) is aconvex set andf(θx+ (1− θ)y) ≤ θf(x) + (1− θ)f(y)for all x, y ∈ dom(f), θ ∈ (0, 1).Now that we have a brief understanding about convex functions, we present the formaldefinition of subdifferential of a convex function.Definition 1.4. Let f : Rn → R ∪ {+∞} be convex and x ∈ dom(f). The subdifferential(also known as the exact subdifferential) of f at x is the set∂f(x) = {z ∈ Rn : f(y) ≥ f(x) + 〈z, y − x〉 for all y ∈ dom(f)}.The elements of the set ∂f(x), are vectors, known as the subgradients of f at the point x.Moreover, ∂f(x) may also be denoted as ∂0f(x).Geometrically, subgradient vectors correspond to linearization to the function f(x) ata given point x. The collection of all such vectors is known as the subdifferential of f(x)at x.3Chapter 1. IntroductionRemark 1.5. For a convex function f , if f is continuously differentiable at x, then∂f(x) = {∇f(x)}.Based on the notion of subdifferentials, several numerical methods have been devel-oped to solve nondifferentiable optimization problems. One group of classical methods ofparticular interest to this thesis, is Cutting Planes methods. Cutting Planes methods areoptimization methods that work by approximating the objective function by a piecewiselinear model constructed using subgradients and function values, given by Equation (1.1),fˇm(x) = maxi=1,2,...,m{f(xi) + 〈si, x− xi〉}, (1.1)where f(xi) is the function value at the point xi and si = s(xi) is a subgradient of f atthe point xi. Note that fˇm being a maximum of a collection of linear functions is convex[RW09, Example 2.8, Page 42], [Roc15, Theorem 5.5, Page 35].The basic idea behind the Cutting Planes method is that it iteratively refines theobjective function by means of function value and its subgradient. Figure 1.5 allows us abetter understanding of the basic working behind the Cutting Planes method on a functionf(x).Figure 1.5: An illustration of a simple Cutting Planes MethodIn Figure 1.5, we begin by choosing the points x1 and x2 whose function values aredenoted by f1 and f2 respectively in the figure. The respective model that approximates4Chapter 1. Introductionf(x) at these points is fˇ2 (as shown). The next iterate x3 shall be the point where theminimizer of the model fˇ2 appears whose function value is denoted by f3 in the figure.As with any optimization algorithm, it is a key research interest to study the conver-gence properties of the Cutting Planes method. To prove convergence, notice that (x3, f˜2)where f˜2 = fˇ2(x3), is a point on the graph of fˇ2(x) . Also, 0 ∈ ∂fˇ2(x3) as x3 is a minimizerof fˇ2(x) [HUL13, Page 23]. Moreover, fˇ2(x) ≤ f(x) for all x ∈ R.To see this, note that from definition of fˇ2(x), si ∈ ∂f(xi) for i = {1, 2}. By definition,we havef(x) ≥ f(xi) + 〈si, x− xi〉 for all x ∈ R.Since, the above inequality holds for all i = {1, 2}, we get,f(x) ≥ maxi=1,2{f(xi) + 〈si, x− xi〉} = fˇ2(x),that is,fˇ2(x) ≤ f(x) for all x ∈ R.So we havef˜2 ≤ fˇ2(x) ≤ f(x) for all x ∈ R,f˜2 + 〈0, x− x3〉 ≤ f(x) for all x ∈ R,which yieldsf(x3) + 〈0, x− x3〉 − (f(x3)− f˜2) ≤ f(x) for all x ∈ R. (1.2)This leads us to the central concept of this thesis, approximate subdifferential of a convexfunction, formally defined as follows.Definition 1.6. Let f : Rn → R ∪ {−∞,+∞} be convex, x ∈ dom(f) and ≥ 0. The-subdifferential (also known as the approximate subdifferential) of f at x is the set∂f(x) = {z¯ ∈ Rn : f(y) ≥ f(x) + 〈z¯, y − x〉 − for all y ∈ Rn}.The elements of ∂f(x) are known as the -subgradients of f at x.Note that ∂f(x) ⊆ ∂f(x) for any ≥ 0. Throughout the thesis, we shall define∂f(x) = ∅ when x /∈ dom(f).Thus, from Equation (1.2) and Definition 1.6 we have shown0 ∈ ∂3f(x3) for 3 = f(x3)− f˜2 ≥ 0.Hence, the concept of approximate subdifferential is the key to proving that the CuttingPlanes method works. In practice, as the algorithm advances, i.e., as m→∞, the modelfˇm(x) = maxi=1,2,...,m{f(xi) + 〈si, x− xi〉}5Chapter 1. Introductionbecomes a better approximation of f(x). So, from [BGLS06, Theorem 9.6, Page 133 ](f(xi)− f˜i)→ 0 as i→∞.Thus taking limit of 0 ∈ ∂if(xi), yields (after some math)0 ∈ ∂0f(x¯) = ∂f(x¯) where x¯ = limi→∞xi.Noting ∂0f(x¯) = ∂f(x¯) givesf(x¯) + 〈0, x− x¯〉 ≤ f(x) for all x ∈ R.f(x¯) ≤ f(x) for all x ∈ R.which shows x¯ is the minimizer of f(x), in the given case x¯ = 0. This highlights thepractical importance of approximate subdifferential of a convex function.Interestingly, in optimization, the usefulness of approximate subdifferentials has notonly been confined to practical purposes. Over the years, a body of calculus rules ofsubdifferentials have been presented in optimization [RW09] that require certain conditionsto be satisfied. However, for the case when functions are extended real-valued, i.e., functionvalues either being a finite real number or infinity, checking the validity of such conditionsmay be difficult. Such issues can be treated by approximate subdifferentials [HUMSV95].Noting the importance of approximate subdifferential the question then arises of whetherthere is a technique that enables us to numerically compute and hence visualize it. In thepresent thesis, we focus on finding the approximate subdifferential of convex nonsmoothfunctions particularly, piecewise-linear quadratic functions. Although, this might seem abit restrictive, it serves as a starting point for a better understanding of the concept. Notealso that any convex function can be approximated by a convex piecewise-linear function.The present thesis is structured as follows. Chapter 2 contains some basic definitionsrequired for understanding of the notations used. Chapter 3 contains some previouslyknown results concerning approximate subdifferential of convex quadratic functions (The-orem 3.4) and maximum of two convex functions (Fact 3.6). As a corollary (Corollary3.5), we present a new formula for the approximate subdifferential of the maximum oftwo convex quadratic functions. Section 4.1 presents a general algorithm for computingthe approximate subdifferential of any proper convex function along with a few numericalexamples. Section 4.2 presents an implementation of the general algorithm for the classof convex piecewise-linear quadratic functions. It also discusses the data structure andthe complexity of the algorithm. Section 4.3 illustrates the implementation with somenumerical examples. Finally, Chapter 5 summarizes the work we have done and containsa discussion on the limitations of extending the required implementation. It also providessome directions for future work.6Chapter 2Background2.1 Basic Notation and DefinitionsIn this chapter, we define some basic terms seen in nonsmooth optimization that arerequired to understand this thesis. We also provide a previously known result concerningFenchel conjugate of a convex function. In addition, the notations and a few commonterms of linear algebra are also presented.Definition 2.1. The closure of a set S ⊆ Rn, denoted by cl(S), is the smallest closed setcontaining S.Definition 2.2. The convex hull of a set C ⊆ Rn, denoted by conv(C), is the smallestconvex set containing C. Equivalently,conv(C) = {y ∈ Rn : y =m∑i=1θixi, xi ∈ C,m∑i=1θi = 1, θi ≥ 0,m ≥ 1}.Definition 2.3. A function f : Rn → R∪{−∞,+∞} is said to be strictly convex if dom(f)is a convex set andf(θx+ (1− θ)y) < θf(x) + (1− θ)f(y)for all x 6= y ∈ dom(f), θ ∈ (0, 1).Definition 2.4. A function f : Rn → R ∪ {+∞} is said to be proper if dom(f) 6= ∅ andf(x) 6= −∞ for all x ∈ Rn.Definition 2.5. The closure of a convex function f : Rn → R∪{−∞,+∞} is the functiondefined byf(x) ={lim infy→x f(y), if x ∈ cl(dom(f)),+∞, if x /∈ cl(dom(f)).Here, lim infy→x f(y) = limη→0(inf{f(y) : y ∈ Rn ∩ B(x; η)\{x}}) where B(x; η) denotes the ballof radius η about x. A similar explanation applies to Definition 2.6Definition 2.6. A function f : Rn → R ∪ {−∞,+∞} is said to be lower semi-continuous(l.s.c.) if,lim infx→x¯ f(x) ≥ f(x¯) for all x¯ ∈ Rn.72.1. Basic Notation and DefinitionsDefinition 2.7. For a non-empty set C ⊆ Rn. The support function σC : Rn → R ∪{−∞,+∞} of C is defined byσC(y) = sup{〈x, y〉 : x ∈ C}.Definition 2.8. For a non-empty set S ⊆ Rn and a point x ∈ Rn. The Projection of xonto S is the set, denoted by Proj(x, S), defined asProj(x, S) = argmin{‖x− y‖ : y ∈ S},i.e., it is the set of all those points in S that are closest to x in terms of the Euclideandistance.Definition 2.9. Given a function f (not necessarily convex), the convex conjugate (com-monly known as the Fenchel Conjugate) of f denoted by f∗ is defined asf∗(s) = supx∈Rn{〈s, x〉 − f(x)}.Definition 2.10. A set S ⊆ Rn is called polyhedral if it can be specified as finitely manylinear constraints, i.e.,S = {x : 〈ai, x〉 ≤ bi , i = 1, 2, . . . ,m},where for i = 1, 2, . . . ,m, ai ∈ Rn and bi ∈ R.We now present some basic terms of linear algebra that are of interest in proving resultsand theorems in this thesis.Definition 2.11. Subspaces Y1, Y2 of Rn are said to be orthogonal, denoted by Y1 ⊥ Y2, if〈y1, y2〉 = 0 for every y1 ∈ Y1 and y2 ∈ Y2.Definition 2.12. For a subspace Y of Rn. The orthogonal complement of Y , denoted byY ⊥, defined asY ⊥ = {x ∈ Rn : 〈x, y〉 = 0 for all y ∈ Y }.i.e., it is the set of all vectors in Rn that are orthogonal to every vector in Y . Note thatY ⊥ ⊥ Y .Definition 2.13. Let Y1, Y2 be subspaces of a vector space V . Then V is said to be thedirect sum of Y1 and Y2, denoted by V = Y1 ⊕ Y2, if for every v ∈ V , there exists uniquey1 ∈ Y1, y2 ∈ Y2 such that v = y1 + y2.Definition 2.14. For an n ×m matrix A. The Kernel (also known as Null space) of A,denoted by Ker(A), is the set defined asKer(A) = {x ∈ Rm : Ax = 0}.82.1. Basic Notation and DefinitionsDefinition 2.15. For an n ×m matrix A. The Image (also known as Column space) ofA, denoted by Im(A), is the set defined asIm(A) = {y ∈ Rn : y = Ax for some x ∈ Rm}.Definition 2.16. [CM09, Definition 1.1.3, Page 9] For a matrix A ∈ Rm×n. The Moore-Penrose pseudo-inverse (or, pseudo-inverse) of A denoted by A† ∈ Rn×m is a unique matrixsatifying the following conditions:1. AA†A = A,2. A†AA† = A†,3. (AA†)∗ = AA†, and4. (A†A)∗ = A†A.Note that a computationally simple and accurate way to compute Moore-Penrosepseudo-inverse is through singular value decomposition as detailed in [Mon11, Page 141].Definition 2.17. The set of all symmetric positive semi-definite matrices in Rn×n denotedby Sn+ is defined asSn+ = {A ∈ Rn×n : AT = A, xTAx ≥ 0 for all x ∈ Rn}.Definition 2.18. The set of all symmetric positive definite matrices in Rn×n denoted bySn++ is defined asSn++ = {A ∈ Rn×n : AT = A, xTAx > 0 for all x ∈ Rn \ {0}}.Finally, we define piecewise linear-quadratic functions, a class of functions of primaryconcern to this thesis which serves as an essential tool in the computation of FenchelConjugate (Fact 2.20).Definition 2.19. A function f : Rn → R∪{−∞,+∞} is piecewise linear-quadratic (PLQ)if dom(f) can be represented as the union of finitely many polyhedral sets, relative to eachof which f(x) =12〈Ax, x〉+ 〈b, x〉+ c where, A ∈ Rn×n is a symmetric matrix, b ∈ Rn andc ∈ R.Fact 2.20. [HUL13, Theorem 2.2.1, Page 253] For a convex function f : Rn → R, thefollowing are equivalent:(i) f is minimized at x over Rn, i.e.: f(y) ≥ f(x) for all y ∈ Rn, and(ii) 0 ∈ ∂f(x).Note that the condition 0 ∈ ∂f(x) is the generalization of the usual stationary condition∇f(x) = 0 of the smooth case.9Chapter 3Approximate SubdifferentialCalculusIn this chapter, we shall present a few classical results concerning the approximatesubdifferential followed by a new result that derives the approximate subdifferential of themaximum of two quadratic functions.3.1 Approximate Subdifferential of Convex QuadraticFunctionsConsider the quadratic function f : Rn → R, defined as follows,f(x) =12〈Ax, x〉+ 〈b, x〉+ c,where A ∈ Sn+, b ∈ Rn and c ∈ R.Our goal in this section is to compute ∂f(x¯) at an arbitrary point x¯ and > 0. Inorder to compute ∂f , we need to understand the conjugate f∗. We begin by consideringthe case when A is invertible. In this case, the conjugate is computed as follows, the detailsof which are an expansion on [HUL93, Example 1.1.3, Page 38]. By definition,f∗(s) = supx∈Rn{〈s, x〉 − f(x)} = supx∈Rn{〈s− b, x〉 − 12〈Ax, x〉 − c}. (3.1)Letf˜(x) = 〈s− b, x〉 − 12〈Ax, x〉 − c,then∇f˜(x) = s− b−Ax.Since A ∈ Sn++, the supremum of Equation (3.1) is attained when ∇f˜(x) = 0:s− b−Ax = 0⇔ Ax = s− b⇔ x = A−1(s− b).Substituting x = A−1(s− b) in Equation (3.1), and applying AA−1 = I, we getf∗(s) = 〈s− b, A−1(s− b)〉 − 12〈AA−1(s− b), A−1(s− b)〉 − c=12〈s− b, A−1(s− b)〉 − c.103.1. Approximate Subdifferential of Convex Quadratic FunctionsIn summary, if A ∈ Sn++, thenf∗(s) =12〈s− b, A−1(s− b)〉 − c.Following the same approach, we shall now compute f∗(s) for the case when A ∈ Sn+,the details of which are an expansion on [HUL93, Example 1.1.4, Page 38].Lemma 3.1. Let f(x) =12〈Ax, x〉+ 〈b, x〉+ c, where A ∈ Sn+, b ∈ Rn and c ∈ R. Then,f∗(s) ={ 12〈s− b, A†(s− b)〉 − c, (s− b) ∈ Im(A),+∞, (s− b) /∈ Im(A).where A† is the pseudo-inverse of A.Proof. Letf˜(x) = 〈s− b, x〉 − 12〈Ax, x〉 − c,then∇f˜(x) = s− b−Ax.We consider the following two cases: (s− b) ∈ Im(A) and (s− b) /∈ Im(A).Case 1. Suppose (s− b) ∈ Im(A).Let xˆ ∈ Rn be such that s−b = Axˆ, so ∇f˜(xˆ) = 0. As A ∈ Sn+, we have f˜ is maximizedat xˆ. Since s = b+Axˆ, we see that〈s− b, xˆ〉 − 12〈Axˆ, xˆ〉 − c = 〈s− b, xˆ〉 − 12〈s− b, xˆ〉 − c = 12〈s− b, xˆ〉 − c.Therefore,f∗(s) =12〈s− b, xˆ〉 − c where xˆ is any solution to the system Axˆ = (s− b).In particular, taking xˆ = A†(s− b) [CM09, Theorem 2.1.1, Page 28] we obtain,f∗(s) =12〈s− b, A†(s− b)〉 − c.Case 2. Suppose (s− b) /∈ Im(A).Define (s− b)K = Proj((s− b), Ker(A)) and (s− b)K⊥ = Proj((s− b), (Ker(A))⊥). By[Leo80, Theorem 5.2.3, Page 218] we have,(s− b) = (s− b)K + (s− b)K⊥ .113.1. Approximate Subdifferential of Convex Quadratic FunctionsNote that ‖(s − b)K‖ 6= 0, as otherwise, (s − b) ∈ (Ker(A))⊥ = Im(A) [Leo80, Theorem5.2.1, Page 216], a contradiction to (s− b) /∈ Im(A).Now consider λ ≥ 0 and set xˆλ = λ(s− b)K .f∗(s) = supx∈Rn{〈s− b, x〉 − 12〈Ax, x〉 − c},≥ supλ≥0{〈s− b, xˆλ〉 − 12〈Axˆλ, xˆλ〉 − c},= supλ≥0{〈s− b, λ(s− b)K〉 − 12〈A(λ(s− b)K), λ(s− b)K〉} − c.Since (s− b)K ∈ Ker(A), we have A(s− b)K = 0, therefore,f∗(s) ≥ supλ≥0{〈s− b, λ(s− b)K〉} − c= supλ≥0{〈(s− b)K + (s− b)K⊥ , λ(s− b)K〉} − c= supλ≥0{λ〈(s− b)K , (s− b)K〉+ λ〈(s− b)K⊥ , (s− b)K〉} − c.Since (Ker(A))⊥ ⊥ Ker(A), we have 〈(s− b)K⊥ , (s− b)K〉 = 0. Therefore,f∗(s) ≥ supλ≥0{λ}‖(s− b)K‖2 − c = +∞.Thus, we havef∗(s) ={ 12〈s− b, A†(s− b)〉 − c, (s− b) ∈ Im(A),+∞, (s− b) /∈ Im(A).(3.2)Notice that the supremum is finite only if (s− b) ∈ Im(A).Now that we have the required background about the convex conjugate of a convexquadratic function, we proceed to the expression for its approximate subdifferential.In order, to establish the desired formula we require the following proposition. Proposi-tion 3.2 appears in [HUL93, Proposition 1.2.1, Page 95], we recreate (and slightly expand)the proof for completeness.Proposition 3.2. A vector s ∈ Rn is an -subgradient of a proper closed convex functionf at x ∈ dom(f) if and only iff∗(s) + f(x)− 〈s, x〉 ≤ . (3.3)123.1. Approximate Subdifferential of Convex Quadratic FunctionsProof. Let x ∈ dom(f). By definition, s ∈ ∂f(x) if and only iff(y) ≥ f(x) + 〈s, y − x〉 − for all y ∈ Rn⇔ f(y) ≥ f(x) + 〈s, y〉 − 〈s, x〉 − for all y ∈ Rn⇔ ≥ 〈s, y〉 − f(y) + f(x)− 〈s, x〉 for all y ∈ Rn⇔ ≥ supy∈Rn{〈s, y〉 − f(y)}+ f(x)− 〈s, x〉⇔ ≥ f∗(s) + f(x)− 〈s, x〉.As a consequence of Proposition 3.2 we have the following result.Corollary 3.3. Let f : Rn → R∪{+∞} be a proper closed convex function. If > 0, thendom(∂f) = dom(f).Proof. By definition, dom(∂f) ⊆ dom(f).To see the other direction, let x ∈ dom(f). Since, f is closed convex,f(x) = f∗∗(x) = sups∈Rn{〈x, s〉 − f∗(s)}.Since, > 0, there exists s ∈ Rn such thatf(x)− ≤ 〈x, s〉 − f∗(s),which impliesf∗(s) + f(x)− 〈x, s〉 ≤ .By Proposition 3.2 this is equivalent to s ∈ ∂f(x). Hence, x ∈ dom(f).Theorem 3.4 appears in [HUL93, Example 1.2.2, Page 95], we recreate (and expand)the proof for completeness.Theorem 3.4. Let f(x) =12〈Ax, x〉+ 〈b, x〉+ c, where A ∈ Sn+, b ∈ Rn and c ∈ R. Then,for ≥ 0 and x ∈ Rn∂f(x) = {s ∈ Rn : (s− b) ∈ Im(A), f(x) + 12〈s− b, A†(s− b)〉 − c ≤ 〈s, x〉+ }. (3.4)Alternatively,∂f(x) = ∇f(x) + {p ∈ Im(A) : 12〈p,A†p〉 ≤ }. (3.5)133.1. Approximate Subdifferential of Convex Quadratic FunctionsProof. In view of Lemma 3.1 and Equation (3.3) the -subdifferential of f is given by∂f(x) = {s ∈ Rn : f∗(s) + f(x) ≤ 〈s, x〉+ }= {s ∈ Rn : s− b ∈ Im(A), f(x) + 12〈s− b, A†(s− b)〉 − c ≤ 〈s, x〉+ },which is Equation (3.4).To see Equation (3.5), fix xˆ ∈ Rn and suppose (s− b) ∈ Im(A). Let p = (s− b)− Axˆ,so s− b− p = Axˆ. First note p ∈ Im(A).Indeed, since (s− b) ∈ Im(A) then there exists yˆ ∈ Rn such that Ayˆ = (s− b). So,p = Ayˆ −Axˆ = A(yˆ − xˆ),which implies p ∈ Im(A). Thus, from [Lau05, Page 52] and property of A† we havep = Proj(p, Im(A)) = AA†p,(s− b) = Proj((s− b), Im(A)) = AA†(s− b). (3.6)Now note thatf(xˆ) +12〈s− b, A†(s− b)〉 − c ≤ 〈s, xˆ〉+ ⇔ 12〈Axˆ, xˆ〉+ 〈b, xˆ〉+ c+ 12〈s− b, A†(s− b)〉 − c ≤ 〈s, xˆ〉+ ⇔ 12〈Axˆ, xˆ〉+ 12〈s− b, A†(s− b)〉 ≤ 〈s− b, xˆ〉+ ⇔ 12〈(s− b− p), xˆ〉+ 12〈s− b, A†(s− b)〉 ≤ 〈s− b, xˆ〉+ ⇔ 12〈s− b, A†(s− b)〉 ≤ 〈(s− b)− 12(s− b− p), xˆ〉+ ⇔ 12〈s− b, A†(s− b)〉 ≤ 〈12(s− b) + 12p, xˆ〉+ ⇔ 12〈s− b, A†(s− b)〉 ≤ 12〈(s− b), xˆ〉+ 12〈p, xˆ〉+ .Notice since A is symmetric we have (A†)T = (AT )† = A†. Thus, using p = (s − b) − Axˆ143.1. Approximate Subdifferential of Convex Quadratic Functionsand Equation (3.6) we getf(xˆ) +12〈s− b, A†(s− b)〉 − c ≤ 〈s, xˆ〉+ ⇔ 12〈s− b, A†(s− b)〉 ≤ 12〈(s− b), xˆ〉+ 12〈p, xˆ〉+ ⇔ 12〈s− b, A†(s− b)〉 ≤ 12〈AA†(s− b), xˆ〉+ 12〈AA†p, xˆ〉+ ⇔ 12〈A†(s− b), s− b〉 ≤ 12〈A†(s− b), Axˆ〉+ 12〈A†p,Axˆ〉+ ⇔ 12〈A†(s− b), s− b〉 ≤ 12〈A†(s− b), s− b− p〉+ 12〈A†p, s− b− p〉+ ⇔ 0 ≤ −12〈A†(s− b),−p〉+ 12〈A†p, s− b− p〉+ ⇔ 0 ≤ −12〈(s− b), A†p〉+ 12〈A†p, s− b− p〉+ ⇔ 0 ≤ 12〈A†p,−p〉+ ⇔ 12〈A†p, p〉 ≤ ⇔ 12〈p,A†p〉 ≤ .Hence, we obtain∂f(xˆ) = {s ∈ Rn : s− b ∈ Im(A), 12〈p,A†p〉 ≤ , p = (s− b)−Axˆ}= {Axˆ+ b+ p ∈ Rn : p+Axˆ ∈ Im(A), 12〈p,A†p〉 ≤ }= {∇f(xˆ) + p ∈ Rn : p ∈ Im(A), 12〈p,A†p〉 ≤ } (∇f(xˆ) = Axˆ+ b)= ∇f(xˆ) + {p ∈ Im(A) : 12〈p,A†p〉 ≤ }.That is∂f(xˆ) = ∇f(xˆ) + {p ∈ Im(A) : 12〈p,A†p〉 ≤ }.Now that we have the required background about the approximate subdifferential ofconvex quadratic functions, we proceed to deriving the expression for the approximatesubdifferential of the maximum of two convex quadratic functions.153.2. Approximate Subdifferential of Maximum of Two Quadratics3.2 Approximate Subdifferential of Maximum of TwoQuadraticsIn this section, we shall derive the expression for approximate subdifferential of themaximum of two convex quadratic functions. We shall restrict ourselves to two single-variable convex quadratics with the same curvature, as otherwise the formula becomesunwieldy.Corollary 3.5. Let f1, f2 : R → R be such that f1(x) = 12a1x2 + b1x + c1 and f2(x) =12a2x2 + b2x + c2, where a1 = a2 ≥ 0, b1 ≤ b2 ∈ R and c1, c2 ∈ R. Let x¯ ∈ R be such thatf1(x¯) = f2(x¯). Then, for f(x) = max{f1(x), f2(x)} and > 0 we have∂f(x¯) = a1x¯+ [b1 −√2a1, b2 +√2a1].Prior to proving the corollary, we require the following results.Fact 3.6. ([HUMSV95], Theorem 2.2) Suppose f1, f2 are (extended real-valued) properl.s.c convex functions on Rn. Let x0 ∈ Rn be a point at which f1(x0) = f2(x0). Then, forf = max(f1, f2), we have for all > 0∂f(x0) = cl( ⋃α1>0,α2>0α1+α2=1⋃1≥0,2≥01+2={α1∂1/α1f1(x0) + α2∂2/α2f2(x0)}).Lemma 3.7. If f : Rn → R has the form f = 〈a, x〉 + b where, a ∈ Rn and b ∈ R, thenfor ≥ 0 and x˜ ∈ Rn∂f(x˜) = ∂f(x˜) = {∇f(x˜)} = {a}.Proof. Clearly, ∇f(x) = a for all x ∈ Rn. Since f is convex and differentiable, we have[HUL93, Proposition 1.5.1, Page 49]∂f(x˜) = {∇f(x˜)} = {a}.Next, we show that ∂f(x˜) = {u ∈ Rn : u = a}.By definition,u ∈ ∂f(x˜) ⇔ f(y) ≥ f(x˜) + 〈u, y − x˜〉 − for all y ∈ dom(f)⇔ 〈u, y − x˜〉 − f(y) + f(x˜) ≤ for all y ∈ dom(f)⇔ supy∈dom(f){〈u, y〉 − f(y)− 〈u, x˜〉+ f(x˜)} ≤ ⇔ supy∈dom(f){〈u, y〉 − 〈a, y〉 − b− 〈u, x˜〉+ 〈a, x˜〉+ b} ≤ ⇔ supy∈dom(f){〈(u− a), (y − x˜)〉} ≤ .163.2. Approximate Subdifferential of Maximum of Two QuadraticsSince,supy∈dom(f){〈(u− a), (y − x˜)〉} ={0, u = a,+∞, u 6= a,Thus, u ∈ ∂f(x˜)⇔ u = a. Hence, ∂f(x˜) = {a} = ∂f(x˜).Now, we turn to the proof of Corollary 3.5.Proof. (Corollary 3.5) Notice, since a1 = a2, so x¯ is the unique intersection point. Also, fis convex [Roc15, Theorem 5.5, Page 35].We consider the following two cases: a1 = 0 and a1 > 0 . Consider α1 > 0, α2 > 0 suchthat α1 + α2 = 1 and 1 ≥ 0, 2 ≥ 0 such that 1 + 2 = .Case 1. Suppose a1 = 0.Since 0 = a1 = a2 we have ∇f1(x) = b1 and ∇f2(x) = b2 for all x ∈ R. From Lemma3.7 we have∂1/α1f1(x¯) = b1, and∂2/α2f2(x¯) = b2.Since f(x¯) = f1(x¯) = f2(x¯), from Fact 3.6 we get∂f(x¯) = cl( ⋃α1>0,α2>0α1+α2=1{α1b1 + α2b2})= cl(conv(b1, b2)).Since conv(b1, b2) = [b1, b2] is a closed set, we have∂f(x¯) = [b1, b2].Case 2. Suppose a1 > 0.Note that, in 1-dimension, for i = {1, 2} with ai > 0, we have a†i = a−1i . From Theorem3.4 we obtain∂i/αifi(x¯) = ∇fi(x¯) +{si ∈ R : 12〈si, a−1i si〉 ≤ i/αi}for i = {1, 2}= ∇fi(x¯) + {si ∈ R : 〈si, a−1i si〉 ≤ 2i/αi} for i = {1, 2}= ∇fi(x¯) + {si ∈ R : s2i ≤ 2aii/αi} for i = {1, 2}= ∇fi(x¯) + {si ∈ R : si ∈ [−√2aii/αi,√2aii/αi} for i = {1, 2}= [∇fi(x¯)−√2aii/αi,∇fi(x¯) +√2aii/αi] for i = {1, 2}.173.2. Approximate Subdifferential of Maximum of Two QuadraticsUsing a1 = a2 we get∂i/αifi(x¯) = [∇fi(x¯)−√2a1i/αi,∇fi(x¯) +√2a1i/αi] for i = {1, 2}.Notice, by linearity of gradient we haveαi∂i/αifi(x¯) = αi[∇fi(x¯)−√2a1i/αi,∇fi(x¯) +√2a1i/αi] for i = {1, 2}.= [αi∇fi(x¯)− αi√2a1i/αi, αi∇fi(x¯) + αi√2a1i/αi] for i = {1, 2}.= [∇(αifi)(x¯)−√2a1iαi,∇(αifi)(x¯) +√2a1iαi] for i = {1, 2}.From Fact 3.6 we get∂f(x¯) = cl( ⋃α1>0,α2>0α1+α2=1⋃1≥0,2≥01+2={[∇(α1f1)(x¯)−√2a11α1,∇(α1f1)(x¯) +√2a11α1]+[∇(α2f2)(x¯)−√2a12α2,∇(α2f2)(x¯) +√2a12α2]}).That is∂f(x¯) = cl( ⋃α1>0,α2>0α1+α2=1⋃1≥0,2≥01+2={[∇(α1f1)(x¯) +∇(α2f2)(x¯)− (√2a11α1 +√2a12α2),∇(α1f1)(x¯) +∇(α2f2)(x¯) + (√2a11α1 +√2a12α2))}).∂f(x¯) = cl( ⋃α1>0,α2>0α1+α2=1⋃1≥0,2≥01+2={[∇(α1f1 + α2f2)(x¯)− (√2a11α1 +√2a12α2),∇(α1f1 + α2f2)(x¯) + (√2a11α1 +√2a12α2)]}).Simplifying ∇(α1f1 +α2f2)(x¯) using linearity of gradient, a1 = a2 and α1 +α2 = 1 we get∇(α1f1 + α2f2)(x¯) = α1∇f1(x¯) + α2∇f2(x¯)= α1(a1x¯+ b1) + α2(a2x¯+ b2)= α1(a1x¯+ b1) + α2(a1x¯+ b2)= (α1 + α2)a1x¯+ α1b1 + α2b2= a1x¯+ α1b1 + α2b2.183.2. Approximate Subdifferential of Maximum of Two QuadraticsNow, since b1 ≤ b2 we obtain the lower bound (respectively, upper bound) of ∂f(x¯) bysetting 1 = and α1 = 1 (respectively, 2 = and α2 = 1 ). Hence, we obtain∂f(x¯) = cl(conv(a1x¯+ b1 −√2a1, a1x¯+ b2 +√2a1))= cl([a1x¯+ b1 −√2a1, a1x¯+ b2 +√2a1])= [a1x¯+ b1 −√2a1, a1x¯+ b2 +√2a1].Therefore,∂f(x¯) = a1x¯+ [b1 −√2a1, b2 +√2a1].Corollary 3.5 gives a closed form for -subdifferential for the maximum of two univariateconvex quadratic functions under the simplified assumption, which are quite restrictive interms of applicability. In Chapter 4, we will develop a different approach to computethe approximate subdifferential of a broader class of functions: 1-dimensional convex PLQfunctions. It is clear that in R1 the maximum of two quadratic functions is a PLQ function.We end this chapter with examples showing that, even in R1 there exists PLQ functionsthat are not the maximum of finitely many quadratic functions.Example 3.8. Consider for x ∈ Rf(x) ={2, x = 0,+∞, x 6= 0.Clearly f(x) is a (convex) piecewise linear-quadratic function with dom(f) 6= R. Since,quadratic functions are finite-valued on R, the given f(x) cannot be expressed as a maxi-mum of finite number of quadratics.Example 3.9. Consider for x ∈ Rf(x) =x2/2− x, x < −2,x2, −2 ≤ x ≤ 0,x2/2− x, 0 < x,= min{x22− x, x2}.Here, f(x) is a (non-convex) piecewise linear-quadratic function with dom(f) = R, beingthe minimum of two quadratic functions. As f(x) is non-convex, it cannot be representedas the maximum of convex functions. Figure 3.1 visualizes this fact.193.2. Approximate Subdifferential of Maximum of Two QuadraticsFigure 3.1: PLQ function not a maximum of finite quadratic functions.Let us now consider a (strictly convex) PLQ function.Example 3.10. For x ∈ R considerf(x) ={x2 − x, x ≤ 1,2x2 − 3x+ 1, 1 < x.Clearly dom(f) = R. Since max{x2−x, 2x2−3x+1} = 2x2−3x+1 and min{x2−x, 2x2−3x+ 1} = x2 − x, f(x) is not the maximum of finitely many quadratic functions, nor is itthe minimum of finite quadratics. Figure 3.2 visualizes this fact.Figure 3.2: PLQ function not a maximum nor a minimum of finite quadratic functions.20Chapter 4Algorithmic Computation of-subdifferentialIn this chapter, we shall present a general algorithm that enables us to compute theapproximate subdifferential for any proper convex function. While the algorithm would bedifficult (or impossible) to implement in a general setting, we shall present an implementa-tion specifically for 1-dimensional convex PLQ functions (Section 4.2). We then illustratethe implementation with some numerical examples (Section 4.3).4.1 The Outline of the AlgorithmIn this subsection, we present the general algorithm and prove it returns the correctset. Prior to presenting the algorithm we establish its validity.Theorem 4.1. Let f : Rn → R∪{+∞} be a proper convex function, x¯ ∈ dom(f) and > 0.Let l(s) = −f(x¯)+〈s, x¯〉. Let m(s) = min{f∗(s), l(s)} and X = {s ∈ Rn : m(s) = f∗(s)}.Then,∂f(x¯) = X. (4.1)Proof. By Proposition 3.2s˜ ∈ ∂f(x¯) ⇐⇒ f∗(s˜) + f(x¯)− 〈s˜, x¯〉 ≤ ⇐⇒ f∗(s˜) ≤ − f(x¯) + 〈s˜, x¯〉⇐⇒ f∗(s˜) ≤ l(s˜)⇐⇒ s˜ ∈ {s ∈ Rn : m(s) = f∗(s)}⇐⇒ s˜ ∈ X.(4.2)One may easily infer that ∂f(x¯) comprises all possible real solutions of the inequalityf∗(s) ≤ l(s). We now proceed to presenting the algorithm.214.1. The Outline of the AlgorithmAlgorithm 1: Approximate subdifferential of a proper convex functionInput: f (proper convex function), x¯ ∈ dom(f), > 0Output: X1: Compute f∗(s)2: Define l(s) = − f(x¯) + 〈s, x¯〉3: Define m(s) = min{f∗(s), l(s)}4: Output X = {s ∈ dom(f∗) : m(s) = f∗(s)}To shed some light upon the algorithm we consider the following examples.Example 4.2. Consider the function f(x) =‖x‖pp, x ∈ Rn where 1 < p < ∞. From[BL10, Table 3.1, Page 50] we have f∗(s) =‖s‖qq, s ∈ Rn where 1p+1q= 1. We also havel(s) = − f(x¯) + 〈s, x¯〉 = − ‖x¯‖pp+ 〈s, x¯〉. Thus, we have∂f(x¯) = {s ∈ Rn : f∗(s) ≤ l(s)}={s ∈ Rn : ‖s‖qq≤ − ‖x¯‖pp+ 〈s, x¯〉}. (4.3)In particular, for x¯ = 0, Equation (4.3) becomes∂f(0) ={s ∈ Rn : ‖s‖qq≤ }= {s ∈ Rn : ‖s‖q ≤ q}.Using1p+1q= 1 yields∂f(0) = {s ∈ Rn : ‖s‖q ≤ q}= {s ∈ Rn : ‖s‖ ≤ (q)1/q}={s ∈ Rn : ‖s‖ ≤(1− (1/p))1−(1/p)}={s ∈ Rn : ‖s‖ ≤(pp− 1)(p−1)/p}.For illustrative purposes, we consider the following particular cases of Example 4.2.Example 4.3. Consider f(x) =|x|55(x ∈ R) at x¯ = 0 and = 1.224.1. The Outline of the AlgorithmUsing Maple 17, evaluating{s ∈ R : |s| ≤(pp− 1)(p−1)/p}at p = 5 and = 1 weobtain ∂f(x¯) ≈ [−1.195, 1.195]. Figure 4.1 visualizes the -subdifferential and how thealgorithm computed it for this example.Figure 4.1: Primal View depicts the graph of f(x) = |x|55 (red curve) along with the blackdashed lines passing through the point (x¯, f(x¯) − ) = (0,−1) (red dot) having slopes−1.195 and 1.195 respectively (the lower and upper bounds of ∂f(x¯)). Dual View depictsthe graphs of f∗(s) = 45 |s|5/4 (red curve) and l(s) (solid black line). The green curve showswhen the graphs of m(s) and f∗(s) coincide.Remark 4.4. The Primal View/Dual View framework described in Figure 4.1 is repeatedin Figures 4.2, 4.3, and 4.5 - 4.13.Example 4.5. Consider f(x) =|x|55(x ∈ R) at x¯ = 0.3 and = 0.1.Using Maple 17, evaluating{s ∈ R : (p− 1)|s|p/(p−1)p≤ − |0.3|pp+ 〈s, 0.3〉}at p = 5and = 0.1 we obtain ∂f(x¯) ≈ [−0.128, 0.327]. Figure 4.2 visualizes the -subdifferentialand how the algorithm computed it for this example.234.1. The Outline of the AlgorithmFigure 4.2: Primal and Dual Views of ∂f(x¯) with f(x) =|x|55 , x¯ = 0.3 and = 0.1.Let us now consider a different function.Example 4.6. Considerf(x) ={ − ln(x), x > 0,+∞, x ≤ 0,at x¯ = 3 and = 1. Here, l(s) = 1 + ln(3) + 3s ≈ 3s+ 2.0986. From [BL10, Table 3.1, Page50], we have f∗(s) = −1− ln(−s), −∞ < s < 0. Having conducted the desired calculationsin Maple 17 we get ∂f(x¯) ≈ [−1.049,−0.053] as portrayed by Figure 4.3.244.2. An Implementation of Univariate PLQ FunctionsFigure 4.3: Primal and Dual Views of ∂f(x¯) with f(x) ={ − ln(x), x > 0,+∞, x ≤ 0, x¯ = 3 and = 1.4.2 An Implementation of Univariate PLQ FunctionsGiven the framework of Algorithm 1, a natural question to ask is whether there exists acollection of functions which allow for a general implementation. We consider a well-knownclass in Nonsmooth Analysis: PLQ functions. PLQ functions have shown wide applicability[ABP13], [RW86] [RJ00],[Roc88],[HP16],[DA90] and are computationally tractable in somesituations [GL10], [LBT09], [Tri07], [GL13], [GJL14].Our goal in this section is to compute and visualize ∂f(x¯) at an arbitrary point x¯ and > 0 for a PLQ function. We shall focus on 1-dimensional (proper) convex PLQ functions.Remark 4.7. Suppose f : R→ R ∪ {+∞} is a proper function. Then, f is a PLQ function254.2. An Implementation of Univariate PLQ Functionsif it can be represented in the formf(x) =q0(x) = a0x2 + b0x+ c0, if −∞ < x < x0,q1(x) = a1x2 + b1x+ c1, if x0 ≤ x ≤ x1,q2(x) = a2x2 + b2x+ c2, if x1 ≤ x ≤ x2,......qN−1(x) = aN−1x2 + bN−1x+ cN−1, if xN−2 ≤ x ≤ xN−1,qN (x) = aNx2 + bNx+ cN , if xN−1 < x < +∞,(4.4)where, ai ∈ R for i = {0, 1, · · · , N}, bi ∈ R for i = {0, 1, · · · , N}, ci ∈ R for i = {1, · · · , N−1} and ci ∈ R ∪ {+∞} for i = {0, N}.Note that Definition 2.19 and Remark 4.7 implies f is continuous on its domain. Fur-thermore, Equation (4.4) conforms with the definition stated in [RW09, Definition 10.20,Page 440] as dom(f) is a finite union of polyhedral sets namely, intervals relative to whichf(x) takes either a linear or a quadratic form.Notice, convexity of f requiresqi(xi) = qi+1(xi),q′i(xi) ≤ q′i+1(xi),ai, ai+1 ≥ 0,(4.5)for all i = {0, 1, · · · , N − 1} [Tri07, Theorem 3.34, Page 47].An interesting property of PLQ functions is that they are closed under many convexoperations: Fenchel conjugation, addition, scalar multiplication, and taking the Moreauenvelope [LBT09, Proposition 5.1, Page 107]. Note, for the univariate case, the minimumof two PLQ functions may be discontinuous, as shown by Example 4.8. However, such afunction can still be stored in the desired PLQ data structure discussed in Section 4.2.Example 4.8. Consider for x ∈ Rf1(x) ={0, x ∈ [−3, 1],+∞, x /∈ [−3, 1],andf2(x) = x.264.2. An Implementation of Univariate PLQ FunctionsThen,f(x) = min{f1(x), f2(x)} =x, x < 0,0, 0 ≤ x ≤ 1,x, 1 < x,is a discontinuous piecewise linear-quadratic function as shown by Figure 4.4.Figure 4.4: A discontinuous piecewise linear-quadratic function.Remark 4.9. Even though the minimum of two PLQ functions is not necessarily a PLQfunction, we can still compute the approximate subdifferential from the PLQ data structureshown next.Coming to the application aspect of Algorithm 1, for the given case, we shall be im-plementing it using the Computational Convex Analysis (CCA) toolbox which is openlyavailable for download at [Luc]. It is coded using Scilab, a numerical software freely avail-able [Sci15]. The toolbox encompasses many efficient algorithms to compute fundamentalconvex transforms of univariate PLQ functions, as introduced in [LBT09]. Table 4.1 out-lines the operations important to this work.274.2. An Implementation of Univariate PLQ FunctionsTable 4.1: Functions in the CCA Toolbox relevant to Algorithm 1Function Descriptionplq check(plqf) Checks integrity of a PLQ functionplq isConvex(plqf) Checks convexity of a PLQ functionplq lft(plqf) Fenchel conjugate of a PLQ functionplq min(plqf1,plqf2) Minimum of two PLQ functionsplq isEqual(plqf1,plqf2) Checks equality of two PLQ functionsplq eval(plqf ,X) Evaluates a PLQ function on the grid XData StructureAs the functions listed in Table 4.2 require the input to be a PLQ function, we mustnext shed some light on the data structure used in the CCA library. The CCA toolboxstores a PLQ function as an (N + 1)×4 matrix [LBT09], where each row represents onepiece.For example, the PLQ function f : R→ R ∪ {+∞} defined byf(x) =q0(x) = a0x2 + b0x+ c0, if −∞ < x < x0,q1(x) = a1x2 + b1x+ c1, if x0 ≤ x ≤ x1,......qN−1(x) = aN−1x2 + bN−1x+ cN−1, if xN−2 ≤ x ≤ xN−1,qN (x) = aNx2 + bNx+ cN , if xN−1 < x < +∞,(4.6)is stored asplqf =x0 a0 b0 c0x1 a1 b1 c1............xN−1 aN−1 bN−1 cN−1+∞ aN bN cN . (4.7)Note that, if c0 = +∞ or cN = +∞, then the structure demands that a0 = b0 = 0 oraN = bN = 0 respectively. If f(x) is a simple quadratic function (i.e., a PLQ functionwith 1 piece), then N = 0 and x0 = +∞. Finally, the special case of f(x) being a shiftedindicator function at a single point x˜ ∈ R,f(x) = ι{x˜} + c ={c, x = x˜,+∞, x 6= x˜,284.2. An Implementation of Univariate PLQ Functionswhere c ∈ R, is stored as a single row vector plqf = [x˜ 0 0 c].Remark 4.10. Note that, from now on, we shall designate f and plqf for the mathematicalfunction and the corresponding PLQ matrix representation.PLQ AlgorithmFollowing the PLQ data structure we rewrite Algorithm 1 for the specific class of 1-dimensional convex PLQ functions. In this subsection, we present the desired algorithmand prove it returns the correct set. Prior to presenting the algorithm we establish itsvalidity.Theorem 4.11. Let f : R → R ∪ {+∞} be a 1-dimensional convex PLQ function, x¯ ∈dom(f) and > 0. Note ∂f(x¯) = [vl, vu]. Then the following situations hold.1. If plqf∗ =[s0 a˜0 b˜0 c˜0]and s0 ∈ R then f∗(s) = ι{s0}(s) + c˜0 and a˜0 = b˜0 = 0;so dom(f∗) = {s0} andvl = vu = s0.In this case, f must be a linear function, i.e., f(x) = 〈s0, x〉 − c˜0 for c˜0 ∈ R.2. Otherwise, letplql =[+∞ 0 x¯ − f(x¯)]andplqm =sˆ0 aˆ0 bˆ0 cˆ0sˆ1 aˆ1 bˆ1 cˆ1............sˆk−1 aˆk−1 bˆk−1 cˆk−1sˆk aˆk bˆk cˆkbe the respective PLQ representations ofl(s) = − f(x¯) + 〈s, x¯〉andm(s) = min{f∗(s), l(s)}.Then the following situations hold.(a) If k = 0, then sˆ0 = +∞ andvl = −∞, vu = +∞.In this case, f must be the indicator function of x¯ plus a constant, i.e., f(x) =ι{x¯} − cˆ0 for cˆ0 ∈ R.294.2. An Implementation of Univariate PLQ Functions(b) If k ≥ 1, then sˆ0 ∈ R, sˆk−1 ∈ R,vl ={sˆ0, if[aˆ0 bˆ0 cˆ0]=[0 x¯ − f(x¯)] ,−∞, otherwise,andvu ={sˆk−1, if[aˆk bˆk cˆk]=[0 x¯ − f(x¯)] ,+∞, otherwise.In order, to prove Theorem 4.11, we require the following lemmas.Lemma 4.12. Let f : R→ R ∪ {+∞} be a proper convex PLQ function, x¯ ∈ dom(f) and > 0. Let l(s) = − f(x¯) + 〈s, x¯〉. Let m(s) = min{f∗(s), l(s)}. Then, there exists s ∈ Rsuch that m(s) < l(s).Proof. Note that x¯ ∈ dom(f) so, from Corollary 3.3 ∂/2f(x¯) 6= ∅.Suppose for eventual contradictionm(s) = l(s) for all s.That is,l(s) ≤ f∗(s) for all s.By definition of l(s), this implies− f(x¯) + 〈s, x¯〉 ≤ f∗(s) for all s,2− f(x¯) + 〈s, x¯〉 < f∗(s) for all s.This implies, the inequalityf∗(s) ≤ 2− f(x¯) + 〈s, x¯〉has no solution, which would imply ∂/2f(x¯) = ∅ [HUMSV95, Proposition 1.1 applied to/2]. This contradicts ∂/2f(x¯) 6= ∅.Lemma 4.13. Let f : R→ R ∪ {+∞} be a proper convex PLQ function, x¯ ∈ dom(f) and > 0. Let l(s) = − f(x¯) + 〈s, x¯〉. Let m(s) = min{f∗(s), l(s)}. Then,m(s) = f∗(s) for all s ⇐⇒ f∗(s) = 〈a, s〉+ b (4.8)where a = x¯ and b ≤ − f(x¯). In this case, f must be an indicator function, i.e.,f(x) = ι{x¯} − b for b ∈ R, and therefore ∂f(x¯) = R.Proof.304.2. An Implementation of Univariate PLQ FunctionsPart 1. (⇒): Suppose m(s) = f∗(s) for all s. Thenf∗(s) ≤ l(s) for all s ∈ R. (4.9)Define g(s) = f∗(s) − l(s). Then g is convex with dom(g) = R. From Equation (4.9) wehaveg(s) ≤ 0 for all s ∈ R.Using [Roc15, Corollary 8.6.2, Page 69] we get, g(s) is a constant function. Therefore,f∗(s) = l(s) +K for all s (4.10)where K ∈ R. From Equation (4.10) and the definition of l(s) we obtainf∗(s) = 〈x¯, s〉+ b, b ∈ R. (4.11)To see b ≤ − f(x¯), note that from Equation (4.9) we havef∗(s) ≤ l(s) for all s ∈ R.In particular, for s = 0 we obtain,f∗(0) ≤ l(0).From Equation (4.11) and the definition of l(s) we getb ≤ − f(x¯).Part 2. (⇐): Suppose f∗(s) = 〈a, s〉+ b with a = x¯ and b ≤ − f(x¯). Then,f∗(s) = 〈x¯, s〉+ b ≤ 〈x¯, s〉+ − f(x¯) = l(s) for all s ∈ R.Therefore,m(s) = f∗(s) for all s ∈ R.Part 3. (f(x) = ι{x¯} − b): Supposef∗(s) = 〈x¯, s〉+ bwith b ∈ R satisfying b ≤ − f(x¯). Since f is a proper convex PLQ function, we havethat f = f∗∗ [RW09, Theorem 11.1, Page 474]. Hence, f is an indicator function i.e.,f(x) = ι{x¯} − b [LBT09, Page 110], [HUL12, Proposition 1.3.1, Page 216]. Consequentlyfrom Theorem 4.1 we get∂f(x¯) = R.314.2. An Implementation of Univariate PLQ FunctionsRemark 4.14. Let f : R → R ∪ {+∞} be a proper convex PLQ function, x¯ ∈ dom(f)and > 0. Let l(s) = − f(x¯) + 〈s, x¯〉. Let plqm be the representation of m(s) =min{f∗(s), l(s)}. Supposeplqm =s0 a0 b0 c0s1 a1 b1 c1............sk−1 ak−1 bk−1 ck−1sk ak bk ck with k ≥ 1,where s0 < s1, . . . , sk−1 < sk = +∞ . Then, for any i ∈ {0, 1, . . . , k − 1}[ai bi ci] 6= [ai+1 bi+1 ci+1] .To understand Remark 4.14, suppose for some j ∈ {0, 1, . . . , k − 1} we have[aj bj cj]=[aj+1 bj+1 cj+1].Then, the j-th row would not be constructed by the plq min function so the result holds.Now, we turn to the formal proof of Theorem 4.11.Proof. (Theorem 4.11) Note that x¯ ∈ dom(f) and > 0 so, ∂f(x¯) 6= ∅.Case 1. Suppose plqf∗ =[s0 a˜0 b˜0 c˜0]with s0 ∈ R.By definition, f∗ is an indicator function i.e., f∗(s) = ι{s0} + c˜0. It follows that a˜0 =b˜0 = 0 and dom(f∗) = {s0}. Since the conjugate of an indicator function is the supportfunction [RW09, Example 11.4, Page 477] and [HUL12, Proposition 1.3.1, Page 216] weobtainf∗∗(x) = σ{s0} − c˜0.Since f is a proper convex PLQ function, we have that f = f∗∗ [RW09, Theorem 11.1,Page 474]. Since the support function of a singleton is a linear function [RW09, Page 318]we getf(x) = 〈s0, x〉 − c˜0,which, by Lemma 3.7, gives∂f(x¯) = {s0}.So, vl = vu = {s0}.Case 2. Supposeplqm =sˆ0 aˆ0 bˆ0 cˆ0sˆ1 aˆ1 bˆ1 cˆ1............sˆk−1 aˆk−1 bˆk−1 cˆk−1sˆk aˆk bˆk cˆk324.2. An Implementation of Univariate PLQ Functionswith k = 0. In this case,plqm =[sˆ0 aˆ0 bˆ0 cˆ0]then sˆ0 = +∞. Indeed, if sˆ0 ∈ R, thenplqm =[sˆ0 0 0 cˆ0](= ι{sˆ0} + cˆ0),which cannot happen as m(s) = min{f∗(s), l(s)} ≤ l(s) < +∞.To see vl = −∞, vu = +∞, note that from Lemma 4.12 there exists s¯ ∈ R such thatm(s¯) < l(s¯). This impliesm(s) = f∗(s) for all s,as otherwisem(s) ={f∗(s), s ∈ Xf∗ ,l(s), s ∈ Xl,for some Xf∗ , Xl ⊆ R with Xl 6= ∅. But then, m(s) would have at least two piecescontradicting k = 0. So m(s) = f∗(s) for all s, which by Lemma 4.13 gives∂f(x¯) = R.Therefore, vl = −∞ and vu +∞. Consequently from Lemma 4.13 f must be the indicatorfunction at x¯ plus a constant, i.e., f(x) = ι{x¯} − cˆ0.Case 3. Supposeplqm =sˆ0 aˆ0 bˆ0 cˆ0sˆ1 aˆ1 bˆ1 cˆ1............sˆk−1 aˆk−1 bˆk−1 cˆk−1sˆk aˆk bˆk cˆkwith k ≥ 1. By definition, sˆ0, sˆk−1 ∈ R. We consider four subcases:Subcase 3.1. Suppose[aˆ0 bˆ0 cˆ0]=[0 x¯ − f(x¯)].By definition, m(s) = l(s) for all s < sˆ0. From Remark 4.14 we havem(s) = f∗(s) for all sˆ0 ≤ s ≤ sˆ1.Therefore, from Algorithm 1 we haveinf{v ∈ dom(f∗) : m(v) = f∗(v)} = sˆ0.So, vl = sˆ0.334.2. An Implementation of Univariate PLQ FunctionsSubcase 3.2. Suppose[aˆ0 bˆ0 cˆ0] 6= [0 x¯ − f(x¯)].By definition, we havem(s) = f∗(s) for all s < sˆ0.Therefore,inf{v ∈ dom(f∗) : m(v) = f∗(v)} = −∞.So, vl = −∞.Subcase 3.3. Suppose[aˆk bˆk cˆk]=[0 x¯ − f(x¯)].By definition, m(s) = l(s) for all s > sˆk−1. From Remark 4.14 we havem(s) = f∗(s) for all sˆk−2 ≤ s ≤ sˆk−1.Therefore,sup{v ∈ dom(f∗) : m(v) = f∗(v)} = sˆk−1.So, vu = sˆk−1.Subcase 3.4. Suppose[aˆk bˆk cˆk] 6= [0 x¯ − f(x¯)].Following Subcase 3.2 we getm(s) = f∗(s) for all s > sˆk−1and vu = +∞.Remark 4.15. In the actual implementation, the Subcase 2(a) of Theorem 4.11 is coded bydetecting if f is an indicator function. That is, if plqf =[x0 0 0 c0], where x0 ∈ R,then vl = −∞ and vu = +∞ (Lemma 4.13) without computing plq min.We now present the desired algorithm for the specific class of 1-dimensional convexPLQ functions.344.2. An Implementation of Univariate PLQ FunctionsAlgorithm 2: Approximate subdifferential of 1-dimensional convex PLQ functionInput: plqf =x0 a0 b0 c0x1 a1 b1 c1............xN−1 aN−1 bN−1 cN−1+∞ aN bN cN, x¯, > 0Output: vˆl, vˆu1: Compute plq check(plqf)if true, continue; else stop return ‘the input function is not PLQ.’2: Compute plq isConvex(plqf)if true, continue; else stop return ‘the input function is not convex.’3: Compute plq eval(plqf ,x¯)if +∞, stop return ‘x¯ is not in the domain of the function.’; else continue.4: if plqf=[x0 0 0 c0]then stop return vl = −∞, vu = +∞; else continue.5: Compute plqf∗ = plq lft(plqf):plqf∗ =s0 a˜0 b˜0 c˜0s1 a˜1 b˜1 c˜1............sN˜−1 a˜N˜−1 b˜N˜−1 c˜N˜−1+∞ a˜N˜ b˜N˜ c˜N˜ .if plqf∗ =[s0 a˜0 b˜0 c˜0]and s0 ∈ R stop return vˆl = s0, vˆu = s0; else continue.6: Defineplql =[+∞ 0 x¯ (− plq eval(plqf, x¯))] .7: Compute plqm = plq min(plqf∗,plql):plqm =sˆ0 aˆ0 bˆ0 cˆ0sˆ1 aˆ1 bˆ1 cˆ1............sˆk−1 aˆk−1 bˆk−1 cˆk−1+∞ aˆk bˆk cˆk ,where sˆ0 < sˆ1, . . . , < sˆk−1 < sˆk = +∞.8: Compute vˆl, vˆu:If k = 0,vˆl = −∞, vˆu = +∞If k ≥ 1,vˆl ={sˆ0, if[aˆ0 bˆ0 cˆ0]=[0 x¯ (− plq eval(plqf, x¯))] ,−∞, otherwise,andvˆu ={sˆk−1, if[aˆk bˆk cˆk]=[0 x¯ (− plq eval(plqf, x¯))] ,+∞, otherwise.354.2. An Implementation of Univariate PLQ FunctionsComplexity of Algorithm 2Having established the validity of Algorithm 2, we next look at its complexity. In computationalcomplexity theory, algorithm analysis plays an important part, it provides a theoretical estimate forthe resources (such as time and space) needed to execute them. These estimates help in providinguseful insight into plausible directions towards efficient algorithms. Algorithm complexity quantifiesthe amount of time and/or space required by an algorithm for an input of given size. Usually, it isstated as a function relating the given input size to the number of times the principal activity ofthe algorithm has been performed (time complexity) and/or memory allocation (space complexity).An algorithm may take different amounts of time on the same input depending upon factors suchas processor speed, efficiency of compiler, size of the input, to name a few. Thus, it is quitecommon to estimate the complexity asymptotically, that is, to measure the complexity functionfor an arbitrary large input, represented by the “big-O” notation (O). Formally, we define forf : N→ N and g : N→ Nf(n˜) = O(g(n˜)) for n˜→ +∞if and only if there exists c∗ > 0 and n∗ > 0 such thatf(n˜) ≤ c∗g(n˜) for all n˜ ≥ n∗.Intuitively, we may say that the function f(n˜) does not grow faster than g(n˜). Table 4.2 outlinesa few commonly used notations for describing the complexity of an algorithm for a given input ofsize N˜ .Table 4.2: Notations in Algorithm ComplexityNotation NameO(1) ConstantO(N˜) LinearO(N˜2) QuadraticO(log(N˜)) LogarithmicNow that we have the required understanding about the complexity of an algorithm, we computethe complexity of Algorithm 2. In the given case, the size of the input is the total number of linearand/or quadratic pieces of the input PLQ function.Proposition 4.16. Algorithm 2 runs in linear time and space.Prior to proving Proposition 4.16, we require the following lemma.Lemma 4.17. If plqf has (N+1) pieces then plqf∗ has O(N) pieces.Proof. Since plq lft algorithm is developed to independently operate on (N + 1) pieces [LBT09,Table 2, Page 114] and has complexity of O(N) [GL13, Table 2, Page 182], therefore the size of theoutput plqf∗ cannot exceed O(N). As otherwise, the complexity of plq lft would be different.Now, we turn to the proof of Proposition 4.16.364.3. Numerical IllustrationsProof. Table 4.3 summarizes the complexity of the independent subroutines in Algorithm 2 asstated in [LBT09, Table 2, Page 114], [GL13, Table 2, Page 182] and the function description inScilab.Table 4.3: Core subroutines in Algorithm 2 and their complexityFunction Complexity Variable Descriptionplq check(plqf) O(N + 1) N + 1 = number of pieces in plqfplq isConvex(plqf) O(N + 1)plq lft(plqf) O(N + 1)plq min(plqf1,plqf2) O(N1 +N2) N1, N2 = number of pieces in plqf1, plqf2plq eval(plqf ,X) O((N + 1) + k˜) k˜ = number of points plqf is evaluated atNow, considering Algorithm 2 having input as a PLQ functionplqf =x0 a0 b0 c0x1 a1 b1 c1............xN−1 aN−1 bN−1 cN−1+∞ aN bN cNwe have the followingplq check(plqf) = O(N + 1) = O(N),plq isConvex(plqf) = O(N + 1) = O(N),plqf∗ = plq lft(plqf) = O(N + 1) = O(N),plq eval(plqf, x¯) = O(N + 1) = O(N),plq min(plqf∗,plql) = O(N˜ + 1).Notice, from Lemma 4.17 we haveplq min(plqf∗,plql) = O(N˜ + 1) = O(O(N) + 1) = O(N).Therefore, adding all the complexities of the subroutines we conclude Algorithm 2 has lineartime/space complexity.4.3 Numerical IllustrationsWe now present several examples for which we have computed the approximate subdifferentialset using Algorithm 2. The algorithm has been implemented in Scilab [Sci15]. In all of theseexamples f : R→ R ∪ {+∞}.Computing ∂f(x¯) for fixed x¯ and fixed > 0In this section, we compute and visualize ∂f(x¯) at an arbitrary point x¯ ∈ dom(f) and > 0for a 1-dimensional proper convex PLQ function f .374.3. Numerical IllustrationsExample 4.18. Consider the absolute value function, f(x) = |x| at x¯ = 0 and = 1. In PLQformat f is stored asplqf =[0 0 −1 0+∞ 0 1 0](={ −x, −∞ < x < 0x, 0 ≤ x < +∞).Using plq lft we obtainplqf∗(s) = −1 0 0 +∞1 0 0 0+∞ 0 0 +∞( = { 0, s ∈ [−1, 1]+∞, s /∈ [−1, 1]).Here l(s) = 1−f(0)+〈0, s〉 = 1, in PLQ format we have plql = [+∞ 0 0 1]. Next, we computeplq min(plqf∗,plql), we obtainplq min(plqf∗,plql) = −1 0 0 11 0 0 0+∞ 0 0 1( = { f∗(s), s ∈ [−1, 1]l(s), s /∈ [−1, 1]).Hence, we obtain ∂f(x¯) = [−1, 1] as visualized in Figure 4.5.Figure 4.5: Primal View depicts the graph of f(x) = |x| (red curve) along with the blackdashed lines passing through the point (x¯, f(x¯) − ) = (0,−1) (red dot) having slopes −1and 1 respectively (the lower and upper bounds of ∂f(x¯)). Dual View depicts the graphsof f∗(s) = ι{[−1,1]} + 0 (red curve) and l(s) (solid black line). The green curve showswhen the graphs of m(s) and f∗(s) coincide with f∗(s) = l(s) at s = −1, 1 (black dots)respectively.In 1-dimension, from Figure 4.5, we may geometrically interpret that the approximate subdif-ferential set consists of all possible slopes belonging to the interval [−1, 1], resulting in all possiblelines with the respective slopes passing through the point (x¯, f(x¯)− ) = (0,−1).In Examples 4.19 - 4.25 we provide fewer details, but the process is unchanged.Example 4.19. Consider f(x) =x24+ |x|, x¯ = 0 and = 1. This is stored asplqf∗ = −1 1 2 11 0 0 0+∞ 1 −2 1 .384.3. Numerical IllustrationsIn this case,plql =[+∞ 0 0 1] (= 1),andplq min(plqf∗,plql) =−2 0 0 1−1 1 2 11 0 0 02 1 −2 1+∞ 0 0 1(={f∗(s), s ∈ [−2, 2]l(s), s /∈ [−2, 2]).Thus, as visualized in Figure 4.6, ∂f(x¯) = [−2, 2].Figure 4.6: Primal and Dual Views of ∂f(x¯) with f(x) =x24+ |x|, x¯ = 0 and = 1.The next example examines the case of f(x) being composed of both linear and quadratic pieces.Example 4.20. Considerf(x) = x2, x < −2x/2 + 5, −2 ≤ x ≤ 2.5x2, 2.5 < x(= max{x2,x2+ 5}),x¯ = 0 and = 1. Here, l(s) = (−4). On computing we obtain,min(f∗(s), l(s)) ={f∗(s), s ∈ [0, 0.9],l(s), s /∈ [0, 0.9],so ∂f(x¯) = [0, 0.9] as visualized in Figure 4.7.394.3. Numerical IllustrationsFigure 4.7: Primal and Dual Views of ∂f(x¯) with f(x) =x2, x < −2,x/2 + 5, −2 ≤ x ≤ 2.5,x2, 2.5 < x,x¯ = 0 and = 1.Example 4.21. Letf(x) ={x2/2, x < 0,0, 0 ≤ x,x¯ = 0 and = 1. For the given case, as visualized in by Figure 4.8, ∂f(x¯) ≈ [−1.414, 0].Figure 4.8: Primal and Dual Views of ∂f(x¯) with f(x) ={x2/2, x < 0,0, 0 ≤ x, x¯ = 0 and = 1.The following example helps us envision the bounded-domain case.Example 4.22. Letf(x) = +∞, x < −2,−x, −2 ≤ x ≤ 2,+∞, 2 < x,x¯ = 0 and = 1. In this case, as visualized in by Figure 4.9, ∂f(x¯) = [−1.5,−0.5].404.3. Numerical IllustrationsFigure 4.9: Primal and Dual Views of ∂f(x¯) with f(x) =+∞, x < −2,−x, −2 ≤ x ≤ 2,+∞, 2 < x,x¯ = 0and = 1.For the case when x¯ = −2 and = 1 we obtain ∂f(x¯) = (−∞,−0.75] as portrayed by Figure4.10.Figure 4.10: Primal and Dual Views of ∂f(x¯) with f(x) =+∞, x < −2,−x, −2 ≤ x ≤ 2,+∞, 2 < x,x¯ = −2and = 1. The vertical line represents a slope of −∞.Example 4.23. For the (half) bounded-domain case consider,f(x) = 0, x < −2,x+ 2, −2 ≤ x ≤ 1,+∞, 1 < x.Let x¯ = 0 and = 1 we obtain ∂f(x¯) = [0.5, 2] as visualized in Figure 4.11.414.3. Numerical IllustrationsFigure 4.11: Primal and Dual Views of ∂f(x¯) with f(x) =0, x < −2,x+ 2, −2 ≤ x ≤ 1,+∞, 1 < x,x¯ = 0and = 1.Next we shall look into a few special cases.Example 4.24. (The indicator function at a single point) For x¯ = 0 and = 1 considerf(x) = ι{x¯} + 0 ={0, x = 0,+∞, x 6= 0.Here,plqf∗ =[+∞ 0 0 0] (= 0).Thus, we have ∂f(x¯) = (−∞,+∞) = R as visualized in Figure 4.12.Figure 4.12: Primal and Dual Views of ∂f(x¯) with f(x) ={0, x¯ = 0,+∞, x¯ 6= 0, x¯ = 0 (smallerred dot) and = 1.Example 4.25. Consider the linear function f(x) = 2x, x¯ = 0 and = 1.In this case the Fenchel Conjugate is the indicator function f∗(s) = ι{2} [LBT09, Page 110],hence424.3. Numerical Illustrationsplqf∗ =[2 0 0 0](= ι{2}).In the given case, ∂f(x¯) = {2}, which is in agreement with Lemma 3.7. Figure 4.13 visualizes thiscase.Figure 4.13: Primal and Dual Views of ∂f(x¯) with f(x) = 2x, x¯ = 0 and = 1.Computing ∂f(x¯) for varying x¯ and fixed > 0Now that we have a good understanding about ∂f(x¯). Let us have a look into the effect ofvarying x¯ on ∂f(x¯).Our goal in this section is to visualize the graph of ∂f(x) for a fixed > 0 and different choicesof x. In other words, for a given > 0 to envision ∂f(x) as a function of x.Example 4.26. We consider f(x) = |x|, = 1. As seen in Example 4.18, for x¯ = 0, ∂f(x¯) =[−1, 1]. Correspondingly, for the choices of x¯ = −2 and 0.75 we obtain ∂f(x¯) = [−1,−0.5] and∂f(x¯) ≈ [−0.333, 1], respectively, as visualized in Figures 4.14 and 4.15.434.3. Numerical IllustrationsFigure 4.14: Primal, Dual and Subgradient Views of ∂f(x¯) with f(x) = |x|, x¯ = −2 and = 1. The explanation of Primal and Dual Views are the same as in Figure 4.1. TheSubgradient View depicts the lower and upper bounds of ∂f(x) for x ∈ [−5, 5] (red curve).The green curve shows ∂f(−2).Note that, a similar explanation follows for the subsequent Figures 4.15, 4.16, 4.17 and 4.18.In Figure 4.14, the graph of ∂f(x) (red curve) as a function of x ∈ [−5, 5] with = 1 is sketchedby iteratively computing the respective lower and the upper bounds of ∂f(x) for 100 equally spacedpoints in the interval [−5, 5]. The green curve is drawn by joining the points (x¯, vl) = (−2,−1) and(x¯, vu) = (−2,−0.5) .A similar explanation follows for the subsequent Examples 4.27 and 4.28, unless otherwisestated.444.3. Numerical IllustrationsFigure 4.15: Primal, Dual and Subgradient Views of ∂f(x¯) with f(x) = |x|, x¯ = 0.75 and = 1.An animated visualization of ∂f(x¯) for the example is presented in the following Figure 4.16,that takes into account = 1 and the choices of x¯ as 50 equally spaced points between [−4, 4].454.3. Numerical IllustrationsFigure 4.16: Animated version of Primal, Dual and Subgradient Views of ∂f(x¯) withf(x) = |x|, x¯ ∈ [−4, 4] and = 1.Note, given > 0 the graph of ∂f(x) for x ∈ R in Figure 4.16 complies with its mathematicalequivalent [HUL93, Page 93]∂f(x) =[−1,−1− x], x < −/2,[−1, 1] , −/2 ≤ x ≤ /2,[1− x, 1], x > /2.We present a few additional examples to get some more intuition and visualization for ∂f(x)as a function of x, for a given > 0.Example 4.27. Consider = 1 and letf(x) =+∞, x < −6,−2x, −6 ≤ x ≤ 0,x2 − 2x, 0 < x ≤ 2,2x− 4, 2 < x ≤ 3,13x2 − 1, 3 < x.An animated visualization of ∂f(x¯) for the example is presented in the following Figure 4.17,that takes into account = 1 and the choices of x¯ as 50 equally spaced points between [−5, 4.5].464.3. Numerical IllustrationsFigure 4.17: Animated version of Primal, Dual and Subgradient Views of ∂f(x¯) withf(x) =+∞, x < −6,−2x, −6 ≤ x ≤ 0,x2 − 2x, 0 < x ≤ 2,2x− 4, 2 < x ≤ 3,13x2 − 1, 3 < x,x¯ ∈ [−5, 4.5] and = 1.Example 4.28. Consider = 1 and letf(x) = +∞, x < −2,−x, −2 ≤ x ≤ 2,+∞, 2 < x.An animated visualization of ∂f(x¯) for the example is presented in the following Figure 4.18, thattakes into account = 1 and the choices of x¯ as 50 equally spaced points between [−1.7, 1.8].474.3. Numerical IllustrationsFigure 4.18: Animated version of Primal, Dual and Subgradient Views of ∂f(x¯) withf(x) =+∞, x < −2,−x, −2 ≤ x ≤ 2,+∞, 2 < x,x¯ ∈ [−1.7, 1.8] and = 1.Computing ∂f(x¯) for fixed x¯ and varying > 0Since ∂f(x¯) also depends on > 0, we shall look into the effect of varying > 0 on ∂f(x¯) fora fixed x¯. In other words, for a fixed x¯ to envision ∂f(x¯) as a function of > 0.Example 4.29. Consider x¯ = 0 and letf(x) = 0, x < −2,x+ 2, −2 ≤ x ≤ 1,+∞, 1 < x.An animated visualization of ∂f(x¯) for the example is presented in the following Figure 4.19,that takes into account x¯ = 0 and the choices of > 0 as 50 equally spaced points between [0.1, 3].484.3. Numerical IllustrationsFigure 4.19: Animated version of Primal, Dual and Subgradient Views of ∂f(x¯) withf(x) =0, x < −2,x+ 2, −2 ≤ x ≤ 1,+∞, 1 < x,x¯ = 0 and ∈ [0.1, 3]. The explanation of Primal andDual Views are same as in Figure 4.1. The Subgradient View depicts the lower and upperbounds of ∂f(x¯) for ∈ [0.01, 3.2] (red curve). The green curve shows ∂f(0) for ∈ [0.1, 3].Remark 4.30. We observe that as → 0 then ∂f(x¯) → ∂f(x¯). Geometrically, as > 0 increasesthe graph of ∂f(x¯) dilates.49Chapter 5Conclusion and Future WorkThis chapter summarizes the work we have done by highlighting the research conducted onthe topic: Approximate Subdifferential of Piecewise-Linear Quadratic Functions particularly, 1-dimensional convex PLQ functions.As detailed in Chapter 4, one of the most rewarding outcomes of the research was to be ableto present a general algorithm that computes approximate subdifferential of any proper convexfunction. For the case of 1-dimensional convex PLQ functions, we were able to implement thealgorithm and visualize the desired outcome, serving to cogently address the concept. This mightseem a bit restrictive and leads us to a natural desire to extend the present algorithm to higherdimensions. However, such an attempt comes with a few challenges. In dimensions greater than1, the obvious question of how to visualize the results arise. Moreover, despite being able tocompute f∗(s) for bivariate convex PLQ functions [GL13], the minimum of two PLQ functionsis not necessarily representable in the PLQ data structure provided in the CCA library. As anexample in R2, consider the function g(s) = min(f∗(s), l(s)) with f∗(s) =12‖s‖2 − 1 and l(s) = 0.As visualized in Figure 5.1, dom(g) cannot be represented as a finite union of polyhedral sets.Hence, is not PLQ representable.210x-1-2-2-1y01-0.52.521.5130.502zFigure 5.1: Minimum of two quadratics may not always be PLQ representable.Another direction lies in developing an approach that covers a different class of functions makingit more widely applicable.In Section 4.3 we had an insight on the graph of ∂f(x) as a function of x drawn by computingapproximate subdifferential for the respective points and plotting it. This might seem computation-ally expensive in terms of memory usage and execution time. Thus, another element that may bestudied in future work could be to be able to identify points where the behaviour of ∂f(x) changes50Chapter 5. Conclusion and Future Workand obtain the graph by connecting the respective points. Similar research could be explored forthe case expressing ∂f(x) as a function of > 0. Moreover, in 1-dimension by handling the specialcases separately we may compute min(f∗(s),l(s)) in constant time by focusing on the end pieceswhich seems a logical advancement.51Bibliography[ABP13] A.Y. Aravkin, J.V. Burke, and G. Pillonetto. Sparse/robust estimation and kalmansmoothing with nonsmooth log-concave densities: Modeling, computation, and theory.The Journal of Machine Learning Research, 14(1):2689–2728, 2013. → pages 25[BGLS06] J.F. Bonnans, J.C. Gilbert, C. Lemare´chal, and C.A. Sagastiza´bal. Numerical Op-timization: Theoretical and Practical Aspects. Springer Science & Business Media,2006. → pages 6[BL10] J.M. Borwein and A.S. Lewis. Convex Analysis and Nonlinear Optimization: Theoryand Examples. Springer Science & Business Media, 2010. → pages 22, 24[CM09] S.L. Campbell and C.D. Meyer. Generalized inverses of linear transformations, vol-ume 56. SIAM, 2009. → pages 9, 11[DA90] R. Dembo and R. Anderson. An efficient linesearch for convex piecewise-linear/quadratic functions. In Advances in Numerical Partial Differential Equationsand Optimization: Proceedings of the Fifth Mexico-United States Workshop, vol-ume 47, pages 1–8. SIAM, 1990. → pages 25[GJL14] B. Gardiner, K. Jakee, and Y. Lucet. Computing the partial conjugate of convexpiecewise linear-quadratic bivariate functions. Computational Optimization and Ap-plications, 58(1):249–272, 2014. → pages 25[GL10] B. Gardiner and Y. Lucet. Convex hull algorithms for piecewise linear-quadraticfunctions in computational convex analysis. Set-Valued and Variational Analysis,18(3-4):467–482, 2010. → pages 25[GL13] B. Gardiner and Y. Lucet. Computing the conjugate of convex piecewise linear-quadratic bivariate functions. Mathematical Programming, 139(1-2):161–184, 2013.→ pages 25, 36, 37, 50[HP16] W.L. Hare and C. Planiden. Thresholds of prox-boundedness of PLQ functions. Jour-nal of Convex Analysis, 23(3):1–28, 2016. → pages 25[HUL93] J.B. Hiriart-Urruty and C. Lemare´chal. Convex Analysis and Minimization AlgorithmsII: Advanced Theory and Bundle Methods, vol. 306 of Grundlehren der mathematis-chen Wissenschaften. Springer-Verlag, New York, 1993. → pages 10, 11, 12, 13, 16,46[HUL12] J.B. Hiriart-Urruty and C. Lemare´chal. Fundamentals of Convex Analysis. SpringerScience & Business Media, 2012. → pages 31, 3252Chapter 5. Bibliography[HUL13] J.B. Hiriart-Urruty and C. Lemare´chal. Convex Analysis and Minimization AlgorithmsI: Fundamentals, volume 305. Springer Science & Business Media, 2013. → pages 5,9[HUMSV95] J.B. Hiriart-Urruty, M. Moussaoui, A. Seeger, and M. Volle. Subdifferential calculuswithout qualification conditions, using approximate subdifferentials: A survey. Non-linear Analysis: Theory, Methods & Applications, 24(12):1727–1754, 1995. → pages6, 16, 30[Lau05] A.J. Laub. Matrix Analysis for Scientists and Engineers. SIAM, 2005. → pages 14[LBT09] Y. Lucet, H.H. Bauschke, and M. Trienis. The piecewise linear-quadratic modelfor computational convex analysis. Computational Optimization and Applications,43(1):95–118, 2009. → pages 25, 26, 27, 28, 31, 36, 37, 42[Leo80] S.J. Leon. Linear Algebra with Applications. Macmillan New York, 1980. → pages11, 12[Luc] Y. Lucet. Computational convex analysis library, 1996–2016. http://atoms.scilab.org/toolboxes/CCA/. → pages 27[Mon11] J.F. Monahan. Numerical methods of statistics. Cambridge University Press, 2011.→ pages 9[RJ00] A. Rantzer and M. Johansson. Piecewise linear quadratic optimal control. AutomaticControl, IEEE Transactions on, 45(4):629–637, 2000. → pages 25[Roc88] R.T. Rockafellar. On the essential boundedness of solutions to problems in piecewiselinear quadratic optimal control. Analyse mathematique et applications. Gauthiervillars, pages 437–443, 1988. → pages 25[Roc15] R.T. Rockafellar. Convex Analysis. Princeton University Press, 2015. → pages 4, 17,31[RW86] R.T. Rockafellar and R.J.B Wets. A lagrangian finite generation technique for solvinglinear-quadratic problems in stochastic programming. In Stochastic Programming 84Part II, pages 63–93. Springer, 1986. → pages 25[RW09] R.T. Rockafellar and R.J.B. Wets. Variational Analysis, volume 317. Springer Science& Business Media, 2009. → pages 4, 6, 26, 31, 32[Sci15] Scilab:. Scilab. http://www.scilab.org/, 2015. → pages 27, 37[Tri07] M.J. Trienis. Computational convex analysis: From continuous deformation to finiteconvex integration. Master thesis, 2007. → pages 25, 2653
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Approximate subdifferential of piecewise linear-quadratic...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Approximate subdifferential of piecewise linear-quadratic functions Bajaj, Anuj 2016
pdf
Page Metadata
Item Metadata
Title | Approximate subdifferential of piecewise linear-quadratic functions |
Creator |
Bajaj, Anuj |
Publisher | University of British Columbia |
Date Issued | 2016 |
Description | Optimization is a branch of mathematics dealing with the selection of the best element(s) (based on some criteria) from a set of available options. In the study of optimization, we often come across problems involving functions that are not differentiable, thus, one cannot employ the tool of differentiation to solve or analyze such functions. Such an issue can be resolved with the help of subdifferentials. Subgradients generalize derivatives to nondifferentiable functions, which makes them one of the most useful and powerful instruments in nonsmooth optimization. However, they may come with a few limitations. Some limitations may be overcome by using approximate subdifferentials, which are a certain relaxation of true subdifferentials. In this thesis, we present a general algorithm to compute approximate subdifferential of any proper convex function. We then implement the algorithm for the family of convex univariate piecewise linear-quadratic functions - an important model class of functions for nonlinear systems. We provide many numerical examples. Finally, some directions and insights for future work are detailed. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2016-06-23 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0305102 |
URI | http://hdl.handle.net/2429/58318 |
Degree |
Master of Science - MSc |
Program |
Mathematics |
Affiliation |
Irving K. Barber School of Arts and Sciences (Okanagan) Computer Science, Mathematics, Physics and Statistics, Department of (Okanagan) |
Degree Grantor | University of British Columbia |
GraduationDate | 2016-09 |
Campus |
UBCO |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2016_september_anuj_bajaj.pdf [ 12.66MB ]
- Metadata
- JSON: 24-1.0305102.json
- JSON-LD: 24-1.0305102-ld.json
- RDF/XML (Pretty): 24-1.0305102-rdf.xml
- RDF/JSON: 24-1.0305102-rdf.json
- Turtle: 24-1.0305102-turtle.txt
- N-Triples: 24-1.0305102-rdf-ntriples.txt
- Original Record: 24-1.0305102-source.json
- Full Text
- 24-1.0305102-fulltext.txt
- Citation
- 24-1.0305102.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0305102/manifest