Computational Convex Analysis: From Continuous Deformation to Finite Convex Integration by Michael Joseph Trienis B.Sc., The University of British Columbia Okanagan, 2006 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in College of Graduate Studies (Interdisciplinary) The University Of British Columbia Okanagan January 1st, 2007 c Michael Joseph Trienis 2007 Abstract After introducing concepts from convex analysis, we study how to continuously transform one convex function into another. A natural choice is the arithmetic average, as it is pointwise continuous; however, this choice fails to average functions with different domains. On the contrary, the proximal average is not only continuous (in the epi-topology) but can actually average functions with disjoint domains. In fact, the proximal average not only inherits strict convexity (like the arithmetic average) but also inherits smoothness and differentiability (unlike the arithmetic average). Then we introduce a computational framework for computer-aided convex analysis. Motivated by the proximal average, we notice that the class of piecewise linear-quadratic (PLQ) functions is closed under (positive) scalar multiplication, addition, Fenchel conjugation, and Moreau envelope. As a result, the PLQ framework gives rise to linear-time and linear-space algorithms for convex PLQ functions. We extend this framework to nonconvex PLQ functions and present an explicit convex hull algorithm. Finally, we discuss a method to find primal-dual symmetric antiderivatives from cyclically monotone operators. As these antiderivatives depend on the minimal and maximal Rockafellar functions [5, Theorem 3.5, Corollary 3.10], it turns out that the minimal and maximal function in [12, p.132,p.136] are indeed the same functions. Algorithms used to compute these antiderivatives can be formulated as shortest path problems. ii Table of Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii List of Tables vi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix 1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1 Convex Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2 Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.3 Fenchel Conjugate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.4 Moreau Envelope 1.5 Convex Optimization 1.6 Subdifferential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 1.7 Continuity Notions 1.8 Convex Hull . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2 Proximal Average . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 iii Table of Contents 2.1 Arithmetic Average . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.2 Continuity and Homotopy 2.3 Strict Convexity and Smoothness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3 Piecewise Linear-Quadratic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.1 Numerical Methods for Convex Transforms . . . . . . . . . . . . . . . . . . . . . . . 30 3.2 Function Approximation 3.3 3.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.2.1 The Class of Piecewise Linear Functions . . . . . . . . . . . . . . . . . . . . 32 3.2.2 The Class of Piecewise Linear-Quadratic Functions . . . . . . . . . . . . . . 35 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 3.3.1 Piecewise Linear-Quadratic Algorithms . . . . . . . . . . . . . . . . . . . . . 37 3.3.2 Extending the Piecewise Linear-Quadratic Algorithms . . . . . . . . . . . . . 45 3.3.3 Fast Algorithms for Convex Transforms . . . . . . . . . . . . . . . . . . . . . 55 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 3.4.1 Fast Algorithms for Convex Transforms . . . . . . . . . . . . . . . . . . . . . 58 3.4.2 Piecewise Linear-Quadratic Algorithms . . . . . . . . . . . . . . . . . . . . . 59 4 Finite Convex Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 4.1 Introduction 4.2 Cyclic monotonicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 4.3 Antiderivatives and their properties 4.4 Relationship between [12] and [5] 4.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 4.4.1 Minimal antiderivative in higher dimension . . . . . . . . . . . . . . . . . . . 69 4.4.2 Minimal antiderivative in one dimension . . . . . . . . . . . . . . . . . . . . 72 Linking Antiderivative Algorithms to Network Flow Problems . . . . . . . . . . . . 74 iv Table of Contents 5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 Appendices A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 v List of Tables 1.1 Extended-valued convention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 3.1 Special cases for the convex hull of a PLQ function . . . . . . . . . . . . . . . . . . . 48 vi List of Figures 1.1 Identifying a convex set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.2 The epigraph of x → sin(x) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.3 Identifying a convex function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.4 The Moreau envelope of the absolute value function . . . . . . . . . . . . . . . . . . 12 1.5 A set subtangents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 1.6 First-order convexity characterization 1.7 A lower semi-continuous function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 1.8 The closed convex hull . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.1 Averages of the linear function f0 (x) = x + 2 and the quadratic function f1 (x) = x2 . 2.2 The proximal averaging of two functions with different domains. . . . . . . . . . . . 26 2.3 Proximal averages of nonconvex functions . . . . . . . . . . . . . . . . . . . . . . . . 28 3.1 A zeroth-order model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.2 A first-order model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.3 A second-order model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 3.4 Multiplying a PLQ function by a scalar . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.5 The addition of two PLQ functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.6 The biconjugate of a nonconvex function . . . . . . . . . . . . . . . . . . . . . . . . . 46 . . . . . . . . . . . . . . . . . . . . . . . . . . 16 25 vii List of Figures 3.7 Back-tracking with the PLQ convex hull algorithm . . . . . . . . . . . . . . . . . . . 49 3.8 The convex hull of a piecewise ”quadratic-quadratic” function . . . . . . . . . . . . . 51 3.9 The convex hull of a piecewise ”quadratic-linear” function . . . . . . . . . . . . . . . 52 3.10 The convex hull of a piecewise ”linear-quadratic” function . . . . . . . . . . . . . . . 53 3.11 The convex hull of a piecewise ”linear-linear” function . . . . . . . . . . . . . . . . . 54 3.12 The limitations of discrete addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 3.13 Convergence of the discrete Legendre transform . . . . . . . . . . . . . . . . . . . . . 60 4.1 Finite convex integration is not unique up to a constant . . . . . . . . . . . . . . . . 62 4.2 Constructing a primal-dual symmetric method . . . . . . . . . . . . . . . . . . . . . 68 4.3 Graph associated with a system of difference constraints . . . . . . . . . . . . . . . . 75 4.4 Minimal and maximal antiderivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 viii Acknowledgements I would like to thank my supervisor, Dr. Yves Lucet, for giving me the opportunity to pursue research in the area of optimization and convex analysis. His infinite patience and guidance were vital in every aspect of this work. As well, I sincerely thank Dr. Donovan Hare for being the first professor to spark my interest in optimization from a real world perspective, and to Dr. Heinz Bauschke, whose passion for mathematics inspires us all. I also give thanks to Liangjin Yao and Melisa Lavallee, who helped in revising my thesis. Everyone who attended the OCANA seminar provided invaluable knowledge and constructive criticism. This thesis was partially supported by the Pacific Institute for the Mathematical Sciences, the National Science and Engineering Research Council of Canada, and UBC Okanagan Internal Grant program. ix Introduction The main content of this thesis is derived from the papers [4], [5], and [19]. Contributions not contained in the previous papers include the extension to nonconvex PLQ functions, and the link with the all-pairs shortest path problem. We now summarize the contributions of each research paper. Firstly, article [4] states that the proximal average allows a parametric family of convex functions to continuously transform one convex function into another, even when the domains of the two functions do not intersect. The proximal average operator is also shown to be an homotopy with respect to the epi-topology. Moreover, the paper also shows that the parametric family inherits desirable properties such as differentiability and strict convexity. Next, article [19] presents a new computational framework for computer-aided convex analysis, and states that the class of piecewise linear-quadratic functions improves convergence and stability with respect to current models. A stable convex calculus is achieved by using symbolic-numeric algorithms to compute all fundamental convex transforms. The main result states the existence of efficient (linear-time) algorithms for the class of PLQ functions. These results are extended in the present thesis to nonconvex functions. Finally, article [5] discloses a new method which always produces primal-dual symmetric antiderivatives using Fitzpatrick functions and the proximal midpoint average. A link with the allpairs shortest path algorithms is given at the end of this thesis. We first introduce fundamental convex notations in Chapter 1, then summarize the main results 1 Acknowledgements of the paper [4] in Chapter 2. In Chapter 3, we sum up and expand on several contributions in [19], which include explicitly defining the (linear-time) PLQ algorithms and extending the framework for those algorithms to nonconvex functions. Chapter 4 provides some notes on primal-dual symmetric methods, and links the minimal and maximal antiderivatives in [5, Theorem 3.5, Corollary 3.10] with [12, p.132, p.136]. The last chapter concludes the thesis with a brief summary and suggestions for future research. 2 Chapter 1 Preliminaries In this section we recall basic notions from convex analysis. Throughout, we assume the reader has a basic knowledge in set theory as well as fundamental ¯ = R ∪ {+∞} with inner product < ·|· >, and results of calculus. We name X = X ∗ = Rd , and R norm · . We extend any proper function f : dom f → R with f¯(x) = f (x) ∞ if x ∈ dom f, if x ∈ dom f. We can recover the domain of the original function f from f¯ using the set dom f := {x : f¯(x) < +∞}. As our framework involves functions with infinite values, we need to acquire a convention for algebra that involves infinite values. Table 1.1 summarizes the convention which will be used throughout. For the sake of simplicity, we define positive and negative reals R−− := ]−∞, 0[ and R++ := ]0, +∞[. ¯ is proper if there is an x ∈ X such that f (x) < ∞. Definition 1.1. A function f : X → R 3 Chapter 1. Preliminaries Table 1.1: Let α ∈ R++ and β ∈ R−− . Then we Arithmetic α + ∞ = +∞ α − ∞ = −∞ β + ∞ = +∞ β − ∞ = −∞ +∞ + ∞ = +∞ 1.1 have the following extended-valued convention. Multiplication α(+∞) = +∞ α(−∞) = −∞ β(+∞) = −∞ β(−∞) = +∞ 0(+∞) = 0 0(−∞) = 0 Convex Sets Let x1 , x2 ∈ X. Then the point x is on the line segment [x1 , x2 ] if x is a convex combination of x1 and x2 i.e. [x1 , x2 ] = {x : λ1 + λ2 = 1, λ1 , λ2 ≥ 0 and x = λ1 x1 + λ2 x2 }. See Figure 1.1. Definition 1.2. A set C is convex if for any x1 , x2 ∈ C ⊂ X, x1 , x2 ∈ C ⇒ [x1 , x2 ] ⊆ C holds. The epigraph is a notion which relates convex sets to convex functions. Definition 1.3. The epigraph of a function f is the set of all points on or above the graph of f . That is, epi f = {(x, r) : x ∈ X, r ∈ R, r ≥ f (x)} . The above definition is illustrated on Figure 1.2. We can also describe a convex set C as the convex combination of all points in C. 4 Chapter 1. Preliminaries 4 3 2 1 0 −1 −2 −3 −4 −4 −3 −2 −1 0 1 2 3 4 Figure 1.1: If the points x1 , x2 ∈ C ⊂ R2 , and λ ∈ [0, 1] then λx1 + (1 − λ)x2 ∈ C. So C is a convex set. 2.0 1.5 1.0 0.5 0.0 −0.5 −1.0 −1.5 −2.0 −2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 Figure 1.2: The epigraph of x → sin(x) is the set of all points on and above the graph. 5 Chapter 1. Preliminaries 5.0 4.5 4.0 3.5 3.0 2.5 2.0 1.5 1.0 0.5 0.0 −2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 2.5 Figure 1.3: The segment [(x, f (x)), (y, f (y))] is always above the graph of f . Fact 1.4. [20, Theorem 2.2] A set C is convex if and only if n ∀n ∈ N x1 , · · · , xn ∈ C we have n λi xi ∈ C for all λi ≥ 0 where i=1 1.2 λi = 1. i=1 Convex Functions We now define convex functions. ¯ is convex if dom f is a convex set and Definition 1.5. A function f : X → R f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y) ∀ x, y ∈ dom f , where λ ∈ [0, 1]. Geometrically, the inequality can be described as the line segment connecting (x, f (x)) and (y, f (y)) being always on or above the graph of the function f , as seen on Figure 1.3. The inequality 6 Chapter 1. Preliminaries becomes an equality when considering affine functions; therefore, affine functions are also convex. The notion of strict convexity is almost identical to the previous definition except that the inequality is strict. Definition 1.6. The function f is strictly convex if domf is convex and f (λx + (1 − λ)y) < λf (x) + (1 − λ)f (y) ∀ x, y ∈ dom f , whenever x = y, and λ ∈ ]0, 1[ . Remark 1.7. Strict convexity is a stronger notion than convexity as all strictly convex functions are convex but the converse is false. A strictly convex function is a convex function without any linear parts. It is an important notion as strictly convex functions have at most one minimizer (see Figure 1.3). Notation 1.8. The set of all convex functions on X is denoted by Conv X. Notation 1.9. Any vector x ∈ X = Rd is identified as a column vector; the transpose, of x ∈ Rd×1 , is a new vector in R1×d denoted by xT . Finally, we denote ·, · as the standard dot product x, y = y T x. Definition 1.10. The n × n matrix M is positive-semidefinite if xT M x ≥ 0 for all x = 0. Furthermore, a matrix M is positive-definite if xT M x > 0 for all x = 0. Example 1.11. Quadratic functions f (x) = Ax, x + b, x + c with A positive-semidefinite, are convex. If A is positive definite, then f is strictly convex. Example 1.12. The exponential function f (x) = exp(x) for x ∈ R is strictly convex with no minimizer. Fact 1.13. [20, Theorem 4.1] The function f is convex, if and only if epi f is a convex set. 7 Chapter 1. Preliminaries Fact 1.14. [7, p.639] A function f : X → R is said to be closed if and only if epi f is closed. 1.3 Fenchel Conjugate The convex conjugate, also known as the Fenchel conjugate (we will refer to it simply as the conjugate) is an important operation in convex analysis. It is used as an intermediate transform for other more advanced operations like the Moreau envelope and the proximal average. ¯ The function f ∗ is defined as Definition 1.15. Let f : X → R. f ∗ (y) = sup { y, x − f (x)} . x∈dom f We notice that f ∗ is a convex function as the supremum of affine (convex) functions is a convex function. Definition 1.16. A set is closed if every limit point of the set is a point in the set. Proposition 1.17. [7, p.91] The function f ∗ is always convex and lsc, (see Definition 1.40) even if f is not convex. Proof. The conjugate is the pointwise supremum of a family of affine functions of y, therefore the epigraph of the conjugate is an intersection of a family of closed half spaces (closed convex sets). Since the intersection of closed convex sets is a closed convex set, the epigraph is a closed convex set. So the conjugate is lsc and convex. 8 Chapter 1. Preliminaries Example 1.18. The conjugate of a quadratic function f (x) = a0 x2 + b0 x + c0 , the conjugate is 1 2 4a0 (y − b0 ) − c0 f ∗ (y) = I{b } (y) − c0 0 ∞ if a0 > 0, if a0 = 0, if a0 < 0. The conjugate function has nice duality properties with respect to convexity. Fact 1.19. [20, Corollary 12.2] The biconjugate f ∗∗ = f , if and only if f is proper, convex and lsc. Example 1.20. Let f (x) = b0 x + c0 . Then f ∗ (y) = I{b0 } (y) − c0 . Taking the conjugate again we get f ∗∗ (x) = b0 x + c0 = f (x). ¯ and x ∈ dom f , Fenchel’s inequality Proposition 1.21. [20, p.105] Given a function f : X → R holds: p, x ≤ f (x) + f ∗ (p) ∀ p, x ∈ X . Proof. Directly from the definition of the Fenchel conjugate f ∗ (p) := supx { p, x − f (x)}, gives f ∗ (p) ≥ p, x − f (x). The energy function is the only function whose conjugate is itself. Fact 1.22. [20, p.106] The only self conjugate function is the energy function 1 2 · 2. Notation 1.23. The set of all closed convex functions on X is denoted by Conv X. As the conjugate function is always convex, taking the double conjugate yields the closed convex hull. Fact 1.24. [20, p.36] Let f be a proper function. Then co ¯ f = f ∗∗ . 9 Chapter 1. Preliminaries Proposition 1.25. [11, Corollary 1.4.4] Let f ∈ Conv X and x ∈ dom f . Then s, x = f (x) + f ∗ (s) ⇔ s ∈ ∂f (x) ⇔ x ∈ ∂f ∗ (s). (1.1) Proof. By definition of the conjugate we have − f ∗ (s) = inf [f (x) − s, x ]. x∈X (1.2) So the lsc proper convex function g: x → f (x) − s, x , achieves its infimum at x ˆ if and only if 0 ∈ ∂g(ˆ x) = ∂f (ˆ x) − s. That is, −f ∗ (s) = [f (ˆ x) − s, x ] if and only if s ∈ ∂f (ˆ x). Applying this same result to f ∗ , we obtain x ∈ ∂f ∗ (s) if and only if s, x = f ∗ (s) + f ∗∗ (x), which is again Equation (1.1) since f ∗∗ = f. 1.4 Moreau Envelope Let us first define an operation which is used in the Moreau envelope. ¯ Their infimal convolutions is defined Definition 1.26. Let f1 and f2 be two functions from X to R. 10 Chapter 1. Preliminaries by f1 ✷f2 (x) := inf{f1 (x) + f2 (x2 ) : x1 + x2 = x}, = inf [f1 (y) + f2 (x − y)]. y∈X Definition 1.27. [20, Theorem 31.5.] Let λ ∈ R++ and s ∈ X. Then the Moreau envelope, also called the Moreau-Yosida regularization is defined as Mλ (s) = f ✷ 1 · 2λ 2 (s) = inf f (x) + x∈X s−x 2λ 2 . (1.3) We summarize some of its key properties. Fact 1.28. (i.) [21, Theorem 1.25] Mλ converges pointwise to f as λ decreases to 0. (ii.) [21, Theorem 13.37] The Moreau envelope is smooth and continuous. (iii.) [18, p.2] The functions Mλ (x) and f (x) share the same critical points. It is also important due to its regularization properties, in particular, with nondifferentiable functions as seen in Figure 1.4. Using Equation (1.3) and expanding · 2 we obtain Mλ (s) = where gλ∗ (s) = sup { s, x − gλ (x)} and gλ (x) = s 2 1 − gλ∗ (s), 2λ λ x 2 2 (1.4) + λf (x) (see [17]). Remark 1.29. Formula (1.4) is important, as any algorithm used to compute the conjugate can also be used to compute the Moreau envelope. As the Moreau envelope is decomposed into conjugation, addition, and (positive) scalar multiplication, all algorithms and models need only to accommodate 11 Chapter 1. Preliminaries 2.0 1 1.8 1.6 0.75 1.4 1.2 0.5 1.0 0.8 0.25 0.6 0.4 0 0.2 0.0 −2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 Figure 1.4: The Moreau envelope of f (x) = |x| is a smooth PLQ function. these operations. In other words, the Moreau envelope (regularization) depends entirely on the three operations in Equation (1.4). 1.5 Convex Optimization Duality theory in optimization associates another optimization problem (the dual) to a given problem (the primal). The primal problem is the initial model which typically consists of an objective function and a series of constraints. The conjugate plays a critical role in convex duality because it gives a dual representation (D) of the primal (P). Let h and k be proper, lsc and convex on Rn 12 Chapter 1. Preliminaries and Rm respectively. Then (P) inf {φ(x) : φ(x) := (D) sup {ψ(y) : ψ(y) := c, x + k(x) + h(b − Ax), and x ∈ Rn }, (1.5) b, y − h∗ (y) − k∗ (A∗ y − c), and y ∈ Rm }. (1.6) The dual problem clearly shows the importance of the conjugate function as Equation (1.6) relies directly on the conjugate of h and k. Remark 1.30. A version of Fenchel Duality Theorem involving the primal and dual problems, can be found in [20, Theorem 31.1]. The notion of duality has had a huge impact as some dual problems are much easier to solve than their primal counterparts. The primal and the dual are intrinsically linked to each other such that the solution to the dual provides insight into the solution of the primal. Under some conditions, the optimal value of both problems is the same. The indicator function of a convex set C ⊂ X is the bridge between functions and sets. It is defined as x → IC (x) = 0 if x ∈ C, +∞ otherwise. The indicator function will allow us to translate results on convex sets to convex functions. Example 1.31. In optimization we are concerned with the following problem minimize f over C. This problem can be equivalently rewritten as the unconstrained optimization problem minimize f + IC over X, 13 Chapter 1. Preliminaries 4 3 2 1 0 −1 −2 −2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 Figure 1.5: A set of subtangents for smooth and non-smooth parts. as indicator functions act as infinite penalization. Definition 1.32. We say that a function is smooth if it is differentiable everywhere. In optimization we use critical points to locate minimum or maximum values. Definition 1.33. Let f be a differentiable function then a critical point is a point in the interior of the domain of the function where the derivative equals zero. As the gradient is undefined for nonsmooth functions, we require some notion of generalized differentiability; this is known as the subdifferential operator. 14 Chapter 1. Preliminaries 1.6 Subdifferential In order to properly describe the construction of the subdifferential operator, we need to define some other notions. Given a proper convex function f , we can construct a hyperplane f (x0 ) + s, x − x0 which passes through (x0 , f (x0 )) that always lies on or below the function f (seen in Figure 1.5). The slope of this supporting hyperplane is called a subgradient. Definition 1.34. The vector s is a subgradient of f at x0 ∈ dom f if and only if f (x) ≥ f (x0 ) + s, x − x0 ∀ x∈X . In the case where the function f has a corner at x0 , there may exist infinitely many supporting hyperplanes, and thus many subgradients. Figure 1.5 illustrates multiple subgradients at the corner (0, 0) and only a single subgradient otherwise. The subdifferential is simply the set of all possible subgradients at all points. Definition 1.35. The subdifferential of a proper function f at x is ∂f (x) = {s ∈ X : f (y) ≥ f (x) + s, y − x , ∀ y ∈ X} . It is constructed from hyperplanes which support epi f at (x, f (x)); therefore the subdifferential may be empty for nonconvex functions. Recall that the subdifferential is a generalization of the derivative and in the case when f is convex and smooth, {∇f (x)} = ∂f (x). Definition 1.36. We say that x ¯ is a critical point of f if 0 ∈ ∂f (¯ x). Remark 1.37. When considering the class of conjugate functions which have closed form solutions, one strategy is to determine if y, x − f (x) is concave (in term of x). If this is the case then the critical point will maximize y, x − f (x). This is the approach taken by the symbolic convex 15 Chapter 1. Preliminaries 5 4 3 2 1 0 −1 −2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 2.5 Figure 1.6: First-order convexity characterization. analysis toolkit (SCAT) package [9, p.72]. However, when no closed form exists we require numerical methods to find a solution. Convexity is characterized by the following first order condition. Observation 1.38. Assume f is differentiable at each point in dom f , and that dom f is open. Then the function f is convex, if and only if, dom f is convex and f (y) ≥ f (x) + ∇f (x), y − x ∀ x, y ∈ dom f . We can see from Figure 1.6 that the open circle is the point where the subtangent f (x) + f ′ (x)(y − x) passes through (x, f (x)), and the constraint f (y) ≥ f (x) + f ′ (x)(y − x) means that the subtangent must lie on or below the function f (y). The following is a second-order characterization of convexity. 16 Chapter 1. Preliminaries Fact 1.39. [20, Theorem 4.5] Assume f is proper, twice continuously differentiable at each point in dom f , where the dom f is open and convex. Then the function f is convex on dom f , if and only if, for all x ∈ dom f, ∇2 f (x) is positive semi-definite. If ∇2 f (x) is positive-definite for all x ∈ dom f , then f is strictly convex. 1.7 Continuity Notions Definition 1.40. A function f is lower semicontinuous (lsc) at x0 ∈ dom f if for every ε > 0, there exists a neighbourhood U of x0 such that f (x) > f (x0 ) − ε, for all x in U . This definition can be equivalently expressed as lim inf f (x) ≥ f (x0 ). x→x0 Lower semicontinuity is a weaker notion than continuity in the sense that continuity implies lower semicontinuity. In other words, if a function f is continuous at a point x0 then it is also lsc at x0 but the converse is not true, as seen in Figure 1.7. If a function f is lsc at every point in its domain then f is a lsc function. The notion of upper semicontinuous (usc) is analogous to that of lsc. Definition 1.41. The function f is usc at x0 ∈ dom f if for every ε > 0, there exists a neighborhood U of x0 such that f (x) < f (x0 ) + ε for all x in U . This definition can be equivalently expressed as lim sup f (x) ≤ f (x0 ). x→x0 17 Chapter 1. Preliminaries 1.5 1.0 0.5 0.0 −0.5 −1.0 −1.5 −2.0 −2.5 −4 −3 −2 −1 0 1 2 3 4 Figure 1.7: The function is lsc at x0 = 0, and continuous otherwise therefore it is lsc everywhere. A function f is lsc if and only if −f is usc. Moreover, the notion of being continuous at x0 is equivalent to being both lsc and usc at x0 . Fact 1.42. [20, Theorem 7.1] A function f is lsc if and only if the epigraph of f is closed. Definition 1.43 (continuous function). Assume that xn , x0 belong to dom f . Then a real function f is continuous if for any sequence (xn ) such that lim xn = x0 , n→∞ it holds that lim f (xn ) = f (x0 ). n→∞ Equivalently, we can express the above definition as the following. 18 Chapter 1. Preliminaries Notation 1.44. For all ǫ > 0, there exists an σ > 0 such that |x − x0 | < σ ⇒ |f (x) − f (x0 )| < ǫ. Fact 1.45. [20, p.51] A function f is lsc and usc at x0 if and only if f is continuous at x0 . 1.8 Convex Hull Given a nonconvex function g, the convex hull (denoted by co) is found by taking the supremum of all convex functions which minorize g. Definition 1.46. If two functions f and g from X to R satisfy f (x) ≤ g(x) for all x ∈ X, we say that f minorizes g (on X). ¯ not identically +∞, be Fact 1.47. [11, Proposition 2.5.1, Proposition 2.5.2] Let g : X → R, minorized by an affine function: for some (s, b) ∈ X × R g(x) ≥ s, x − b ∀ x ∈ X . Then co g(x) := sup{h(x) : h ∈ Conv X, h ≤ g}. (1.7) Similarly, the closed convex hull of g is co g(x) := sup{h(x) : h ∈ ConvX, h ≤ g}. (1.8) The above definition of a closed convex hull is illustrated on Figure 1.8. The concept of convex functions is intrinsically linked to convex sets using the epigraph; therefore, we also define the convex hull for sets. If we are given any set S, not necessarily convex, we can construct a convex set by taking the intersection of all convex sets that contain S. 19 Chapter 1. Preliminaries Fact 1.48. [11, Proposition 1.3.4, Definition 1.4.1] The convex hull is the intersection of all convex sets which are supersets of S: co S := ∩{C : C is convex and contains S }. (1.9) The closed convex hull of a nonempty set S ⊂ X is the intersection of all closed convex sets containing S. It is also the intersection of all closed half-spaces containing S. We can also find the convex hull of a set of discrete points by finding the set of all convex combinations. Definition 1.49. A convex combination of x1 , · · · , xk is the point x = λ1 x1 + λ2 x2 + · · · + λk xk such that k i=1 λi = 1, and λi ≥ 0 for i = 1, · · · , k. Fact 1.50. [7, p.34] The convex hull of a finite set of points {x1 , · · · , xk } is co{x1 , · · · , xk } = {λ1 x1 + · · · + λk xk : λ1 + · · · + λk = 1, λi ≥ 0 for i = 1, · · · , k}. 20 Chapter 1. Preliminaries 4 3 2 1 0 −1 −2 −3 −6 −4 −2 0 2 4 6 Figure 1.8: The closed convex hull. 21 Chapter 2 Proximal Average Throughout this chapter we will assume f0 , and f1 belong to Conv X. 2.1 Arithmetic Average The main motivation for the proximal average is to determine how to continuously transform f0 into f1 . The first and most natural way is the arithmetic average function x → (1 − λ)f0 (x) + λf1 (x) ∀λ ∈ ]0, 1[ . The arithmetic average is the convex combination of f0 and f1 taken pointwise. In fact, when f0 and f1 are finite-valued the arithmetic average is pointwise continuous. Remark 2.1. We say that a function is finite if it does not take the values +∞ and −∞. Modern convex analysis uses extended-valued functions for many valid reasons. Indeed, many constrained optimization problems are often reformulated as unconstrained optimization problems by introducing indicator functions in the objective (see Example 1.31). As many functions (including indicator functions) consider infinite values, we encounter our first shortcoming: when we take the arithmetic average of two functions which do not have identical domains, the domain of the average may be empty. 22 Chapter 2. Proximal Average Example 2.2. Let f0 (x) := x ln(x) − x and f1 (x) := exp(x). Then the arithmetic average function is aλ (x) := (1 − λ)f0 (x) + λf1 (x) ∀λ ∈ ]0, 1[ . As illustrated by Figure 4.2(a), x → aλ (x) is only continuous on the interval dom f0 ∩ dom f1 = [0, +∞[. When dom f0 ∩ dom f1 = ∅ the arithmetic average is not even defined anywhere and will be exactly +∞ everywhere while the proximal average always has full domain. In contrast to the arithmetic average, let us consider the proximal average. The proximal average originates from the convex combination of two proximal maps, (which turns out to be a proximal map) Prox(fλ ) = (1 − λ) Prox(f0 ) + λ Prox(f1 ), where the proximal mapping of f ∈ Conv X defined by Prox(f )(x) = Argmin f (y) + x−y 2λ 2 (2.1) , is the set of minimizers of the Moreau envelope (see Equation (1.3)). Definition 2.3. The proximal average operator P is defined as P : Conv ×[0, 1] × Conv → Conv (f0 , λ, f1 ) → (1 − λ)(f0 + 1 · 2 2 ∗ ) + λ(f1 + 1 · 2 2 ∗ ∗ ) − 1 · 2 2 . The arithmetic average is convex as it is composed of operations which are convexity preserving. But it is unclear from the definition, whether the proximal average is convex, as the difference of convex functions is not always convex. 23 Chapter 2. Proximal Average Fact 2.4. [6] Set fλ = P(f0 , λ, f1 ). Then P(f0 , λ, f1 ) ∗ = P(f0∗ , λ, f1∗ ). Using the Biconjugate Theorem with Fact 2.4 and conjugating the proximal average twice, gives fλ∗∗ = P(f0 , λ, f1 ) ∗∗ = P(f0∗ , λ, f1∗ ) ∗ = P(f0∗∗ , λ, f1∗∗ ) = fλ . Thus, fλ ∈ Conv X. Fact 2.5. [4, Proposition 2.2] The following properties always hold for the proximal average operator: (i.) P(f0 , λ, f1 ) = P(f1 , 1 − λ, f0 ), (ii.) P(f0 , 0, f1 ) = f0 , (iii.) P(f0 , 1, f1 ) = f1 . Proposition 2.6. [4, Proposition 2.8] Let f ∈ Conv X, then P(f, 12 , f ∗ ) = 1 2 · 2. Proof. In fact, P(f, 12 , f ∗ ) ∗ = P(f ∗ , 12 , f ∗∗ ) = P(f ∗ , 1 − 12 , f ) = P(f, 12 , f ∗ ). As the conjugate of P(f, 12 , f ∗ ) is itself, by Fact 1.22 the proximal midpoint average P(f, 12 , 12 , f ∗ ) is 1 2 2.2 · 2. Continuity and Homotopy In contrast to the arithmetic average the proximal average is not pointwise continuous. 24 Chapter 2. Proximal Average Example 2.7. Let f0 (x) := I{b0 } (x), and f1 := I{b1 } (x). Then fλ (x) = I{(1−λ)b0 +λb1 } (x) + (1 − λ)λ b0 − b1 2 . 2 Let λ0 > 0 and set x0 = (1 − λ0 )b0 + λ0 b1 . Then for λ = λ0 we have fλ (x0 ) = +∞, but for λ = λ0 we obtain fλ (x0 ) = (1−λ)λ 2 b0 − b1 2 . So the proximal average operator λ → P(f0 , λ, f1 )(x) is not pointwise continuous. Figure 2.2(b) illustrates that there may exist some type of continuity with the proximal average. 9 9 1 8 7 1 8 7 0.75 6 0.75 6 5 5 0.5 4 0.5 4 3 3 0.25 2 0.25 2 1 0 0 1 0 0 −3 −2 −1 0 1 2 3 (a) From f0 to f1 : Arithmetic Average −3 −2 −1 0 1 2 3 (b) From f0 to f1 : Proximal Average Figure 2.1: Averages of the linear function f0 (x) = x + 2 and the quadratic function f1 (x) = x2 . Next, we quote several results from [4] that detail the setting in which the proximal average is continuous. Definition 2.8 (epi-convergence and epi-topology). Let g and (gn )n∈N be functions from X to ]−∞, +∞]. Then (gn )n∈N epi-converges to g, if the following properties hold for every x ∈ X: for every sequence (xn )n∈N in X converging to x, one has g(x) ≤ lim inf gn (xn ), n→+∞ there exists a sequence (xn )n∈N in X converging to x such that lim sup gn (xn ) ≤ g(x). n→+∞ 25 Chapter 2. Proximal Average The epi-topology is the topology induced by epi-convergence. Fact 2.9. [4, Theorem 4.2] If Conv X is equipped with the epi-topology, then the proximal average operator P : Conv X × [0, 1] × Conv X → Conv X is continuous. 20 2.5 1 15 1 2.0 1.5 0.75 0.75 1.0 10 0.5 0.5 0.5 0.0 5 0.25 0.25 −0.5 −1.0 0 0 0 −1.5 −3 −2 −1 0 1 2 3 4 −5 −4 −3 −2 −1 0 1 2 3 4 5 (a) Deforming f0 (x) = x ln(x) − x into f1 (x) = (b) Deforming f0 (x) = − ln(−x) into f1 (x) = exp(x). − ln(x). Figure 2.2: The proximal averaging of two functions with different domains. Fact 2.10. [4, Corollary 3.3] Let f0 ∈ Conv X and f1 ∈ Conv X. Then P(f0 , ·, f1 ) : [0, 1] → Conv X provides a homotopy between f0 and f1 . In fact, all functions in Conv X are homotopic for the epi-topology. Next, we define the domain of the proximal average operator. Fact 2.11. [6] Let C0 and C1 be nonempty closed convex subsets of X, and λ ∈ ]0, 1[. Set f0 := IC0 , 26 Chapter 2. Proximal Average f1 := IC1 , fλ := P(f0 , λ, f1 ). Then for every x ∈ X, fλ (x) = (1 − λ)λ inf 1 2 c0 − c1 2 : (1 − λ)c0 + λc1 = x, ci ∈ Ci . Consequently, dom fλ = (1 − λ)C0 + λC1 . Definition 2.12. The set C ⊆ X is the set of all x such that there exists a sequence xn ∈ C such that xn → x. Fact 2.13. [4, Theorem 2.13] For any f0 ∈ Conv X and f1 ∈ Conv X, dom fλ = (1 − λ) dom f0 + λ dom f1 , int dom fλ = int (1 − λ) dom f0 + λ dom f1 . The result was later refined in [3, Theorem 4.6] as the domain is the convex combination of the domains. As a consequence of Fact 2.13 the proximal average remedies the shortcomings of Example 2.2. This is illustrated on Figure 2.2. 2.3 Strict Convexity and Smoothness The proximal average inherits nice properties from the two functions it averages. Notation 2.14. We say a function has full domain if dom f = X = Rd is the whole space. Fact 2.15. [4, Theorem 3.2] Let f0 ∈ Conv X, f1 ∈ Conv X, λ ∈ ]0, 1[, fλ := P(f0 , λ, f1 ). Suppose that f0 or f1 has full domain, and that f0∗ or f1∗ has full domain. Then the following hold: (i.) Both fλ and fλ∗ have full domain. (ii.) If f0 or f1 is smooth (i.e. differentiable everywhere), then so is fλ . 27 Chapter 2. Proximal Average (iii.) If f0 or f1 is strictly convex and its Fenchel conjugate has full domain, then fλ is strictly convex. The arithmetic average preserves strict convexity if one function is strictly convex; however, if one function is not smooth then the arithmetic average will not be smooth. Remark 2.16. It is possible to find the proximal average of nonconvex functions; however, in order to avoid the degenerate cases of +∞ or −∞, each nonconvex function must be bounded below by at least one affine function. As a consequence of Section 3.3.2 we can now compute the proximal average of nonconvex functions. Figure 2.16 illustrates the proximal average between convex and nonconvex functions. 8 1.0 3 1 1 0.5 6 0.75 0.75 0.0 4 0.5 2 −0.5 0.5 −1.0 0.25 0 1 2 0.25 0.75 1 0 0.5 −1 0.25 −1.5 −2 0 −2 −6 −4 −2 0 2 (a) 4 6 0 −2.0 −5 −4 −3 −2 −1 0 1 2 3 4 5 0 −3 −6 −4 −2 (b) 0 2 4 6 (c) Figure 2.3: Proximal averages of nonconvex functions 28 Chapter 3 PLQ Model Definition 3.1. Let PLQ be the set of extended valued functions which are convex, lsc and proper with a piecewise linear subdifferential mapping ∂f : R → 2R . The set PLQ contains piecewise linear functions, piecewise quadratic functions and the sum of any such function with an indicator function. There are several standard operations which arise frequently in convex analysis. Definition 3.2. The standard convex operations are (positive) scalar multiplication, addition, conjugation, and regularization (Moreau envelope). Throughout we consider only these operations as benchmarks for algorithms and model accuracy. These operations provide a natural set of tools for convex problems, as each operation preserve convexity. Proposition 3.3. The following operations are convexity preserving: (positive) scalar multiplication, addition, conjugation, and regularization. Proof. (Positive) scalar multiplication and addition clearly preserve convexity. The conjugate is convexity preserving as it is always convex by Proposition 1.17. The Moreau envelope of a convex function f is convex by Formula (1.3). 29 Chapter 3. Piecewise Linear-Quadratic Model 3.1 Numerical Methods for Convex Transforms In convex analysis many transforms are difficult to compute or do not have a closed form. Example 3.4. [15, p.46] Let W (the Lambert function) be the inverse function of x → xex defined on the interval ]0, +∞[ . Then the conjugate of f (x) = x log(x) + x2 −x 2 is f ∗ (y) = 1 [W(ey )]2 + W(ey ). 2 As the conjugate of f cannot be written with standard functions (i.e. a combination of trigonometric or polynomial functions) , we use W to represent such functions (it is well known that W does not admit a closed form). Notation 3.5. Let I := {0, · · · , n}, be a finite set of ordered integers. Fast algorithms which efficiently approximate these transforms, depend on discretization of a continuous model. Discretization is the process of converting a continuous model into discrete counterparts. Example 3.6. The continuous model for the Fenchel conjugate is f ∗ (y) = sup[yx − f (x)], (3.1) x∈R and the discretization of this model is fn∗ (yj ) = max{yj xi − f (xi )}, j ∈ I. i∈I (3.2) 30 Chapter 3. Piecewise Linear-Quadratic Model The main idea in approximating the continuous model is to evaluate the discrete model over a grid and not on a single point. Many fast algorithms utilize the fact that the domain to approximate the operation and the domain on which the function is defined, is known prior to computation. This prior knowledge allows for efficient computation and can lead to a decrease in time complexity. Remark 3.7. Computing grids for fast algorithms requires prior knowledge of not only the domain of f but also the domain of the resulting function. For example, the function f (x) = − log(x) is defined only on the positive reals. The framework is the same for all fast algorithms. First, we discretize our model and compute the transform using an efficient algorithm. Convergence results from Section 3.4 allow us to recover the continuous model. 3.2 Function Approximation Given a discrete set of points {(xi , f (xi ))}i∈I sampled from a function f (typically the function is not known) such that x0 < x1 < x2 < · · · < xn , what is the closest approximating function f˜ (from a well defined class of functions) that passes through our discrete set of points? The process used to solve this problem is known as interpolation. When considering the class of piecewise polynomial functions there are various orders of approximation which correspond to the order of differentiability used in the model (approximation). Example 3.8. The following implications describe the orders of approximation: Zeroth-order Model First-order Model Second-order Model ⇒ f (xi ) = f˜(xi ), ⇒ f (xi ) = f˜(xi ) and f ′ (xi ) = f˜′ (xi ), ⇒ f (xi ) = f˜(xi ), f ′ (xi ) = f˜′ (xi ), and f ′′ (xi ) = f˜′′ (xi ). 31 Chapter 3. Piecewise Linear-Quadratic Model Remark 3.9. The word model and approximation can be used interchangeably. 3.2.1 The Class of PL Functions The class of piecewise linear (PL) functions accommodates a variety of models of finite order. The following models take a discrete set of points and output a continuous piecewise linear approximation of the function f . Consider the zeroth-order model f˜0 (x) := max[fi + si (x − xi )], (3.3) i where si = fi+1 −fi xi+1 −xi and fi = f (xi ) = f˜0 (xi ). The model f˜ is convex, lsc, and piecewise linear as Equation (3.3) takes the maximum of a finite set of affine functions. Moreover, f˜0 ≥ f as it is an inner approximation of the epigraph (see Figure 3.1). Lemma 3.10. Let f ∈ PLQ such that dom f := {x : x0 ≤ x ≤ xn }. Then, for x ∈ dom f f (x) ≤ max i∈{0,··· ,n−1} f (xi ) + f (xi+1 ) − f (xi ) (x − xi ) . xi+1 − xi (3.4) Proof. Let h := max i∈{0,··· ,n−1} f (xi ) + f (xi+1 ) − f (xi ) (x − xi ) . xi+1 − xi First we show that f (xi ) = h(xi ) for i = 0, . . . , n. Since f is convex we have f (x1 ) − f (x0 ) f (x2 ) − f (x1 ) f (xn ) − f (xn−1 ) ≤ ≤ ··· ≤ . x1 − x0 x2 − x1 xn − xn−1 (3.5) Then denote li (x) := f (xi ) + f (xi+1 ) − f (xi ) (x − xi ) xi+1 − xi ∀i∈I . 32 Chapter 3. Piecewise Linear-Quadratic Model Now we have li (xi+1 ) = f (xi+1 ) = li+1 (xi ). Using Equation (3.5) and lk (xi+1 ) ≤ li (xi+1 ) if k ≤ i, lk (xi+1 ≤ li (xi+1 ) if k ≥ i, we obtain h(xi+1 ) = max lk (xi+1 ) = f (xi+1 ). 0≤k≤n By Definition 1.5 all segments which join any two points on a convex function f must lay on or above f . Hence, there exists an index i0 such that x ∈ [xi0 , xi0 +1 ], as xi < xi+1 . There also exists λ such that x = λxi0 + (1 − λ)xi0 +1 where λ ∈ [0, 1]. Then f (xi0 ) + f (xi0 +1 ) − f (xi0 ) f (xi0 +1 ) − f (xi0 ) (x − xi0 ) = f (xi0 ) + (λxi0 + (1 − λ)xi0+1 − xi0 ), xi0 +1 − xi0 xi0 +1 − xi0 f (xi0 +1 ) − f (xi0 ) = f (xi0 ) + (1 − λ)(xi0 +1 − xi0 ), xi0 +1 − xi0 = f (xi0 ) + (1 − λ)[f (xi0 +1 ) − f (xi0 )], = λf (xi0 ) + (1 − λ)f (xi0 +1 ) ≥ f (λxi0 + (1 − λ)xi0 +1 ) = f (x). Consider the first-order model f˜1 (x) := max[fi + gi (x − xi )], i (3.6) where gi ∈ ∂f (xi ). This model is also convex, continuous, and piecewise linear as Equation (3.6) is 33 Chapter 3. Piecewise Linear-Quadratic Model 2.0 1.8 1.6 1.4 1.2 1.0 0.8 0.6 0.4 f(x) 0.2 f0(x) 0.0 −2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 Figure 3.1: The zeroth-order model takes the maximum of a set of affine functions. taking the maximum of a finite set of linear functions as seen on Figure 3.2. The main difference between f˜0 and f˜1 is that this model is an outer approximation of the epigraph with f˜1 ≤ f . Lemma 3.11. Let f ∈ PLQ and gi ∈ ∂f (xi ). Then f (x) ≥ max i∈{0,··· ,n} f (xi ) + gi (x − xi ) ∀x∈R . (3.7) Proof. From Definition 1.34, we have f (x) ≥ f (xi ) + gi (x − xi ). Therefore the supremum of these minorizing functions will also minorize f . Theorem 3.12. Let f ∈ PLQ, then f˜1 (x) ≤ f (x) ≤ f˜0 (x). Proof. This is a direct consequence of Lemma 3.10, and Lemma 3.11. The problem with the zeroth-order and first-order models is that the class of piecewise linear 34 Chapter 3. Piecewise Linear-Quadratic Model 2.0 1.5 1.0 0.5 0.0 f(x) f1(x) −0.5 −2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 Figure 3.2: The first-order model takes the maximum of a set of subtangents. (PL) functions is not closed under standard convex operations. Example 3.13. The following example is motivated by the proximal average. Let f1 (x) = x. Then f1∗ (y) = I{1} (y) =⇒ f1∗ 1 · 2 2 (y) = (y − 1)2 . 2 So a simple combination of standard convex operations results in a function which is quadratic. Therefore all quadratic pieces cannot be computed exactly, we have to rely on convergence of piecewise linear approximations. 3.2.2 The Class of PLQ Functions The class of piecewise linear-quadratic (PLQ) functions remedies the shortcomings of Example 3.13. In fact, the class of PLQ functions is closed under all the standard convex operations. 35 Chapter 3. Piecewise Linear-Quadratic Model 12 10 8 6 4 2 f(x) 0 f2(x) −3 −2 −1 0 1 2 3 Figure 3.3: Second-order model of the function f (x) = x4 4 . Proposition 3.14. The class of PLQ functions is closed under (positive) scalar multiplication, addition, conjugation, and regularization. Proof. It is clear that the class of PLQ functions is closed under (positive) scalar multiplication and addition. The conjugate of a PLQ function f is a PLQ function. We notice that the graph of ∂f ∗ is piecewise linear, and is symmetric (with respect to the line y = x) to the graph of ∂f . Therefore f ∗ is PLQ [21, p.484]. As a consequence of Equation (1.4) and Remark 1.29, the class of PLQ functions is closed under the Moreau envelope operation. The PLQ class is robust enough to approximate any convex function using a zeroth-order, first-order or even second-order model. An example of a second order model is seen in Figure 3.3. 36 Chapter 3. Piecewise Linear-Quadratic Model Remark 3.15. It is important that we use a higher-order model because as the order increases, the approximating function deviates less from the actual function. In fact, as the degree of the Taylor series rises, the approximating function is closer to the correct function in a neighborhood of the given discrete set of points (see Subsection 3.4). 3.3 Algorithms The objective of this section is to provide a detailed comparison between Fast and PLQ algorithms (see [16–19]). Moreover, we intend to discuss the advantages and drawbacks of each algorithm with respect to the standard convex operations. We will also provide examples and detailed steps which will emphasize the main idea behind each algorithm. In order to compare algorithms we need a measure of efficiency. Let us consider time-complexity and space-complexity as the main benchmarks for practical use. The time and space complexity of an algorithm can be determined by counting the number of times we look at each point and its corresponding memory allocation. Moreover, we can verify the complexity analysis by increasing the size of input and checking the rate at which the CPU time or memory allocation increases. From here we can categorize an algorithm into certain classes of efficiency. 3.3.1 PLQ Algorithms Not only does the class of PLQ functions provide closure for the standard convex operations but the class also gives rise to linear-time and linear-space algorithms. Moreover, these algorithms do not require prior knowledge of the domain (unlike fast algorithms). They provide exact solutions when performing standard convex operations on PLQ functions. An efficient way to store the PLQ function is a PLQ matrix. To illustrate, the unbounded PLQ 37 Chapter 3. Piecewise Linear-Quadratic Model function f (x) := a0 x2 + b0 x + c0 a1 x2 + b1 x + c1 if x ≤ x0 , if x0 < x ≤ x1 , .. . an−1 x2 + bn−1 x + cn−1 an x2 + bn x + cn (3.8) if xn−2 < x ≤ xn−1 , otherwise. is stored as a matrix x0 a0 b0 c0 x1 a1 b1 c1 . . . . .. .. .. . plqf := .. xn−1 an−1 bn−1 cn−1 +∞ an bn cn (3.9) The bounded-domain version of f (x) is stored as 0 x0 0 +∞ x1 a1 b1 c1 . .. .. .. . plqf := . . . . . xn−1 an−1 bn−1 cn−1 +∞ 0 0 +∞ Half-bounded functions are stored similarly. Remark 3.16. The indicator function f (x) := I{x0 } (x) of a single point x0 is a special case which is stored as the row vector plqf := [x0 0 0 + ∞]. Definition 3.17. Let PLQE denote the set of all lsc extended-valued functions which are PLQ and 38 Chapter 3. Piecewise Linear-Quadratic Model continuous on the interior of their domains. Remark 3.18. The class of functions in PLQE is just the set of functions in PLQ extended to nonconvex functions. A PLQ matrix can only store functions in PLQE due to the following limitations. Example 3.19. The usc function f (x) = x2 if x < 0, x2 + 1 if x ≥ 0, cannot be stored as a PLQ matrix as x2 1 0 0 0 ⇔ +∞ 1 0 1 x2 + 1 if x ≤ 0, if x > 0. This limitation is due to the fact that the data structure is not robust enough to represent all combinations of inequalities and strict inequalities. Similarly, functions which are neither lsc nor usc cannot be stored as a PLQ matrix. Example 3.20. A discontinuous function f (x) = x2 x2 + 1 if x < 0, if x > 0, does not fit into our model as the quadratic, x2 for x < 0 is defined by a strict inequality. Another limitation of the model is that we cannot explicitly store vertical lines. 39 Chapter 3. Piecewise Linear-Quadratic Model Example 3.21. The PLQ operator {−1} f (x) = [ − 1, 1] {1} if x < 0, if x = 0, if x > 0, cannot be represented by PLQ matrix due to the vertical line at x = 0. Remark 3.22. The limitations of the PLQ matrix in Example 3.19 and 3.20 could be remedied by adding a point between each interval. Also, in Example 3.21 we could store vertical lines implicitly: store all nonvertical lines, and complete the graph by continuity/maximality. We will not consider such models in the present thesis. Next we present a linear-time algorithm for each standard convex operation in Definition 3.2, where each algorithm only considers functions from the set PLQ. (Positive) Scalar Multiplication (PLQ Algorithm) Scalar multiplication of a PLQ function amounts to multiplying each piece of a PLQ function by a scalar (see Figure 3.4). As multiplying each piece takes constant time, multiplying n pieces will take n steps; hence the algorithm runs in linear-time. Let λ ∈ [0, +∞[. Then a0 b0 c0 x λa0 λb0 λc0 x 0 0 x1 a1 b1 c1 λa1 λb1 λc1 x1 . . . . . . . . .. .. .. . .. .. .. . λ ⊗ .. = . xn−1 an−1 bn−1 cn−1 xn−1 λan−1 λbn−1 λcn−1 +∞ an bn cn +∞ λan λbn λcn 40 Chapter 3. Piecewise Linear-Quadratic Model 7 6 5 4 3 2 1 0 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 Figure 3.4: The dashed line is the function resulting from multiplying a PLQ function by a scalar. Notation 3.23. The symbol ⊗ denotes PLQ multiplication, and is not the usual matrix multiplication. Addition (PLQ Algorithm) Now we will present an algorithm to compute the sum of two PLQ functions (which could be extended to the sum of a finite number of PLQ functions). Definition 3.24. The partition points are the xi of the first column in the PLQ matrix (3.9). Moreover, the set of partition points is the whole column vector. Algorithm 3.25. The algorithm sorts all the partition points on the x-axis (visualize an x-y plane) where along each interval [xi , xi+1 ] we compute the sum of two quadratic (or linear) pieces which are active in [xi , xi+1 ]. 41 Chapter 3. Piecewise Linear-Quadratic Model The addition of two PLQ functions is not trivial unless both functions have identical sets of partition points as in Example 3.26. Example 3.26. x x a0 b0 c0 d0 e0 f0 a0 + d0 b0 + e0 c0 + f0 x 0 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ⊕ = . xn−1 an−1 bn−1 cn−1 xn−1 dn−1 en−1 fn−1 xn−1 an−1 + dn−1 bn−1 + en−1 cn−1 + fn−1 +∞ an bn cn +∞ dn en fn +∞ an + dn bn + en cn + fn Remark 3.27. We use the symbol ⊕ to denote the addition of two PLQ functions, which is not the usual matrix addition. In the extreme case, if both PLQ matrices (or functions) have disjoint sets of partitioning points, the matrix resulting from the addition operation will have the union of both sets as partition points (see Example 3.28). Example 3.28. The addition of the following PLQ matrices (or PLQ functions) is shown in Figure 3.5. −1 −1 1 0 0 0 1 0 0 0 1 0 0 1 0 1 0 ⊕ 2 = 1 2 +∞ 1 0 0 +∞ 1 0 −2 +∞ 2 0 0 1 0 1 1 . 1 1 0 2 0 −2 1 0 So the general case requires us to keep track of each partitioned interval and determine which quadratic (or linear) pieces are added together. As we are adding at most n + m quadratic (or linear) pieces together the running time is 42 Chapter 3. Piecewise Linear-Quadratic Model 14 12 10 8 6 4 2 0 −2.5 −2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 2.5 Figure 3.5: The addition of two PLQ matrices (or functions) with disjoint sets of partitioning points results in the ”dashed” function. 43 Chapter 3. Piecewise Linear-Quadratic Model O(n + m), where n, (resp. m) is the size of the sets of partitioning points for the first (resp. second) function. Conjugation (PLQ Algorithm) Next we present an algorithm which finds the conjugate of a function f (x) ∈ PLQ. Visualizing f (x) as Equation (3.8) we consider a pair of quadratic functions from f (x): f¯(x) := ϕi (x) := ai x2 + bi x + ci if xi−1 < x ≤ xi , ϕi+1 (x) := ai+1 x2 + bi+1 x + ci+1 if xi < x ≤ xi+1 . It is convex if and only if ϕi (xi ) = ϕi+1 (xi ) ϕ′i (xi ) ≤ ϕ′i+1 (xi ) ai , ai+1 ≥ 0 (continuity criterion), (increasing ”slopes” criterion ), (convexity criterion). From these assumptions, we can compute the conjugate directly as f¯∗ (y) = ϕ∗ (y) i if ϕ′i (xi−1 ) < y ≤ ϕ′i (xi ), ′ ∗ ′ y¯(y − ϕi (xi )) + ϕi (ϕi (xi )) ∗ ϕi+1 (y) if ϕ′i (xi ) ≤ y ≤ ϕ′i+1 (xi ), (3.10) if ϕ′i+1 (xi ) < y ≤ ϕ′i+1 (xi+1 ), where y¯ := ϕ′i+1 (xi )xi − ϕi+1 (xi ) − ϕ′i (xi )xi + ϕi (xi )) . ϕ′i+1 (xi ) − ϕ′i (xi ) The middle case (which only arises when the function f¯(x) is nonsmooth at xi ) is an affine function 44 Chapter 3. Piecewise Linear-Quadratic Model bridging the graph of ϕ∗i (y) with ϕ∗i+1 (y). Computing f¯∗ (y) can be done in linear-time and space since the computation of ϕ∗i (y) and ϕ∗i+1 (y) can be performed directly (see Example 1.18). Remark 3.29. The conjugate in Equation 3.10 was computed directly using the Maple package ”symbolic convex analysis toolkit” (SCAT) [9]. As f¯(x) is convex we can compute the conjugate supy x, y − f¯(x) by solving for the critical point of the inner function y, x − f¯(x) and substituting it back into the conjugate equation. Algorithm 3.30. The general strategy is to iteratively take pairs of quadratic (or linear) functions from f (x) (from left to right, consecutively) and apply the explicit closed form of the conjugate operation from Equation (3.10) to f¯(x). One drawback of Equation (3.10) is that we assumed f¯(x) is convex. Therefore the PLQ conjugate and Moreau envelope algorithms are restricted to convex functions. The next section removes the convexity assumption. 3.3.2 Extending the PLQ Algorithms We recall the fact that the PLQ conjugate, and Moreau envelope algorithms are restricted to convex functions. From Formula (1.4), and Remark 1.29 we need only to extend the conjugate algorithm to nonconvex functions. In fact, the relationship between the Fenchel conjugate and the convex hull provides some insight into conjugating a nonconvex PLQ function. Remark 3.31. The class of functions in PLQE will be use the same data structure as in the set PLQ. Fact 3.32. [11, p.44] For any function f minorized by at least one affine function, the biconjugate of f is the closed convex hull: f ∗∗ = co f . 45 Chapter 3. Piecewise Linear-Quadratic Model 4 3 2 1 0 −1 −2 −3 −6 −4 −2 0 2 4 6 Figure 3.6: The biconjugate (f ∗∗ ) yields the closed convex hull. As the Fenchel conjugate and convex hull depend on the pointwise supremum of a family of affine functions, the Fenchel conjugate is exactly the conjugate of the convex hull (see Facts 3.32 and 1.47). Theorem 3.33. For any proper function f , we have f ∗ = (co f )∗ and f ∗∗∗ = f ∗ . Proof. The function f ∗ is lsc and convex so we have f ∗∗∗ = f ∗ . The conjugate of Fact 3.32, yields f ∗∗∗ = (co f )∗ . Since f ∗∗∗ = f ∗ we have f ∗ = (co f )∗ . So regardless whether we take the conjugate of the convex hull or the conjugate itself, the conjugate function will be the same. Therefore, the PLQ conjugate algorithm can be extended to nonconvex functions by first taking the convex hull. 46 Chapter 3. Piecewise Linear-Quadratic Model Convex Hull Now we present an algorithm which finds the convex hull of a function f (x) in PLQE. Let f (x) be defined as Equation (3.8). Then iteratively take pairs of PLQ functions f¯(x) := ϕi (x) = ai x2 + bi x + ci ϕi+1 (x) = ai+1 x2 + bi+1 x + ci+1 if xi−1 < x ≤ xi , if xi < x ≤ xi+1 , from f (x) (i.e. for i = 1, . . . , n), and determine if they are nonconvex. The result below is important, but we omit its proof as it is trivial. Theorem 3.34. The PLQ function f¯(x) is nonconvex if at least one of the following conditions hold. I. ai < 0 (Concavity Criterion), II. ai+1 < 0 (Concavity Criterion), III. ϕi (xi ) < ϕi+1 (xi ) (Discontinuity Criterion), IV. ϕi (xi ) > ϕi+1 (xi ) (Discontinuity Criterion), V. ϕ′i (xi ) > ϕ′i+1 (xi ) (Decreasing ”Slope” Criterion). Remark 3.35. Although we have restricted our framework to the class of functions which are continuous (see the set PLQE). We still consider cases III. and IV. as a more general setting for the PLQ convex hull algorithm. First, we address several special cases. 47 Chapter 3. Piecewise Linear-Quadratic Model Table 3.1: The following special cases arise when taking the convex hull of f (x). Case 1. a0 < 0 or an < 0 co f (x) = −∞ Case 2. a0 ≥ 0 and an ≥ 0 Case 2.a. a0 ≥ 0 and an > 0 co f (x) is PLQ Case 2.b. a0 > 0 and an ≥ 0 co f (x) is PLQ Case 2.c. a0 = 0 and an > 0 co f (x) is PLQ Case 2.d. a0 > 0 and an = 0 co f (x) is PLQ Case 2.e a0 = an = 0 and b0 > bn co f (x) = −∞ Case 2.f a0 = an = 0 and b0 ≤ bn co f (x) is PLQ Remark 3.36. The above special cases arise from the fact that the convex hull requires at least one supporting affine function which is finite. If we cannot find at least one, then the convex hull is exactly minus infinity. Theorem 3.37. Let f (x) ∈ PLQE, and be defined as Equation (3.8). Then the convex hull has several special cases co f (x) = −∞ −∞ −∞ PLQ if a0 < 0, if an < 0, if a0 = an = 0 and ϕ′0 (x0 ) > ϕ′n (xn−1 ), otherwise. Proof. Table 3.1 considers all possible cases concerning the first and last quadratics of f , which determine if f (x) can be supported by an affine function. Algorithm 3.38. If f (x) falls under one of the special cases in Theorem 3.37 we stop with co f (x) = −∞. As the unbounded concave functions are noted in Theorem 3.37 we address the case of bounded concave functions. 48 Chapter 3. Piecewise Linear-Quadratic Model If f (x) contains any bounded concave functions (i.e. satisfying I or II in Theorem 3.34 with xi−1 , and xi+1 finite) then replace each concave part by an affine function interpolating the endpoints (See Equation (3.11) in Solution 3.43). Next, iteratively grab pairs of functions f¯(x) from f (x) and determine if they are ”quadratic-quadratic”, ”quadratic-linear”, ”linear-quadratic” or ”linearlinear”. Then proceed to the appropriate Problem (resp. 3.39, 3.41, 3.40, or 3.43). If the pair turns out to be nonconvex by one of the criteria in Theorem 3.34, then we backtrack by decrementing the index i and repeating the above process. The algorithm is complete when there no longer exists any nonconvex pairs by Theorem 3.34. 4.0 4.0 4.0 3.5 3.5 3.5 3.0 3.0 3.0 2.5 2.5 2.5 2.0 2.0 2.0 1.5 1.5 1.5 1.0 1.0 1.0 0.5 0.5 0.5 0.0 0.0 −0.5 0.0 −0.5 −5 −4 −3 −2 −1 0 1 2 3 4 5 −0.5 −5 −4 −3 −2 (a) −1 0 1 2 3 4 5 −5 −4 −3 −2 (b) 0 1 2 3 4 5 (c) 4.0 4.0 3.5 3.5 3.0 3.0 2.5 2.5 2.0 2.0 1.5 1.5 1.0 1.0 0.5 0.5 0.0 −1 0.0 −0.5 −0.5 −5 −4 −3 −2 −1 0 (d) 1 2 3 4 5 −5 −4 −3 −2 −1 0 1 2 3 4 5 (e) Figure 3.7: The above sequence of images, demonstrate the required steps in computing the convex hull of a PLQ function. We see from Figure (c) that it is necessary to back-track at most once per iteration to ensure a convex function. Indeed a single pass may not produce a convex function, and as a result the algorithm looks at each partition point at most twice. Hence, the algorithm takes at most 2n steps and runs in linear time. 49 Chapter 3. Piecewise Linear-Quadratic Model Problem 3.39 (quadratic-quadratic). Given two quadratic (or linear) functions ϕi and ϕi+1 with domain [xi−1 , xi+1 ], partitioned at xi , solve the following system for xα and xβ . ϕ′i (xα ) = ϕ′i+1 (xβ ) ϕ′i (xα )(x − xα ) + ϕi+1 (xα ) = ϕ′i+1 (xβ )(x − xβ ) + ϕi+1 (xβ ) ∀ x ∈ R xi−1 ≤ xα ≤ xi xi ≤ xβ ≤ xi+1 . If there exists a solution, then the convex hull is ϕi (x) co f¯(x) = ϕ′ (xα )(x − xβ ) + ϕi+1 (xβ ) i ϕi+1 (x) if xi−1 ≤ x < xα , if xα ≤ x ≤ xβ , if xβ < x < xi+1 . If the above system of equations has no solution, we solve Problem 3.41. 50 Chapter 3. Piecewise Linear-Quadratic Model 5 4 3 2 1 0 −1 −4 −3 −2 −1 0 1 2 3 4 Figure 3.8: The ”dashed” function is an instance of co f¯(x), and the partition points xα and xβ are where the affine function ϕ′i (xα )(x − xβ ) + ϕi+1 (xβ ) bridges the gap between the left quadratic ϕi (x), and the right quadratic ϕi+1 (x). Note that we do not need to consider the bounded and unbounded cases as we can always find an affine function minorizing f¯(x) and supporting both quadratic pieces. Problem 3.40 (quadratic-linear). Given two quadratic (or linear) functions ϕi and ϕi+1 joined at xi with domain [xi−1 , xi+1 ], find an xα such that ϕ′i (xα )(x − xα ) + ϕi (xα ) = ϕ′i (xα )(x − xi+1 ) + ϕi (xi+1 ), ∀ x ∈ R xi−1 ≤ xα ≤ xi . 51 Chapter 3. Piecewise Linear-Quadratic Model If a solution exists then the convex hull is co f¯(x) = ϕi (x) if xi−1 ≤ x < xα , ϕ′i (xα )(x − xi+1 ) + ϕi (xi+1 ) if xα ≤ x ≤ xi+1 . If the above system of equations has no solution, we solve Problem 3.40. 2.0 2.5 2.0 1.5 1.5 1.0 1.0 0.5 0.5 0.0 0.0 −0.5 −0.5 −2 −1 0 1 2 3 4 5 −1.0 −2.0 −1.5 −1.0 (a) −0.5 0.0 0.5 1.0 1.5 2.0 (b) Figure 3.9: The ”dashed” function is an instanceco f¯(x) and the partition point xα is where the affine functions ϕ′i+1 (xα )(x − xi−1 ) + ϕi (xi−1 ) on the right joins the quadratic ϕi+1 (x) on the left. The two graphs (a) and (b) represent the difference between a bounded and unbounded convex hull. If xi+1 is finite then our graph will look like (a) but if xi+1 = +∞ then it will look like (b). Problem 3.41 (linear-quadratic). Given two quadratic (or linear) functions ϕi and ϕi+1 defined on the interval [xi−1 , xi+1 ] and joined at xi , find an xβ such that ϕ′i (xβ )(x − xi−1 ) + ϕi (xi−1 ) = xi ≤ xβ ϕ′i+1 (xβ )(x − xβ ) + ϕi+1 (xβ ) ∀ x ∈ R, ≤ xi+1 . 52 Chapter 3. Piecewise Linear-Quadratic Model If we find a solution to the above equations the convex hull is co f¯(x) = ϕ′i+1 (xβ )(x − xi−1 ) + ϕi (xi−1 ) ϕi+1 (x) if xi−1 ≤ x ≤ xβ , if xβ < x ≤ xi+1 . If the above system of equations has no solution, we solve Problem 3.43. 2.0 2.5 2.0 1.5 1.5 1.0 1.0 0.5 0.5 0.0 0.0 −0.5 −0.5 −5 −4 −3 −2 −1 0 1 2 (a) −1.0 −2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 (b) Figure 3.10: The ”dashed” function is an instance co f¯(x), and the partition point at xβ is where the affine function ϕ′i (xβ )(x − xi+1 ) + ϕi+1 (xi+1 ) on the left joins the quadratic ϕi (x) on the right. The two graphs (a) and (b) represent the difference between a bounded and unbounded convex hull. If xi−1 is finite then our graph will look like (a) but if xi−1 = −∞ then it will look like (b). Notation 3.42. Assume we have a bounded PLQ function f¯(x) discontinuous at xi . Then cont f¯(x) returns the continuous version of f¯(x). The function f¯(x) is discontinuous by case III or IV of Theorem 3.34. These cases are addressed below. Problem 3.43 (linear-linear). If f¯(x) satisfies III of Theorem 3.34, then the function cont f¯(x) 53 Chapter 3. Piecewise Linear-Quadratic Model is the line interpolating (xi , ϕi (xi )) and (xi+1 , ϕi+1 (xi+1 )). Hence, cont f¯(x) = ϕi (x) if xi−1 ≤ x ≤ xi , ϕi+1 (xi+1 )−ϕi (xi ) (x − xi ) + ϕi (xi ) xi+1 −xi if xi < x ≤ xi+1 . If f¯(x) satisfies IV of Theorem 3.34, then the function cont f¯(x) is the line interpolating (xi−1 , ϕi (xi−1 )) and (xi , ϕi+1 (xi )). Hence, cont f¯(x) = ϕi+1 (xi )−ϕi (xi−1 ) (x − xi ) + ϕi (xi ) xi+1 −xi ϕi+1 (x) if xi−1 < x ≤ xi , if xi < x ≤ xi+1 . If f¯(x) satisfies V of Theorem 3.34, then the function co f¯(x) is the line interpolating (xi−1 , ϕ(xi−1 )) and (xi+1 , ϕ(xi+1 )). Hence, co f¯(x) = ϕi+1 (xi+1 ) − ϕi (xi−1 ) (x − xi−1 ) + ϕi (xi−1 ) xi+1 − xi−1 1 1 0 0 −1 −1 −2 −2 −3 −3 −4 −4 −5 −5 if xi−1 ≤ x ≤ xi+1 . (3.11) 3.0 2.5 2.0 1.5 1.0 0.5 0.0 −6 −3.0 −2.5 −2.0 −1.5 −1.0 −0.5 (a) 0.0 0.5 1.0 1.5 2.0 −6 −3.0 −0.5 −2.5 −2.0 −1.5 −1.0 −0.5 (b) 0.0 0.5 1.0 1.5 2.0 −1.0 −2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 (c) Figure 3.11: The ”dashed” curves represent different instances of the function co f¯(x). Figure (a) is the convex hull when the function f¯(x) is bounded from the right. Figure (b) is the convex hull when f¯(x) is bounded from the left, and Figure (c) is the convex hull when f¯(x) is bounded from both sides. 54 Chapter 3. Piecewise Linear-Quadratic Model 3.3.3 Fast Algorithms for Convex Transforms Let us recall fast algorithms, and compare them with PLQ algorithms using the same standard convex operations as in Definition 3.2. (Positive) Scalar Multiplication (Fast Algorithm) Multiplying a function by a (positive) scalar, amounts to evaluating a function on a grid and multiplying each sample value by the scalar value. Notation 3.44. The value X[i] is the ith term in the grid X. For example if X := (5, 8, 9, 2) then X[1] = 5. Moreover, if we are given a function f (x) := x2 for x ∈ R, then f (X[1]) = 25, and f (X) = (X)2 = (25, 64, 81, 4). Remark 3.45. In the following Fast algorithms, we do not require a uniform grid and use it for the sake of simplicity. Algorithm 3.46. Let X := {a + i b−a : i = 1, 2, · · · , n} be a discrete grid on the interval [a, b] n with n points. Then scalar multiplication amounts to multiplying each value f (X[i]) by a scalar λ. Fast scalar multiplication runs in linear-time and is an equivalent algorithm to PLQ scalar multiplication in the sense that they both use the same set of steps. Addition (Fast Algorithm) Adding two continuous functions numerically calls for the evaluation of each function, and computing the sum of their values. The addition of two grids is not necessarily straightforward. In order to add two functions evaluated on different grids, we need to model our discrete set of points and compute the sum along interpolated values. 55 Chapter 3. Piecewise Linear-Quadratic Model 300 250 200 150 100 50 0 −5 −4 −3 −2 −1 0 1 2 3 4 5 Figure 3.12: The limitations of discrete addition. The addition of two functions on disjoint grids is shown in Figure 3.12. The function marked by ”crosses” is f1 (X1 ) = (X1 )2 − 2(X1 ) + 1, and the function marked by ”circle-crosses” is f2 (X2 ) = 10(X2 )2 . They are evaluated on the grids X2 := {−5 + i 5+5 5 {−5 + i 5+5 15 : i = 1, 2, · · · , 5}, and X1 := : i = 1, 2, · · · , 15} respectively. The addition of f1 (X1 ) with f2 (X2 ) is the curve marked by ”stars”. When considering needle functions, any grid points which do not match up, sum to positive infinity. The extreme case is when the two grids are disjoint; then the resulting function is exactly positive infinity. Conjugation (Fast Algorithm) One of the first fast algorithms used to approximate the conjugate was the Fast Legendre Transform (FLT) algorithm [8, 13]. It was introduced to numerically approximate the conjugate of a function 56 Chapter 3. Piecewise Linear-Quadratic Model with a log-linear worst-case complexity. A more efficient algorithm, conveniently named the Lineartime Legendre transform (LLT) improved the complexity to linear-time [14]. The Linear-Time Legendre Transform (LLT) is a fast algorithm which takes as input the sets X, Y , S where Y [i] is an approximation of f (X[i]). The input is a set of points and slopes in R. The output is a set Zm where Z[j] is an approximation of f ∗ (S[j]). The LLT algorithm can be considered a black box which computes the conjugate at various points in space. Computing the approximation f ∗ (S[j]) involves two different steps depending on the convexity of f . Step 1. Assuming f is nonconvex and X[i] is increasing, we use the Beneath-Beyond algorithm (see [2]) to compute the convex hull of the planar set (X[i], Y [i]) in linear-time. After obtaining the convex hull, we discard all points inside the convex hull and proceed to Step 2. Step 2. Assuming f is convex (otherwise complete Step 1), we compute the slopes C := ci = yi+1 − yi xi+1 − xi ∀i∈I . Finding a solution for Equation (3.2) is equivalent to finding the supremum of all affine functions that support f . As we are given a set of slopes to approximate the Fenchel conjugate, we need only to find the support point (or where the function is maximized). Using the fact that f is convex and therefore has increasing slopes, we only need to find index i such that C[i] ≤ S[j] ≤ C[i + 1], as c1 ≤ c2 ≤ · · · ≤ cn . Thus, X[i] is the supporting point and maximizes i → S[j]X[i] − Y [i] for slope S[j]. As a result we have an approximation of the conjugate Z[j] = S[j]X[i] − Y [i] ≈ f ∗ (S(j)), 57 Chapter 3. Piecewise Linear-Quadratic Model at S(j). For further information see [15, p.27]. Remark 3.47. The LLT algorithm utilizes the fact that the supremum of Formula (3.1) is attained if f is convex and x ∈ dom f and s ∈ ∂f (x). See Proposition 1.25. 3.4 Convergence We recall the definition of pointwise convergence. Definition 3.48. Suppose {fn } is a sequence of functions with common domain. If lim fn (x) = f (x) n→∞ holds for all x ∈ dom f then the sequence {fn } converges pointwise to f . 3.4.1 Fast Algorithms for Convex Transforms (Convergence) All fast algorithms depend on discretizing a continuous model in order to approximate a convex operation. Since fast algorithms output a discrete set of data points, all fast algorithms implicitly rely on a zeroth order model. In order for these computational algorithms to converge, we need to either increase the number of points or enlarge the domain. For example, when we consider the discrete Legendre transform in Equation (3.2), we have the following convergence results [19, p.3]: Notation 3.49. For the sake of simplicity we consider equidistant points: X := (a + i b−a : i = 1, 2, · · · , n). n We define the discrete approximation of f as fX with linear interpolation defining fX at points not in X. 58 Chapter 3. Piecewise Linear-Quadratic Model Notation 3.50. The function fn is the function f evaluated on a grid X := {a + i (b−a) : i= n 1, 2, . . . , n} where the n corresponds to the size of the grid X. Moreover, fn is the n − th term in a sequence of functions (fn ). Fact 3.51. Assume f : R → R ∪ {+∞} is proper. Convergence on a bounded domain (see [8, 13, 14]). ∗ (s) := sup (i) If f is usc on [a, b], then fn∗ converges pointwise to f[a,b] x∈[a,b] [sx − f (x)]. (ii) If f is twice continuously differentiable on an open interval containing [a, b], then ∗ max |f[a,b] − fn∗ | ≤ [a,b] 1 (b − a)2 max f ′′ . 2 n2 [a,b] Convergence on unbounded domains (see [10, 14]) The following equivalence holds for any s ∈ R, and any a > 0: ∗ (s) = f ∗ (s). ∂f ∗ (s) ∩ [−a, a] = ∅ ⇔ f[−a,a] 3.4.2 PLQ Algorithms (Convergence) Let us recall the fact that PLQ algorithms provide exact solutions within the class of PLQ functions, and that functions outside this class must be modeled via PL or PLQ approximation. Theorem 3.52. Assume f is proper, modeled by a PL approximation f˜n , and satisfies condition (i) or (ii) of Fact 3.51. Then f˜n∗ converges pointwise to f ∗ . Proof. Assume f˜n is modeled via PL approximation, then it falls under the same framework as the LLT algorithm. As consequence of Fact 3.51, f˜n converges pointwise to f ∗ on both bounded and unbounded domain. 59 Chapter 3. Piecewise Linear-Quadratic Model 3.0 2.0 dom=[−1,1] 2.5 dom=[−2,2] 1.8 dom=[−3,3] f* 1.6 1.4 2.0 1.2 1.5 1.0 0.8 1.0 0.6 0.4 N=3 N=5 N=9 f* 0.5 0.2 0.0 −2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 (a) Convergence by enlarging the domain: the conjugate of f (x) := |x| converges to I[−1,1] (x) the indicator function of the interval [−1, 1] 0.0 −2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 (b) Convergence by decreasing the grid spacing: the discrete conjugate of the function f (x) := x2 /2 converges to f ∗ = f . Figure 3.13: Convergence of the discrete Legendre transform. 60 Chapter 4 Finite Convex Integration This chapter provides additional examples and details for [12]. Throughout we will also describe and illustrate the relationship between the two papers, [12] and [5]. As notations differ, our purpose is to explain both styles in an effort to provide additional perspectives and to aid understanding. 4.1 Introduction Notation 4.1. Define I := {0, 1, . . . , n}. We are concerned with the following problem. Problem 4.2. Given a finite family {(xi , x∗i )}i∈I ⊂ X × X ∗ . find a lsc convex function f : X → R such that x∗i ∈ ∂f (xi ) ∀i ∈ I or determine no such function exists. Consider an operator A where gra A := {(x, x∗ ) ∈ X × X ∗ : x∗ ∈ Ax}. The finite family {(xi , x∗i )}i∈I is interpreted as the graph of a finite operator. Finite operators which have certain properties can admit antiderivatives. Definition 4.3. The function f is an antiderivative of A if gra A ⊆ gra ∂f . 61 Chapter 4. Finite Convex Integration 3 3 2 2 1 1 0 0 −1 −1 −2 −2 −3 −3 −3 −2 −1 0 1 2 3 −3 −2 (a) ∂f1 (x) −1 0 1 2 3 (b) ∂f2 (x) Figure 4.1: Finite convex integration is not unique up to a constant. The continuous integration problem is well known to be unique up to a constant. For example, the antiderivative of the continuous function f (x) = x is the function F (x) = 1 2 2x + K, where K ∈ R. However, the finite integration problem may have many solutions. Example 4.4. Let gra A = {(−1, −2), (0, 0), (1, 2)} and f (0) = 0 (initial condition). Then both f1 (x) = x2 and f2 (x) = 2|x| are convex, lsc antiderivatives as they satisfy gra A ⊆ gra ∂f1 (x) and gra A ⊆ gra ∂f2 (x) (see Problem 4.2 and Figure 4.2). Hence, given λ0 , we define the solution set F as the set of all lsc convex functions that satisfy f (x0 ) = λ0 and x∗i ∈ ∂f (xi ) ∀i ∈ I. Altogether, f ∈F⇔ f is lsc, convex, f (x0 ) = λ0 , and x∗i ∈ ∂f (xi ) ∀i ∈ I. Next we discuss the relationship between cyclic monotonicity and the existence of antiderivatives. 62 Chapter 4. Finite Convex Integration 4.2 Cyclic monotonicity Finite operators can have a special property called cyclic monotonicity (CM), which is necessary for an operator to admit a convex antiderivative. Definition 4.5. The family {(xi , x∗i )}i∈I is said to be CM if for all j0 , j1 , . . . , jk , jk+1 ∈ I with j0 = jk+1 , the inequality x∗jl , xjl+1 − xjl ≤ 0 l=0,...,k holds. Proposition 4.6. For every convex lsc function, the subdifferential operator ∂f is CM. Proof. Take x∗i ∈ ∂f (xi ) for all i ∈ I. We have f (xj1 ) − f (xj0 ) ≥ x∗j0 , xj1 − xj0 , f (xj2 ) − f (xj1 ) ≥ .. . x∗j1 , xj2 − xj1 , f (xjk ) − f (xjk−1 ) ≥ x∗jk−1 , xjk − xjk−1 , f (xjk+1 ) − f (xjk ) ≥ x∗jk , xjk+1 − xjk . Adding up all inequalities and setting xjk+1 = xj0 , the LHS becomes [f (xj1 ) − f (xj0 )] + [f (xj2 ) − f (xj1 )] + · · · + [f (xjk ) − f (xjk−1 )] + [f (xj0 ) − f (xjk )] = 0. As a result, we get x∗jl , xjl+1 − xjl ≤ 0. l=0,...,k 63 Chapter 4. Finite Convex Integration Definition 4.7. A multivalued mapping ρ from X to X is said to be monotone if x1 − x0 , x∗1 − x∗0 ≥0 holds for every (x0 , x∗0 ) and (x1 , x∗1 ) in the graph of ρ. This definition corresponds to the case where k = 1 from the definition of CM. Thus, if ρ is a CM mapping then ρ is a monotone mapping. Geometrically, monotonicity describes a function which is ”non-decreasing”. This means that the graph of the function is either ”increasing” or ”flat”. However, when the dimension of X is greater than one, there exists monotone mappings which are not cyclically monotone (for an example, see [20, p.240]). Remark 4.8. The most efficient way to verify that a finite operator is cyclically monotone, is to use an algorithm (such as [12, p.140]) to compute an antiderivative. If the algorithm cannot find an antiderivative then the given operator is not cyclically monotone. However, if it does find a convex antiderivative, this implies that the operator is cyclically monotone. Note that the algorithm stated in [12, p.140] runs in O(n3 ) time. 4.3 Antiderivatives and their properties A CM operator may admit many antiderivatives using a variety of different methods. Notation 4.9. We say that mA is a method m : A → Conv X applied to an operator A, and mA is the resulting antiderivative. These antiderivatives may have a variety of different properties. Definition 4.10. Let A be a finite CM operator with gra A containing the points (ai , a∗i )i∈I . The 64 Chapter 4. Finite Convex Integration method m is primal-dual symmetric if its resulting antiderivative mA satisfies m∗A = mA−1 . Definition 4.11. [5, Theorem 3.5] Let A be a CM operator with gra A containing n points (ai , a∗i ). Then the Rockafellar function is defined by RA,(a1 ,a∗1 ) (x) = a∗1 , a2 − a1 + · · · + a∗n−1 , an − an−1 + a∗n , x − an . max (4.1) (a2 ,a∗2 )∈graA, .. . (an ,a∗n )∈gra A Remark 4.12. We refine the above definition to be more precise. Let n+1 ai+1 − ai , a∗i ≤ 0, where an+2 = a1 σ := i=1 Then gra A contains at most n points, and there exists integers k and l such that ak = al with 1 ≤ k < l ≤ n + 1 and σ := σ1 + σ2 with l−1 ai+1 − ai , a∗i , and σ1 := i=k n+1 k−l ai+1 − ai , a∗i + σ2 := i=l ai+1 − ai , a∗i . i=1 Since σ1 ≤ 0 we have σ ≤ σ2 . If we repeat the above argument k times we obtain σ ≤ σ2 ≤ · · · ≤ σk . 65 Chapter 4. Finite Convex Integration As a result, we now have RA,(a1 ,a∗1 ) (x) = max a∗1 , a2 − a1 + · · · + a∗n−1 , an − an−1 + a∗n , x − an . (4.2) (a2 ,a∗2 )∈graA, .. . (an ,a∗n )∈gra A such that (ar , a∗r ) = (as , a∗s ) for r = s. Definition 4.13. Let A be a CM operator, and let a ∈ dom A. Then we set RA,a := RA,(a,a∗ ) where a∗ is an arbitrary point in Aa. ¯ is an intrinsic antiderivative if Definition 4.14. Let A be CM. We say that a function f : X → R gra A ⊆ gra ∂f and f depends only on gra A. The following example is not an intrinsic method (i.e. does not always produce intrinsic antiderivatives). Example 4.15. Let e ∈ X be such that e = 1 and define A via gra A := {(−e, −e), (e, e)}. Then for every x ∈ X, we have RA,−e (x) = max − x|e − 1, x|e − 3 = −2 + | x|e − 1| (4.3) x|e − 1, − x|e − 3 = −2 + | x|e + 1|. (4.4) and RA,e (x) = max Consequently, RA,e ≥ RA,−e and RA,e ≤ RA,−e . 66 Chapter 4. Finite Convex Integration There are several types of intrinsic methods when the graph of A is finite. Example 4.16. Let A be cyclically monotone such that gra A is finite. Then the two methods max (a,a∗ )∈gra A RA,(a,a∗ ) , and (a,a∗ )∈gra A 1 R ∗ nA A,(a,a ) produce intrinsic methods of A (nA is the number of points in gra A). Let us consider a process to build a primal-dual symmetric method from any intrinsic antiderivative. Definition 4.17. Let f0 , f1 ∈ F, then the proximal midpoint average of f0 and f1 is the function 1 P(f0 , , f1 ) := 2 1 2 f0 + 1 2 · 2 ∗ + 1 2 f1 + 1 2 · ∗ 2 ∗ − 1 2 · 2 . (4.5) Fact 4.18. [5, Corollary 4.17] Let m be a method which produces an intrinsic method with full domain. Then P mA , 12 , m∗A−1 and P mA−1 , 12 , m∗A are both primal-dual symmetric antiderivatives for A and A−1 respectively. Moreover, they have full domain. The importance of primal-dual symmetric methods can be illustrated by the following diagram. A m mA = P mA , 12 , m∗A−1 ∗ −−−−→ ∗ A−1 m −−−−→ mA−1 = P mA−1 , 12 , m∗A 67 Chapter 4. Finite Convex Integration In general (within our framework), we always have primal-dual symmetry at the discrete level: x∗i ∈ A(xi ) ⇔ xi ∈ A−1 (x∗i ); however, there is no reason to expect this type of symmetry at the continuous level, i.e. m∗A = mA−1 . It turns out that when the antiderivatives mA , and m∗A−1 are intrinsic methods, the proximal midpoint average P mA , 12 , m∗A−1 produces an antiderivative which is primal-dual symmetric at the continuous level [5, Example 4.19 and Example 4.20]. 4 4 3 3 2 2 1 1 0 0 −1 −1 −3 −2 −1 0 1 2 3 (a) Convex antiderivatives. −3 −2 −1 0 1 2 3 (b) The subdifferential. Figure 4.2: Constructing a primal-dual symmetric method. In Figure 4.2(a), the family gra A := (a, exp(a)) : a ∈ {−1, − 12 , 0, 12 , 1} is denoted by the ”circle-cross” marks on the function f (x) = exp(x). The ”dashed” function and the ”dasheddotted” function are two antiderivatives mA , and m∗A−1 , respectively. As we are interested in finding an antiderivative between these two functions, we take the proximal midpoint average P mA , 12 , m∗A−1 represented by the ”thick” function. This function is primal-dual symmetric. We see from Figure 4.2(b) that the ”thick” function is ∂P mA , 12 , m∗A−1 and is an antiderivative (as gra A ⊂ gra ∂P). Any convex antiderivative will have a monotone subdifferential, hence ∂P mA , 12 , m∗A−1 lies within the ”dashed” rectangles. 68 Chapter 4. Finite Convex Integration 4.4 4.4.1 Relationship between [12] and [5] Minimal antiderivative in higher dimension Fact 4.19. [12, p.132] Let Γ be the set of all ordered subsets J = {j0 , · · · , jk } of I with j0 = 0, and for all r = s, jr = js . Then, for each J ∈ Γ, we define the function fJ (x) := x∗j0 , xj1 − xj0 + · · · + x∗jk−1 , xk − xjk−1 + x∗jk , x − xjk (4.6) on X. Then for a given λ0 and x0 , the minimal antiderivative f − is defined as f − (x) := λ0 + max fJ (x). J∈Γ (4.7) The Rockafellar function RA,(a,a∗ ) is a special method that produces minimal antiderivatives. Remark 4.20. We say that an antiderivative is minimal if it is the greatest piecewise linear function which bounds all other antiderivatives from below. Similarly, the maximal antiderivative is the smallest piecewise linear function which bounds all other antiderivatives from above. Fact 4.21. [5, Theorem 3.5] Let A be CM and a ∈ dom A. Then RA, (a,a∗ ) = min{f ∈ F : f is an antiderivative of A and f (a) = 0}. As both f − and RA,(a,a∗ ) are minimal antiderivatives, they must be equal. Transformation 4.22. Let Γ be the collection of all subsets J = {j0 , · · · , jk } of I such that 69 Chapter 4. Finite Convex Integration andj0 = 0, and for all r = s, jr = js . Let us rename the following indexes and variables: J = {j0 , j1 , · · · , jk } := {1, 2, · · · , n}, xi := ai for i = 1, · · · , n − 1, x∗i := a∗i for i = 1, · · · , n − 1, x0 := a1 , and x∗0 := a∗1 . Theorem 4.23. Assume that A is a CM operator and that λ0 = 0 = f − (a). Then RA,(a,a∗ ) = f − , where f − ∈ F and (a, a∗ ) ∈ gra A. Proof. Recall, that f − (x) := λ0 + max fJ (x), J∈Γ where λ0 = 0. For each J ∈ Γ we have fJ (x) := x∗j0 , xj1 − xj0 + · · · + x∗jk−1 , xjk − xjk−1 + x∗jk , x − xjk . (4.8) After redefining Equation (4.8) by Transformation 4.22, we get fJ (x) = a∗ , a2 − a + · · · + a∗n−1 , an − an−1 + a∗n , x − an . Taking the maximum gives the announced result. Similarly, the maximal antiderivatives are the same. 70 Chapter 4. Finite Convex Integration Fact 4.24. [12, p.137] Let H := {h ∈ X : h(x∗0 ) = λ∗0 , λ∗0 = x0 , x∗0 − λ0 , xi ∈ ∂h− (x∗i ) ∀i ∈ I} Then, by [12, Theorem 3.4] there exists at most one minimal antiderivative h− which is piecewise linear with full domain. Remark 4.25. Note that h− is the minimal antiderivative in the dual space H. It is unique as it is the greatest piecewise linear function which bounds all other antiderivatives (in H) from below. If we wish to compute f + := (h− )∗ , it is completely characterized in [12, Theorem 4.3]. Observation 4.26. Since h− ∈ H we have the following constraints: i. h− (x∗0 ) = λ∗0 : λ∗0 = x0 , x∗0 − λ0 , ii. xi ∈ ∂h− (x∗i ) ∀i ∈ I. Taking the conjugate of the minimal antiderivative in the dual space will yield the maximal antiderivative in the primal space. Hence (h− )∗ is maximal in F. Notation 4.27. Throughout we will denote the maximal antiderivative (h− )∗ as f + . Fact 4.28. [5, Corollary 3.10] Let A be CM and a ∈ dom A, ∗ ∗ RA = max{f ∈ F : f is an antiderivative of A and f (a) = 0}. −1 ,(a∗ ,a) − a, a Theorem 4.29. Assume A is a CM operator and that f + (a) = λ∗0 = 0. Then ∗ ∗ RA = f +, −1 ,(a∗ ,a) − a, a where f + := (h− )∗ , h− ∈ H, and (a, a∗ ) ∈ gra A. Proof. It suffices to show h− = RA−1 ,(a∗ ,a) + a, a∗ . 71 Chapter 4. Finite Convex Integration Recall, h− ∈ H ⇔ h− (x∗0 ) = λ∗0 : λ∗0 = x0 , x∗0 − λ0 , xi ∈ ∂h− (x∗i ) ∀i ∈ I. From λ0 = λ∗0 − x0 , x∗0 and λ∗0 = 0 we have h− (x) = λ0 + max hJ (x) (4.9) J∈Γ = λ∗0 − x0 , x∗0 + max hJ (x) (4.10) = max hJ (x) − x0 , x∗0 (4.11) J∈Γ J∈Γ where for each J ∈ Γ we have hJ (x) := xj0 , x∗j1 − x∗j0 + · · · + xjk−1 , x∗k − x∗jk−1 + xjk , x − x∗jk . After redefining Equation (4.11) by Transformation 4.22, we get h− (x) = max hJ (x) − a, a∗ J∈Γ where for each J ∈ Γ we have hJ (x) = a, a∗2 − a∗ + · · · + an−1 , a∗n − a∗n−1 + an , x − a∗n . 4.4.2 Minimal antiderivative in one dimension In one dimension Problem 4.2 has a closed form for the minimal antiderivative. 72 Chapter 4. Finite Convex Integration Fact 4.30. [5, Theorem 3.14] Let A have a finite graph and suppose that the graph of B : conv(Ax) is + ∪ni=1 ({ai } × [b− i , bi ]), + − − + where n ∈ {1, 2, · · · }, a1 < a2 < · · · < an , and b− 1 ≤ b1 ≤ b2 ≤ · · · ≤ bn ≤ bn . Set a0 := −∞ and an+1 := +∞. Suppose that k ∈ {1, · · · , n}. Then RA,(ak ,a∗k ) is given by x→ (x − ai )b− i + (x − ai )b+ i + k j=i+1 (aj−1 i−1 j=k (aj+1 − aj )b− j , − aj )b+ j , if ai−1 < x ≤ ai ≤ ak ; if ak ≤ ai ≤ x < ai+1 ; and ∂RA,(ak ,a∗k ) is given by {b− i }, − [b− i , bi+1 ], x → [b− , b+ ], k k + [b+ i−1 , bi ], {b+ }, i if ai−1 < x ≤ ai ≤ ak ; if x = ai < ak ; if x = ak ; if ak < x = ai ; if ak ≤ ai < x < ai+1 ; Remark 4.31. In one dimension, the closed form of the minimal antiderivative in [12, p.145] is exactly the same closed form of the Rockafellar function in [5, Theorem 3.14] since both closed forms have full domain and are piecewise linear minimal antiderivatives. In one dimension we have a closed form of the minimal antiderivative. Therefore the time complexity is linear. 73 Chapter 4. Finite Convex Integration 4.5 Computational Algorithms for Antiderivatives The most efficient algorithms used to compute minimal and maximal antiderivatives are described in [12, p.139-140]. They are specialized algorithms driven by the following LP formulation. Fact 4.32. [12, Proposition 3.5] The optimal solutions of min{ λ λi : λj − λi ≥ x∗i , xj − xi ∀ i, j ∈ I } (4.12) i∈I,i=0 correspond to the values f − (xi ) (i.e. λi = f − (xi ) for i = 1, . . . , n), which completely determines the minimal antiderivative RA,(a,a∗ ) . Remark 4.33. We only need to discuss the minimal antiderivative, as the maximal antiderivative is determined through conjugate duality, see [12, p.137] and [5, Corollary 3.10]. Definition 4.34. Considering the set of linear inequalities Ax ≤ b, we say that the system Ax ≤ b is a difference of constraints when the constraint matrix A contains one +1 and one −1 in each row; all other entries are zero, for all rows. As xi , x∗i , xj are given, the constraint matrix in Equation (4.12) is a system of difference constraints. It turns out that we can find a feasible solution (an antiderivative) to the Linear program in Equation (4.12) by finding a solution to the shortest path problem in the corresponding graph formulation (see [1, p.103-105]). Example 4.35. Let gra A := {(0, 0), (−1, −2), (1, 2)}. Then by Equation (4.12) we have the fol- 74 Chapter 4. Finite Convex Integration lowing constraints: λ0 − λ1 ≤ 0, λ0 − λ2 ≤ 0, λ1 − λ0 ≤ 2, λ1 − λ2 ≤ 4, λ2 − λ0 ≤ 2, λ2 − λ1 ≤ 4. Each system of difference constraints is associated with a graph whose nodes are λi and each constraint λi − λj ≤ x∗i , xi − xj represents the arc (i, j) with the weighed value x∗i , xj − xi . Hence, we obtain the following graph. 0 2 Figure 4.3: Graph associated with a system of difference constraints. 75 Chapter 4. Finite Convex Integration In this specific example we notice that all weights are nonnegative, therefore we can use Dijkstra’s algorithm to obtain the shortest path distances from a particular node to every other node. Thus the following table represents the shortest path distances from node i to node j. d(i, j) Node 0 Node 1 Node 2 Node 0 0 2 2 Node 1 0 0 2 Node 2 0 2 0 By [1, p.104] and Fact 4.32 we have d(i, j) = λj = f (xj ). Hence, i. f1 (x0 = 0) = λ0 = 0, f1 (x1 = −1) = λ1 = 2, f1 (x2 = 1) = λ2 = 2, ii. f2 (x0 = 0) = λ0 = 0, f2 (x1 = −1) = λ1 = 0, f2 (x2 = 1) = λ2 = 2, iii. f3 (x0 = 0) = λ0 = 0, f3 (x1 = −1) = λ1 = 2, f3 (x2 = 1) = λ2 = 0. We recall that λj = f (xj ) for j = 0, . . . , n completely determines an antiderivative f . Therefore the above list determines exactly three antiderivatives (f1 , f2 , and f3 ). However, not one of these antiderivatives is the minimal antiderivative as the minimal antiderivative corresponds to the case λi = f (xi ) as illustrated on Figure 4.4. 76 Chapter 4. Finite Convex Integration 5 4 3 2 1 x2 2|x| f− f+ 0 −3 −2 −1 0 1 2 3 Figure 4.4: Minimal and maximal antiderivatives. Therefore, generally we cannot find a minimal antiderivative by solving its corresponding shortest path problem, but we can find several antiderivatives. In fact, for each node in the graph G we can find at most one antiderivative. Moreover, if the weights x∗i , xj − xi are all nonnegative we can find an antiderivative in O(n2 ) time using Dijkstra’s algorithm. On the other hand, if we remove the nonnegative weight assumption, we can find several antiderivatives in O(n3 ) using Floyd-Warshall algorithm which is described in [1, p.147]. Note that the graph of G is always full (i.e. n2 arcs), so labeling algorithms [1, p.155] (whose complexity depends on the number of arcs) will not help us compute an antiderivative more efficiently. Remark 4.36. As many convex transforms rely on the subgradient constraint in Equation (4.12), they may also be linked to the shortest path problem. 77 Chapter 5 Conclusion We visited the proximal average which is an advanced convex transform. It inherits a much richer set of properties (continuity, differentiability, full domain, and strict convexity properties) than the arithmetic average. Due to the complicated nature of the proximal average, existing computational algorithms failed to accurately approximate nested transforms (without complications). This shortcoming is remedied by the PLQ model and its corresponding algorithms. As these algorithms depend on convexity, we extended their framework to nonconvex functions using the convex hull. We also explicitly defined each algorithm and stated its general strategy. We showed that a method used to produce primal-dual symmetric antiderivatives depends on the proximal midpoint average and Rockafellar’s function. The latter is a minimal antiderivative, and through conjugate duality, it also provides a maximal antiderivative. Therefore, we showed that the minimal and maximal antiderivatives in [5, Theorem 3.5, Corollary 3.10] are the same minimal and maximal antiderivatives as in [12, p.132, p.136]. So the algorithms in [12, p.139, p.140] can be used to compute primal-dual symmetric antiderivatives. Moreover, these minimal and maximal antiderivatives can be computed by solving a shortest path problem. In fact, the specialized algorithms used to solve the all-pairs shortest path problem [1, p.144] can be used to find many antiderivatives (including Rockafellar’s functions). Future work may focus on the following extensions. • Developing a second-order model to accommodate the PLQ algorithms. 78 Chapter 5. Conclusion • Analyzing convergence for a second-order model. • Extending the PLQ model to higher dimension. • Generalizing the PLQ matrix to represent a more general class of functions. • Expanding the PLQ model to include other convex operations which have the same closure properties. • Exploring the link between convex transforms and shortest path problems. 79 Bibliography [1] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, Network flows, Prentice Hall Inc., Englewood Cliffs, NJ, 1993. Theory, algorithms, and applications. [2] C. B. Barber, D. P. Dobkin, and H. Huhdanpaa, The quickhull algorithm for convex hulls, ACM Transactions on Mathematical Software, 22 (1996), pp. 469–483. [3] H. H. Bauschke, R. Goebel, Y. Lucet, and X. Wang, The proximal average: Basic properties, tech. report, University of British Columbia, 2007. [4] H. H. Bauschke, Y. Lucet, and M. Trienis, How to transform one convex function continuously into another, tech. report, University of British Columbia, July 2006. Accepted for publication in SIAM Review. [5] H. H. Bauschke, Y. Lucet, and X. Wang, Primal-dual symmetric intrinsic methods for finding antiderivatives of cyclically monotone operators, SIAM J. Control Optim., 46 (2007), pp. 2031–2051. ´, and S. Reich, Projection and proximal point methods: [6] H. H. Bauschke, E. Matouˇ skova Convergence results and counterexamples, Nonlinear Anal., 56 (2004), pp. 715–738. [7] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, 2004. 80 Bibliography [8] L. Corrias, Fast Legendre–Fenchel transform and applications to Hamilton–Jacobi equations and conservation laws, SIAM J. Numer. Anal., 33 (1996), pp. 1534–1558. [9] C. H. Hamilton, Symbolic convex analysis, master’s thesis, Simon Fraser University, 2005. [10] J.-B. Hiriart-Urruty, Lipschitz r-continuity of the approximate subdifferential of a convex function, Math. Scand., 47 (1980), pp. 123–134. [11] J.-B. Hiriart-Urruty and C. Lemar´ echal, Convex Analysis and Minimization Algorithms, vol. 305–306 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], Springer-Verlag, Berlin, 1993. Vol I: Fundamentals, Vol II: Advanced theory and bundle methods. [12] D. Lambert, J.-P. Crouzeix, V. H. Nguyen, and J.-J. Strodiot, Finite convex integration, J. Convex Anal., 11 (2004), pp. 131–146. [13] Y. Lucet, A fast computational algorithm for the Legendre–Fenchel transform, Computational Optimization and Applications, 6 (1996), pp. 27–57. [14] , Faster than the Fast Legendre Transform, the Linear-time Legendre Transform, Numer. Algorithms, 16 (1997), pp. 171–185. [15] , La Transform´ee de Legendre–Fenchel et la convexifi´ee d’une fonction : algorithmes rapides de calcul, analyse et r´egularit´e du second ordre, PhD thesis, Laboratoire Approximation et Optimisation, UFR MIG, Universit´e Paul Sabatier, Feb. 1997. [16] , Fast Moreau Envelope computation II: Applications, tech. report, University of British Columbia, 2005. 81 Bibliography [17] , A linear Euclidean distance transform algorithm based on the Linear-time Legendre Transform, in Proceedings of the Second Canadian Conference on Computer and Robot Vision (CRV 2005), Victoria BC, May 2005, IEEE Computer Society Press. [18] , Fast Moreau envelope computation I: Numerical algorithms, Numerical Algorithms, 43 (2006), pp. 235–249. DOI - 10.1007/s11075-006-9056-0. [19] Y. Lucet, H. H. Bauschke, and M. Trienis, The piecewise linear-quadratic model for computational convex analysis, Comput. Optim. Appl., (2006). Accepted for publication. [20] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, New York, 1970. [21] R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, Springer-Verlag, Berlin, 1998. 82 Appendix A //PROBLEM 1. (1) f1’(x2) = f2’(x4), // (2) m1(x-x2) + f1(x2) = m2(x-x4) + f2(x4), where // (3) x1 < x2 < x3 // (4) x3 < x4 < x5 // //PROBLEM 2. (1) m1(x-x2) + f1(x2) = m1(x-x5) + f2(x5), where // (2) x1 < x2 < x3 // (3) x3 < x4 < x5, and // (4) m1=f1’(x2) // //PROBLEM 3. (1) m2(x-x1) + f1(x1) = m2(x-x4) + f2(x4), where // (2) x1 < x2 < x3 // (3) x3 < x4 < x5, and // (4) m2=f2’(x4) // //PROBLEM 4. (1) m1(x-x1) + f1(x1) = m1(x-x5) + f2(x5), where // (2) x1 < x2 < x3 // (3) x3 < x4 < x5 83 Appendix A. function plqco=plq_co(plqf) EPS=1E-15; plqf_original=plqf; plqfLHS=[];plqfRHS=[]; //Bounded plq functions will have an infinity value as a coefficient //in either the first row or the last row, simply remove these pieces //and add these pieces after the convex hull has been computed. if or(plqf(1,2:$) == %inf) then plqfLHS = plqf(1,:); plqf=plqf(2:$,:); end; if or(plqf($,2:$) == %inf) then plqfRHS = plqf($,:); plqf=plqf(1:$-1,:); end; n = size(plqf,1); x=plqf(:,1);a=plqf(:,2);b=plqf(:,3);c=plqf(:,4); //update the coefficients if a(1) < 0 | a(n) < 0 | (a(1)==0 & a(n) == 0 & b(n) < b(1) ) then plqco=[%inf,0,0,-%inf]; return; end; //special case i=1; while i <= n-1 & n >= 2, a=clean(a(:,1)); b=clean(b(:,1)); c=clean(c(:,1)); //Considering a PLQ function with two pieces, it is nonconvex //if the following condition holds. if (2*a(i)*x(i)+b(i)) - (2*a(i+1)*x(i)+b(i+1)) > EPS | a(i) < 0 | a(i+1) < 0 then if n >=3 & i >= 2 then x1 = plqf(i-1,1); else x1 = -%inf; end; f1=plqf(i,:); f2=plqf(i+1,:);x3=f1(1,1); x5=f2(1,1); 84 Appendix A. qplot(plqf_original,plqf,-%inf,%inf); plqf = [plqf(1:(i-1),:);plq_conv_on_interval(f1,f2,x1,x3,x5);plqf((i+2):$,:)]; plqf = plq_clean(plqf); if n <= 1 then plqco=[plqf]; break; end; if i > 1 then i=i-1; end; //backtrack else i=i+1; end; n = size(plqf,1); //update size of the PLQ function x=plqf(:,1);a=plqf(:,2);b=plqf(:,3);c=plqf(:,4); //update the coefficients end; plqco=[plqfLHS;plqf;plqfRHS]; qplot(plqf_original,plqco,-%inf,%inf); return; endfunction 85 Appendix A. //We assume x(1) < x(2) < x(3) < x(4) < x(5) are ordered and finite //(no infinity values). In creating the PLQ hull we may need to create //points x(2) and x(4). Note that x(1) is the LEFTBOUND, x(3) is the //partition point, and x(5) is the RIGHTBOUND in our PLQ interval. //plq_conv_on_interval finds the convex hull on the interval [x1,x5]. function [plqco] = plq_conv_on_interval(f1,f2,x1,x3,x5) //Assume f1 is to the left of f2 x(1) = x1; a(1)=f1(1,2); a(2)=f2(1,2); x(3) = x3; b(1)=f1(1,3); b(2)=f2(1,3); x(5) = x5; c(1)=f1(1,4); c(2)=f2(1,4); ieee(2); //allow infinity i.e. 1/0 plqco=[]; if a(1) < 0 then //concave pieces are replaced by a linear function. f1=_plq_conv_buildl(x,a,b,c,1,1,1,3); f2=[x(5),a(2),b(2),c(2)]; plqco=plq_conv_on_interval(f1,f2,x(1),x(3),x(5)); return; elseif a(2) < 0 then f1=[x(3),a(1),b(1),c(1)]; f2=_plq_conv_buildl(x,a,b,c,2,5,2,3); plqco=plq_conv_on_interval(f1,f2,x(1),x(3),x(5)); return; elseif 2*a(1)*x(3)+b(1) <= 2*a(2)*x(3)+b(2) then //triveral case plqco=[x(3),a(1),b(1),c(1);x(5),a(2),b(2),c(2)]; return; elseif a(1) == 0 & a(2) == 0 then //(LINEAR-LINEAR) f1 is linear and f2 is linear. 86 Appendix A. [plqco]=_plq_conv_buildl(x,a,b,c,1,1,2,5); return; elseif a(1) == 0 & a(2) <> 0 then //(LINEAR-QUDARTIC) if f1 is linear and f2 is quadratic then x(4) is determined. plqco=[_conv_interval_routine(x,a,b,c,x1,x3,x5)]; return; elseif a(1) <> 0 & a(2) == 0 then //(QUADRATIC-LINEAR) if f1 is quadratic and f2 is linear then x(2) is determined. plqco=[_conv_interval_routine(x,a,b,c,x1,x3,x5)]; return; elseif a(1) <> 0 & a(2) <> 0 then //(QUADRATIC-QUADRATIC) if f1 is quadratic and f2 is quadratic //we need to find solutions to A*x(4)^2 + B*x(4) + C. //The following code solves [PROBLEM 1.], A=(1/4)*(-4*a(2)^2+4*a(2)*a(1))/a(1); B=(1/4)*(-4*a(2)*b(2)+4*b(1)*a(2))/a(1); C=(1/4)*(-b(2)^2-b(1)^2+4*c(1)*a(1)+2*b(1)*b(2)-4*c(2)*a(1))/a(1); D = B^2 - 4*A*C; //Discriminant. if A == 0 then //Linear case, so we have Bx+C=0 with one zero. x(4) = (-C/B); x(2) = (-1/2)*(b(1)-2*a(2)*x(4)-b(2))/a(1); if x(2) < x(1) | x(3) < x(2) then x(2)=%inf; end; if x(4) < x(3) | x(4) > x(5) then x(4)=%inf; end; //if x(2) or x(4) are outside the interval, send the x(i) to infinity. //In otherwords these solutions provide no information and thus we //should just ignore them. Similarly if D < 0, just ignore x(2) and x(4) //as they don’t help with the hull. else // A <> 0, Quadratic case, so we have Ax^2+Bx+C=0, with one or two zeros. 87 Appendix A. //Solving for x(4). if D>0 then nr = (-B - sqrt(D))/(2*A); //negative root pr = (-B + sqrt(D))/(2*A); //positive root if D==0 then nr=pr; end; if x(3) < pr & pr < x(5) then x(4) = pr; elseif x(3) < nr & nr < x(5) then x(4) = nr; else x(4)=%inf; end; x(2) = (-1/2)*(b(1)-2*a(2)*x(4)-b(2))/a(1); if x(1) > x(2) | x(2) > x(3) then x(2)=%inf; end; else x(2)=%inf; x(4)=%inf; end end if x(2) < %inf & x(4) < %inf then plqco=[x(2),a(1),b(1),c(1); _plq_conv_buildl(x,a,b,c,1,2,2,4);x(5),a(2),b(2),c(2)]; return; else //x(2) == %inf | x(4) == %inf, //Here we solve the SECOND PROBLEM: as x(2) or x(4) is infeasible in [PROBLEM 1.] plqco=[_conv_interval_routine(x,a,b,c,x1,x3,x5)]; return; end; else printf("Error: Case does not exist"); end endfunction 88 Appendix A. //if we can’t find a solution to [PROBLEM 1.] then we need to find a solution to [PROBLEM 2.]. //The following routine solves [PROBLEM 2.]. function [plqco] = _conv_interval_routine(x,a,b,c,x1,x3,x5), EPS=1E-15; x(1)=x1;x(3)=x3;x(5)=x5; //solving the quadratic for x(2) if a(1) <> 0 & x(5) < %inf then A1 = -a(1); B1 = 2*a(1)*x5; C1 = c(1)+b(1)*x5-a(2)*x5^2-b(2)*x5-c(2); D1 = B1^2 - 4*A1*C1; pr1 = (-B1 + sqrt(D1))/(2*A1); //solution is x2 nr1 = (-B1 - sqrt(D1))/(2*A1); //negative root for the LINEAR-QUADRATIC CASE. if x(1) <= pr1 & pr1 <= x(3) then x(2) = pr1; plqco=[x(2),a(1),b(1),c(1);_plq_conv_buildl(x,a,b,c,1,2,2,5)]; return; elseif x(3) <= pr1 & pr1 <= x(5) then x(4)=pr1; plqco=[_plq_conv_buildl(x,a,b,c,1,1,2,4);x(5),a(2),b(2),c(2)]; return; elseif x(1) <= nr1 & nr1 <= x(3) then x(2) =pr1; plqco=[x(2),a(1),b(1),c(1);_plq_conv_buildl(x,a,b,c,1,2,2,5)]; return; elseif x(3) <= nr1 & nr1 <= x(5) then x(4)=pr1; plqco=[_plq_conv_buildl(x,a,b,c,1,1,2,4);x(5),a(2),b(2),c(2)]; return; else plqco=_plq_conv_buildl(x,a,b,c,1,1,2,5); return; end; //solving the quadratic for x(4) elseif a(2) <> 0 & x(1) > -%inf then 89 Appendix A. A2 = a(2); B2 = -2*a(2)*x(1); C2 = -b(2)*x(1)+a(1)*x(1)^2+b(1)*x(1)+c(1)-c(2); D2 = B2^2 - 4*A2*C2; pr2 = (-B2 + sqrt(D2))/(2*A2); nr2 = (-B2 - sqrt(D2))/(2*A2); if x(3) <= pr2 & pr2 <= x(5) then x(4)=pr2; plqco=[_plq_conv_buildl(x,a,b,c,1,1,2,4);x(5),a(2),b(2),c(2)]; return; elseif x(1) <= pr2 & pr2 <= x(3) then x(2)=pr2; plqco=[x(2),a(1),b(1),c(1);_plq_conv_buildl(x,a,b,c,1,2,2,5)]; return; elseif x(3) <= nr2 & nr2 <= x(5) then x(4)=nr2; plqco=[_plq_conv_buildl(x,a,b,c,1,1,2,4);x(5),a(2),b(2),c(2)]; return; elseif x(1) <= nr2 & nr2 <= x(3) then x(2)=nr2; plqco=[x(2),a(1),b(1),c(1);_plq_conv_buildl(x,a,b,c,1,2,2,5)]; return; else plqco=_plq_conv_buildl(x,a,b,c,1,1,2,5); return; end; else plqco=_plq_conv_buildl(x,a,b,c,1,1,2,5); return; end; endfunction 90
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Computational convex analysis : from continuous deformation...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Computational convex analysis : from continuous deformation to finite convex integration Trienis, Michael Joseph 2007
pdf
Page Metadata
Item Metadata
Title | Computational convex analysis : from continuous deformation to finite convex integration |
Creator |
Trienis, Michael Joseph |
Publisher | University of British Columbia |
Date Issued | 2007 |
Description | After introducing concepts from convex analysis, we study how to continuously transform one convex function into another. A natural choice is the arithmetic average, as it is pointwise continuous; however, this choice fails to average functions with different domains. On the contrary, the proximal average is not only continuous (in the epi-topology) but can actually average functions with disjoint domains. In fact, the proximal average not only inherits strict convexity (like the arithmetic average) but also inherits smoothness and differentiability (unlike the arithmetic average). Then we introduce a computational framework for computer-aided convex analysis. Motivated by the proximal average, we notice that the class of piecewise linear-quadratic (PLQ) functions is closed under (positive) scalar multiplication, addition, Fenchel conjugation, and Moreau envelope. As a result, the PLQ framework gives rise to linear-time and linear-space algorithms for convex PLQ functions. We extend this framework to nonconvex PLQ functions and present an explicit convex hull algorithm. Finally, we discuss a method to find primal-dual symmetric antiderivatives from cyclically monotone operators. As these antiderivatives depend on the minimal and maximal Rockafellar functions [5, Theorem 3.5, Corollary 3.10], it turns out that the minimal and maximal function in [12, p.132,p.136] are indeed the same functions. Algorithms used to compute these antiderivatives can be formulated as shortest path problems. |
Extent | 6563254 bytes |
Subject |
Convex analysis Convex function Proximal average PLQ (piecewise linear-quadratic) Nonconvex Algorithm Primal-dual symmetric antiderivatives Rockafellar functions |
Genre |
Thesis/Dissertation |
Type |
Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2008-11-20 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
IsShownAt | 10.14288/1.0066799 |
URI | http://hdl.handle.net/2429/2799 |
Degree |
Master of Science - MSc |
Program |
Interdisciplinary Studies |
Affiliation |
Graduate Studies, College of (Okanagan) |
Degree Grantor | University of British Columbia |
GraduationDate | 2007-05 |
Campus |
UBCO |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2007_spring_trienis_michael.pdf [ 6.26MB ]
- Metadata
- JSON: 24-1.0066799.json
- JSON-LD: 24-1.0066799-ld.json
- RDF/XML (Pretty): 24-1.0066799-rdf.xml
- RDF/JSON: 24-1.0066799-rdf.json
- Turtle: 24-1.0066799-turtle.txt
- N-Triples: 24-1.0066799-rdf-ntriples.txt
- Original Record: 24-1.0066799-source.json
- Full Text
- 24-1.0066799-fulltext.txt
- Citation
- 24-1.0066799.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0066799/manifest