UBC Undergraduate Research

Convex Hulls and Graph-Matrix Calculus: Algorithms in Computational Convex Analysis (2010)


Full Text

Convex Hulls and Graph-Matrix Calculus:
Algorithms in Computational Convex Analysis

by Bryan Nicholas Carl Gardiner

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF BACHELOR OF SCIENCE HONOURS in The Irving K. Barber School of Arts and Sciences (Computer Science)

THE UNIVERSITY OF BRITISH COLUMBIA (Okanagan)

April 2010

© Bryan Nicholas Carl Gardiner 2010

Abstract

At the core of Convex Analysis and its applications are a collection of frequently used operators for transforming convex functions, along with the convex hull operation for convexifying functions. While numerical algorithms are usually applied to general functions with little known structure, we focus on the class of univariate piecewise linear-quadratic (PLQ) functions, for which exact algorithms have been developed.

This thesis presents two main results. In the first part, we investigate two convex hull algorithms for univariate PLQ functions. The first algorithm is an extension of the linear-time planar Beneath-Beyond algorithm, and performs a plane sweep that converts a function into its convex hull. The second uses convex duality theory to deconstruct a nonconvex function and build its convex hull using convex operators, in worst-case quadratic time. The trade-off is implementation complexity: the second algorithm is significantly simpler to implement. The second part is concerned with the computation of convex transforms, such as the Legendre-Fenchel transform and the Moreau envelope. We introduce a new family of algorithms that stores and manipulates models of the subdifferential instead of the original function.

We performed numerical experiments comparing the two convex hull algorithms, and comparing the new subdifferential algorithms to existing PLQ algorithms. These experiments confirm the time complexity of the convex hull algorithms, and show that the subdifferential algorithms have the same complexity as the PLQ algorithms while running an order of magnitude faster.

Table of Contents

Abstract
Table of Contents
Acknowledgements
1 Introduction
2 Preliminaries
  2.1 Convex Analysis
    2.1.1 Convexity
    2.1.2 Affine and Convex Hulls
    2.1.3 Set Interiors
    2.1.4 Subdifferential
    2.1.5 Convex Conjugate
    2.1.6 Epi-addition and Epi-multiplication
    2.1.7 Moreau Envelope
    2.1.8 Proximal Average
    2.1.9 Self-Dual Smoothing Operators
  2.2 Piecewise Linear-Quadratic Functions
3 Convex Hull Algorithms
  3.1 Direct Approach
  3.2 Convex Duality Approach
  3.3 Performance Comparison
4 Graph-Matrix Calculus
  4.1 Representation of PLQ Functions
  4.2 Goebel's Graph-Matrix Calculus
  4.3 Transforms of PLQ Functions
  4.4 Comparison of Algorithms
5 Conclusion
Bibliography

Acknowledgements

I wish to thank Dr. Yves Lucet most sincerely for providing me with the opportunity to do an Honours thesis under him, and the opportunity to pursue research in the area of Convex Analysis. His patient and caring mentorship has been essential, and he has given me much inspiration and confidence.

Many other people have provided support, encouragement, and knowledge throughout my thesis work. I would like to acknowledge Dr. Heinz Bauschke and Dr. Warren Hare for their teaching, and for their everlasting enthusiasm for Mathematics. I would also like to thank Mike Trienis for his initial implementation of the direct PLQ convex hull algorithm.

Chapter 1
Introduction

Convex Analysis is a rich field with applications in diverse areas. For example, it provides algorithms to compute distance transforms [8] and reverse degradation [5] in image and signal processing, and to improve models in machine learning [4, 23].

There are numerous operators in Convex Analysis that transform one convex function into another. Common examples are the ubiquitous Legendre-Fenchel transform, the Moreau-Yosida approximation, and the recently introduced proximal average [3]. It is natural to want fast algorithms to perform these operations. Since generic numerical algorithms do not generate an exact model of the transformed function, research has focused on special classes of functions [9, 16, 17]. Restricting oneself to a well-behaved class of functions can allow exact algorithms to be developed. Piecewise linear-quadratic (PLQ) functions are such a class. They are closed under many convex transforms, including those listed above. Existing PLQ algorithms directly manipulate a model of a function [2].

Because the subdifferential of a PLQ function is a piecewise linear multivalued mapping, the subdifferentials of many transforms are related linearly to the original subdifferentials. Goebel [12] published the exact relationship for a number of operators. We extend his results to additional operators, and provide a method to recover the function values.

First, we will look at algorithms for computing the convex hulls of PLQ functions. We will recall the extension by Trienis [22] of the Beneath-Beyond algorithm for points in a plane to PLQ functions with unbounded domains. We will then introduce a new algorithm for computing the convex hull of a PLQ function that uses only the convex conjugate and the pointwise maximum operators.

Chapter 2 sets notation and recalls the necessary theory. Chapter 3 discusses the two convex hull algorithms for PLQ functions. After bringing ourselves into the domain of convex functions, in Chapter 4 we apply Goebel's calculus rules to the computation of a number of convex operators, and compare these techniques to existing PLQ algorithms. Chapter 5 concludes the thesis with a results summary and directions for future research.
Chapter 2
Preliminaries

In this chapter, we outline notation and recall results we will use in the rest of the thesis.

We denote the standard inner product on R^d by

⟨x, y⟩ = xᵀy = ∑_{i=1}^{d} x_i y_i,

and the quadratic energy function by q : R^d → R : x ↦ (1/2)‖x‖². We write the open ball centered at x with radius ε as B(x; ε) = {y ∈ R^d : ‖y − x‖ < ε}.

2.1 Convex Analysis

2.1.1 Convexity

We use standard definitions for the convexity of sets and functions.

Definition 2.1.1. A set C is convex if, and only if (iff), for all x1, x2 ∈ C and all λ ∈ [0, 1], (1 − λ)x1 + λx2 ∈ C.

Another useful notion is that of the epigraph of a function.

Definition 2.1.2. Given a function f : R^d → R, its epigraph epi(f) is the set of points in R^{d+1} that are on or above the graph of f:

epi(f) = {(x, α) : f(x) ≤ α}.

A function is defined to be convex iff its epigraph is convex. (Likewise, a function f is concave iff −f is convex, i.e. epi(−f) is a convex set.)

We write R̄ to mean the set R ∪ {+∞} of extended reals. When working with a function f with bounded domain dom(f) ⊂ R^d, it will sometimes be more convenient to work instead with the extended-value function f̄, where

f̄(x) = f(x) if x ∈ dom(f), and f̄(x) = +∞ if x ∉ dom(f).

We denote the indicator function of a set S by ι_S(x). This function is defined to be zero for all elements of S, and +∞ for elements outside of S:

ι_S(x) = 0 if x ∈ S, and ι_S(x) = +∞ if x ∉ S.

2.1.2 Affine and Convex Hulls

We recall the definitions of affine and convex hulls for sets in R^d, and of convex hulls of functions.

Fact 2.1.3. The affine hull of a set S is the set of all affine combinations of finite numbers of points in S; that is,

aff(S) = { ∑_{i=1}^{n} λ_i x_i : x_i ∈ S for i ∈ {1, …, n}, ∑_{i=1}^{n} λ_i = 1, n ≥ 1 }.

A set is affine if it is equal to its affine hull.

Fact 2.1.4. The convex hull of a set S is the smallest convex set that contains S. It is also the intersection of all convex sets containing S. It can be written as

conv(S) = { ∑_{i=1}^{n} λ_i x_i : x_i ∈ S, λ_i ≥ 0 for i ∈ {1, …, n}, ∑_{i=1}^{n} λ_i = 1, n ≥ 1 }.

The convex hull is, by definition, a convex set.

The closed convex hull of a function f is defined to be the largest lower semi-continuous, convex function that underestimates f. It is written co(f), and it can be defined via the epigraph by

(co f)(x) = inf{t : (x, t) ∈ conv epi f}.

When we refer to the convex hull of a function, we mean its closed convex hull.

2.1.3 Set Interiors

Definition 2.1.5. The interior of a set S, int(S), is defined as int(S) = {x ∈ S : B(x; ε) ⊆ S for some ε > 0}.

Definition 2.1.6. The relative interior of S, ri(S), is the interior of S when considered as a subset of its affine hull, that is, ri(S) = {x ∈ S : B(x; ε) ∩ aff(S) ⊆ S for some ε > 0}.

The relative interior will be useful when discussing function domains.

2.1.4 Subdifferential

The subdifferential is a generalization of the derivative to convex but nondifferentiable functions.

Definition 2.1.7. We define the subdifferential of a convex function f : R^d → R̄ at a point x to be the set of all subgradients s of f at x:

∂f(x) = {s ∈ R^d : for all y ∈ R^d, f(y) ≥ f(x) + ⟨s, y − x⟩}.

The subdifferential is a multivalued map from R^d to R^d. We also define the graph of the subdifferential, gph ∂f, as a subset of R^d × R^d by

(x, s) ∈ gph ∂f ⇔ s ∈ ∂f(x).
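For example, for the absolute value function f(x) = |x| on R, the subdifferential is ∂f(x) = {−1} for x < 0, ∂f(0) = [−1, 1], and ∂f(x) = {1} for x > 0, so gph ∂f consists of two horizontal rays joined by the vertical segment {0} × [−1, 1]. This multivalued graph is exactly the object that the GPH representation of Chapter 4 stores.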
2.1.5 Convex Conjugate

The convex conjugate, also known as the Legendre-Fenchel transform, or just the conjugate, is a fundamental transform in convex analysis. It can be used to construct many other transforms, such as the Moreau envelope and the proximal average.

Definition 2.1.8. For a proper function f : R^d → R, its conjugate is the function f* : R^d → R̄ defined by

f*(y) = sup_{x ∈ dom(f)} {⟨y, x⟩ − f(x)}.

Using extended-value functions, we can write the conjugate without constraints as f*(y) = sup_{x ∈ R^d} {⟨y, x⟩ − f̄(x)}. As f* is the supremum of affine (hence convex) functions of y, it is convex.

Fact 2.1.9. [20, Theorem 12.2] If f is convex, then the conjugate f* is always convex and lower semi-continuous. Furthermore, f** is the closed convex hull of f, and f* is proper iff f is proper.

We highlight several cases where the conjugate is easy to compute.

Fact 2.1.10. [20, p. 106] The only function on R^d that is self-conjugate, i.e. f* = f, is q(x) = (1/2)‖x‖².

Example 2.1.11. If f(x) = ax² + bx + c for a > 0 and b, c ∈ R, then f*(y) = (y − b)²/(4a) − c, and f** = f.

Example 2.1.12. If f(x) = bx + c for b, c ∈ R, then f*(y) = ι_{b}(y) − c, and f** = f.

Example 2.1.13. If f(x) = (1/p)‖x‖^p with x ∈ R^d and 1 < p < +∞, then f*(y) = (1/q)‖y‖^q, where 1/p + 1/q = 1.

2.1.6 Epi-addition and Epi-multiplication

We make use of two further operators, known as epi-addition (or inf-convolution) and epi-multiplication. We denote these, respectively, by

(f □ g)(x) = inf_{y ∈ R^d} {f(y) + g(x − y)},

(α ⋆ f)(x) = α·f(x/α) if α > 0, and α ⋆ f = ι_{0} if α = 0.

2.1.7 Moreau Envelope

The Moreau envelope, also known as the Moreau-Yosida approximation, is the inf-convolution of a function with (1/(2λ))‖·‖². That is, for λ > 0,

e_λf(x) = (f □ λ⁻¹q)(x) = inf_{y ∈ R^d} {f(y) + (1/(2λ))‖y − x‖²}.

The Moreau envelope is an underestimator of f, which converges pointwise to f as λ ↘ 0 [21, Theorem 1.25]. By expanding the norm squared, we can write the Moreau envelope in terms of the conjugate as

e_λf(x) = ‖x‖²/(2λ) − (1/λ)·g_λ*(x), where g_λ = λf + (1/2)‖·‖².

When f is convex, the Moreau envelope can be written using the conjugate as e_λf = (f* + λq)*.

The proximal mapping is the set of minimizers defining the Moreau envelope. It is defined as

Prox_λf(x) = argmin_{y ∈ R^d} {f(y) + (1/(2λ))‖y − x‖²}.
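For example, for f = |·| on R, computing the infimum directly gives the well-known Huber function,

e_λf(x) = x²/(2λ) if |x| ≤ λ, and e_λf(x) = |x| − λ/2 otherwise,

while the proximal mapping is the soft-thresholding operator Prox_λf(x) = sign(x)·max(|x| − λ, 0). Together they illustrate how the envelope rounds off the kink at 0 while the proximal mapping shrinks its argument toward 0, the minimizer of f.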
2.1.8 Proximal Average

The proximal average is an operator for continuously transforming one function into another; unlike the arithmetic average, it produces a proper function even if the domains of two proper input functions f and g are disjoint. The proximal average is the function whose proximal mapping is the convex combination of the proximal mappings of the input functions [3, Remark 6.2].

Definition 2.1.14. Given two proper functions f and g, and a parameter λ ∈ [0, 1], the proximal average P_λ(f, g) : R^d → R̄ is defined as

P_λ(f, g) = ((1 − λ)(f + µ⁻¹q)* + λ(g + µ⁻¹q)*)* − µ⁻¹q

for a smoothing parameter µ > 0.

The proximal average is self-dual, that is, (P_λ(f, g))* = P_λ(f*, g*). If f and g are convex, then P_λ(f, g) is also convex. Convexity with respect to other parameters has been studied in [15]. Recent extensions to the proximal average include the generalized kernel average of n functions in [19], as well as the nonconvex proximal average in [13].

The proximal average can be written as a minimization problem using [1, Formula 20],

P_λ(f, g)(x) = inf_{x = (1−λ)x1 + λx2} ( (1 − λ)f(x1) + λg(x2) + ((1 − λ)λ/(2µ))‖x1 − x2‖² ).

We call the proximal average exact at x if the infimum is attained for some x1, x2 ∈ R^d. The proximal average can also be written using Moreau envelopes as in [1, Theorem 8.3],

P_λ(f, g) = −e_µ(−(1 − λ)e_µf − λe_µg).

2.1.9 Self-Dual Smoothing Operators

We will present results on two recently published operators that are self-dual, i.e. they satisfy T(f*) = (Tf)*. The first is a smoothing operator defined by Goebel in [12] as

s_λf = (1 − λ²)e_λf + λq.

The second operator was published in [19] and is defined as

T_λf = P_λ(f, q).

In both cases, λ ∈ (0, 1).

2.2 Piecewise Linear-Quadratic Functions

Piecewise linear-quadratic (PLQ) functions are piecewise functions f : R^d → R̄, where dom(f) is convex and partitioned into polyhedral pieces, and each piece is assigned a linear or quadratic function. We additionally assume throughout this thesis that f is continuous on the interior of its domain, and differentiable on each of its pieces, though f may fail to be differentiable at the boundary of each piece.

The class of PLQ functions is well studied [2, 21] because it is closed under many of the standard operations in convex analysis, such as addition, scalar multiplication, conjugation, and the Moreau envelope. When d = 1, the class is also closed under the pointwise minimum and maximum operations.

In this thesis, we are specifically concerned with univariate PLQ functions f : R → R̄. We divide the real line into n pieces, with the ith piece being defined by f_i(x) = a_i x² + b_i x + c_i over the interval [x_{i−1}, x_i], where −∞ = x0 < x1 < x2 < … < x_{n−1} < x_n = +∞. Given such a function, we represent it as the n × 4 matrix

[ x1 a1 b1 c1 ]
[ x2 a2 b2 c2 ]
[  ⋮          ]
[ ∞  an bn cn ].

Example 2.2.1. The univariate quadratic energy function q(x) = (1/2)x² is represented by the matrix [∞ 1/2 0 0].

Example 2.2.2. The function f(x) = ||x − 1| − 1| is represented by the matrix

[ 0 0 −1  0 ]
[ 1 0  1  0 ]
[ 2 0 −1  2 ]
[ ∞ 0  1 −2 ].

As a special case, we support representation of the indicator function of a single point, f(x) = ι_{x̄}(x) + c, where x̄ ∈ R, with the matrix [x̄ 0 0 c].

Any references to PLQ functions from this point forward will be referring to univariate PLQ functions unless stated otherwise.
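As a concrete illustration of this representation, the following Python sketch (ours, not part of the Computational Convex Analysis toolbox, which is written in Scilab) evaluates a PLQ function stored as an n × 4 array; the array below encodes Example 2.2.2.

    import numpy as np

    # A PLQ function as an n-by-4 matrix [x_i, a_i, b_i, c_i]:
    # piece i is a_i*x^2 + b_i*x + c_i on (x_{i-1}, x_i], with x_n = +inf;
    # by continuity the choice of piece at a boundary point does not matter.
    def plq_eval(plq, x):
        """Evaluate a PLQ function (given as an n-by-4 array) at a point x."""
        for xi, a, b, c in plq:
            if x <= xi:
                return a * x**2 + b * x + c
        return np.inf  # unreachable when the last partition point is +inf

    # Example 2.2.2: f(x) = ||x - 1| - 1|
    f = np.array([[0.0,    0.0, -1.0,  0.0],
                  [1.0,    0.0,  1.0,  0.0],
                  [2.0,    0.0, -1.0,  2.0],
                  [np.inf, 0.0,  1.0, -2.0]])

    print(plq_eval(f, -1.0), plq_eval(f, 0.5), plq_eval(f, 3.0))  # 1.0 0.5 1.0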
Chapter 3
Convex Hull Algorithms

Convex hulls form a bridge between the fields of Computational Geometry and Convex Analysis. The former typically focuses on computing the convex hull of a set of points, lines, curves, or other objects in Euclidean space; the latter is concerned with the convex hulls of functions. Fortunately, there is a direct link: representing a function by its curve can enable us to use an applicable algorithm from Computational Geometry.

The Beneath-Beyond algorithm is an incremental algorithm for computing the convex hull of a set of points in Euclidean space [7]. In the plane, this algorithm requires O(n) time given sorted input (this is not an issue, as the first column of the matrix of a PLQ function is always sorted). The convex hull of a piecewise linear function can be calculated by passing to the Beneath-Beyond algorithm the set of points where the function is nondifferentiable. Quadratic pieces require special attention, but do not require asymptotically more time. The initial implementation of this algorithm was provided in [22], and it is studied further in [10].

After describing this algorithm, we present a new method for computing convex hulls [10] by decomposing a nonconvex PLQ function into individual convex pieces, conjugating, then applying the maximum operation to those pieces to build the convex hull. As this algorithm stores a model that may grow at each iteration, it takes O(n²) time in the worst case.

3.1 Direct Approach

The planar Beneath-Beyond algorithm performs a plane sweep and incrementally adds points to, and removes points from, the convex hull. We apply the same technique to a PLQ function f, performing a plane sweep along the x-axis to find the convex hull of the epigraph. At any point in the algorithm, the function to the left of our current position is convex.

First, there are a number of degenerate cases to consider, in which co f is −∞ everywhere.

Proposition 3.1.1. [10, Proposition 4.2] Let f be a PLQ function. Then co f ≡ −∞ iff any of the following conditions hold: (i) there is an x with f(x) = −∞; (ii) a1 < 0; (iii) an < 0; (iv) a1 = an = 0, and b1 > bn.

The direct algorithm begins with the leftmost finite piece of f and progresses rightward. At any point in the algorithm, the function to the left of the current index is convex. In each iteration, two adjacent pieces of f are considered. If those two pieces, treated as a single function, are convex, then we move right; else we convexify the two pieces, backtracking if necessary. A sample run of the algorithm is shown in Figure 3.1. Pseudocode is presented in Algorithm 1, which we call plq_coDirect.

Figure 3.1: Demonstration of the steps of the plq_coDirect algorithm. (a) Initial, nonconvex PLQ function; the first two pieces are nonconvex: convexify and move forward. (b) Two linear pieces are nonconvex: collapse to one piece and backtrack. (c) Quadratic-linear nonconvex case: convexify; the slope at −5 is unchanged, so advance. (d) Linear-quadratic case: we need to backtrack again. (e) Several more iterations lead to the convex hull.

Notation 3.1.2. We write (f_i, f_{i+1}) to denote the ith and (i+1)th pieces of f considered together, i.e. f restricted to [x_{i−1}, x_{i+1}].

Before we begin the plane sweep, we simplify f by replacing any nonconvex pieces of f with their convex hulls. This reduces the cases we must consider when convexifying each pair of pieces. A single piece f_i is nonconvex on its domain only if a_i < 0, in which case we can replace the piece with the line segment between its two endpoints.

We also need a test to know when two adjacent pieces are convex together. The following proposition treats this case.

Proposition 3.1.3. [22, Theorem 3.34] Let f be a continuous PLQ function such that co f ≢ −∞. Then f is nonconvex iff there are two adjacent pieces f_i (on [x_{i−1}, x_i]) and f_{i+1} (on [x_i, x_{i+1}]) such that f′_i(x_i) > f′_{i+1}(x_i).

In order to construct the convex hull of two adjacent PLQ pieces on line 21 of Algorithm 1, three cases must be considered. If both pieces f_i, f_{i+1} are quadratic (that is, a_i, a_{i+1} > 0), then the following system must be solved for x_α, x_β to find a tangent line joining the pieces:

f′_i(x_α) = f′_{i+1}(x_β)   (3.1)
f′_i(x_α)(x − x_α) + f_i(x_α) = f′_{i+1}(x_β)(x − x_β) + f_{i+1}(x_β)   (3.2)
x_α ∈ [x_{i−1}, x_i]   (3.3)
x_β ∈ [x_i, x_{i+1}]   (3.4)

In this case, an additional linear piece is inserted into f on [x_α, x_β], possibly replacing one or both quadratic pieces when x_α = x_{i−1} or x_β = x_{i+1}. A quadratic-quadratic convex hull is demonstrated in Figure 3.2.
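One way to solve (3.1)-(3.4) in closed form goes through conjugates: a line y = mx + k is tangent to a convex parabola f exactly when k = −f*(m), and Example 2.1.11 gives f*(m) = (m − b)²/(4a) − c, so a common tangent of the two pieces has a slope m satisfying f_i*(m) = f_{i+1}*(m), a quadratic equation in m. The Python sketch below is our illustration of this observation, not the thesis implementation, and the caller still has to enforce (3.3) and (3.4).

    import numpy as np

    def common_tangent(a1, b1, c1, a2, b2, c2):
        """Candidate common tangents of two convex parabolas (a1, a2 > 0).

        Solves f1*(m) = f2*(m), with f*(m) = (m - b)^2/(4a) - c, and returns
        tuples (slope m, tangency point on piece 1, tangency point on piece 2).
        """
        A = 1.0 / (4 * a1) - 1.0 / (4 * a2)
        B = b2 / (2 * a2) - b1 / (2 * a1)
        C = b1**2 / (4 * a1) - b2**2 / (4 * a2) - c1 + c2
        ms = np.roots([A, B, C])      # np.roots drops a vanishing leading coefficient
        candidates = []
        for m in np.real(ms[np.isreal(ms)]):
            xa = (m - b1) / (2 * a1)  # where the tangent touches the left parabola
            xb = (m - b2) / (2 * a2)  # where it touches the right parabola
            candidates.append((m, xa, xb))
        return candidates

The candidate with x_α ∈ [x_{i−1}, x_i] and x_β ∈ [x_i, x_{i+1}] yields the linear piece to insert.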
Algorithm 1: plq_coDirect
Input: f (a PLQ function)
Output: co f (a convex PLQ function)
 1  begin
 2      if any conditions in Proposition 3.1.1 hold then
 3          co f ← −∞
 4          return
 5      end
 6      for i ← 1 to n do
 7          if a_i < 0 then
 8              replace the ith piece of f with the linear function between (x_{i−1}, f(x_{i−1})) and (x_i, f(x_i))
 9          end
10      end
11      // Plane sweep.
12      i_0 ← i ← the index of the first finite piece of f
13      i_f ← the index of the last finite piece of f
14      // Let N denote the number of finite pieces of f.
15      while (i ≤ i_f − 1) and (N ≥ 2) do
16          if f′_i(x_i) ≤ f′_{i+1}(x_i) then
17              // (f_i, f_{i+1}) is convex.
18              i ← i + 1  // Add f_i to the convex hull and advance.
19          else
20              // Convexify f_i and f_{i+1}.
21              replace (f_i, f_{i+1}) with co((f_i, f_{i+1})) on [x_{i−1}, x_{i+1}] in f
22              update i_f according to inserted/removed pieces
23              if f′_{i−1}(x_{i−1}) > f′_i(x_{i−1}) then
24                  // (f_{i−1}, f_i) has lost convexity.
25                  i ← max(i − 1, 1)
26              else
27                  i ← the index of the rightmost piece in [x_{i−1}, x_{i+1}]
28                  // f is now convex on [x_{i_0}, x_i].
29              end
30          end
31      end
32      co f ← f
33  end

If f_i is quadratic and f_{i+1} is linear, then we find a tangent line connecting f_i at x_α to the right endpoint of f_{i+1}. We set f to be this tangent line on [x_α, x_{i+1}]. Note that x_α may be x_{i−1}, in which case the quadratic piece is removed. Symmetry handles the case where f_i is linear and f_{i+1} is quadratic; see Figure 3.3. The system we now solve is

f′_i(x_α)(x_{i+1} − x_α) + f_i(x_α) = f_{i+1}(x_{i+1}),  x_α ∈ [x_{i−1}, x_i].   (3.5)

Care must be given to the linear-quadratic case. After solving a linear-quadratic system and convexifying (f_i, f_{i+1}), if we have lost convexity at x_{i−1}, we first check whether f_{i−1} is quadratic. If it is, then we may enter a cycle of alternately convexifying (f_{i−1}, f_i) and (f_i, f_{i+1}), converging to the convex hull of f on [x_{i−1}, x_{i+1}] only in the limit. As the linear function f_i is repeatedly shifted downward, we treat this as a special case and remove f_i from the model, solving the quadratic-quadratic system (Equations 3.1-3.4) to find the tangent line for f_{i−1} and f_{i+1}.

The simplest case to handle is when both f_i and f_{i+1} are linear functions. We are convexifying because (f_i, f_{i+1}) is nonconvex, implying that b_i > b_{i+1}. Here we replace both linear pieces with a single linear function connecting the points (x_{i−1}, f_i(x_{i−1})) and (x_{i+1}, f_{i+1}(x_{i+1})). Figure 3.4 provides an example of this case.

Special attention may be necessary in implementing the last two cases when a linear piece extends to ±∞ (e.g. x_{i+1} = ∞ in (3.5)). These cases are handled by finding a line whose slope is that of the unbounded piece. Examples of these cases are shown in Figures 3.3(b), 3.4(b), and 3.4(c).

Figure 3.2: Quadratic-quadratic convex hull case.
Figure 3.3: Linear-quadratic convex hull case. (a) Bounded linear piece. (b) Unbounded linear piece.
Figure 3.4: Linear-linear convex hull case. (a) Bounded convex hull. (b) Convex hull unbounded on the left. (c) Convex hull unbounded on the right.

After inserting co((f_i, f_{i+1})) into f, we would like to increment the index i and consider the next two pieces of the function. By Proposition 3.1.3, f is convex iff its slope increases at each of the breaks in its domain. The function (f_i, f_{i+1}) is now convex; however, if the slope of f approaching x_{i−1} from the right has decreased, then (f_{i−1}, f_i) may fail to be convex. Hence the algorithm must backtrack and reconsider (f_{i−1}, f_i).
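For the quadratic-linear case, note that (3.5) reduces to a quadratic equation in x_α: with f_i(x) = ax² + bx + c and (X, F) = (x_{i+1}, f_{i+1}(x_{i+1})), it becomes a·x_α² − 2aX·x_α + (F − bX − c) = 0. A minimal sketch of this step (again our illustration, under the same hedges as above):

    import math

    def tangent_from_point(a, b, c, X, F):
        """Tangency abscissas x_alpha of lines through (X, F) tangent to
        f(x) = a*x**2 + b*x + c, with a > 0; this is equation (3.5).
        The caller keeps the root lying in [x_{i-1}, x_i]."""
        disc = X**2 - (F - b * X - c) / a
        if disc < 0:
            return []  # (X, F) lies strictly below the parabola: no tangent line
        r = math.sqrt(disc)
        return [X - r, X + r]

The symmetric linear-quadratic case exchanges the roles of the two endpoints.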
Lemma 3.1.4. With consecutive backtracks, every backtrack after the first reduces the two pieces currently being considered to a single linear piece.

Proof. After convexifying (f_i, f_{i+1}), Algorithm 1 only backtracks when necessary, i.e. when f′_i(x_{i−1}) decreases sufficiently. The only time f′_i(x_{i−1}) decreases is when a linear piece is inserted on some interval [x_{i−1}, x_β], for x_{i−1} < x_β ≤ x_i. Hence, after backtracking, the right piece will be linear. After the first backtrack, we will be in either the quadratic-linear case or the linear-linear case. The only way backtracking can happen in these cases is if the current two pieces are replaced by a single linear piece. By induction, this holds for all consecutive backtracks beyond the first.

Proposition 3.1.5. Algorithm 1 runs in optimal linear time and space.

Proof. Assume co f ≢ −∞. Convexifying pieces in lines 6-10 requires O(1) time for each piece. To show that the while loop on lines 15-31 requires linear time, note again that each iteration requires constant time. The while loop terminates either when we reach the rightmost finite piece of f, or when f contains a single finite piece. In every iteration, we make progress toward at least one of these conditions. Moving right will lead to the first condition being satisfied. If we do not move right, then we have lost convexity at x_{i−1}. There are four subcases to consider, based on the types of the original f_i, f_{i+1} being convexified.

(i) Linear-linear: taking the convex hull of two linear pieces always produces a single linear piece on [x_{i−1}, x_{i+1}], so progress is made toward N reaching 1.

(ii) Quadratic-linear: the only way for Algorithm 1 to decrease f′_i(x_{i−1}) is to insert a linear piece extending rightward from x_{i−1}. As f_{i+1} is linear, this means again that (f_i, f_{i+1}) has collapsed to a single linear piece.

(iii) Quadratic-quadratic: as in case (ii), a linear piece has been inserted whose left endpoint is x_{i−1}. In the quadratic-quadratic case we look for the linear function tangent to both quadratic pieces, so consider the right endpoint of the linear function, x_β. If x_β = x_{i+1}, then again co((f_i, f_{i+1})) is now a single linear piece. If x_β < x_{i+1}, then the convex hull is a linear-quadratic function, which is handled in the next case (after convexification).

(iv) Linear-quadratic: if x_β = x_{i+1}, then (f_i, f_{i+1}) collapses to a single piece. Otherwise, co((f_i, f_{i+1})) still contains two pieces. As we have lost convexity at x_{i−1}, consider the type of f_{i−1}. If f_{i−1} is quadratic, then backtracking as usual would lead to a cycle of repeatedly convexifying (f_{i−1}, f_i) and (f_i, f_{i+1}), so we remove the linear piece and solve the quadratic-quadratic system instead. If f_{i−1} is linear, then it is necessary to backtrack without decrementing the number of pieces. We claim that this can happen at most once before adding f_{i+1} to the convex hull.
Fix x̄ as the current value of x_{i−1}, and let f_j denote, across all iterations, the piece of f whose right endpoint is the value of x_i before any backtracking. (Currently, f_j = f_i.) By Lemma 3.1.4, each consecutive backtrack will now decrease N, and the right piece in all these backtracks will be f_j. When consecutive backtracking is no longer necessary, we will again consider having convexified (f_j, f_{j+1}) in the linear-quadratic case, and assume f is still nonconvex at x_{j−1}. We enumerate the reasons why backtracking stopped.

(a) If j = 1, then we have convexified (f_1, f_2), and consequently move forward (we cannot backtrack any further).

(b) If f_{j−1} is quadratic, then we use the quadratic-quadratic system as above, and either move forward or, when the two quadratic pieces collapse to one, backtrack.

(c) If f_{j−1} is linear, it may be necessary to backtrack again without collapsing (f_j, f_{j+1}). Note that, as there was at least one consecutive backtrack to get to this point, x_{j−1} is now strictly less than x̄, and also that x_{j−1} is one of the partition points that was in f when we assigned x̄ (backtracking cannot introduce new partition points). Hence there is a finite number of partition points that x_{j−1} can take on in this case, and once x_{j−1} moves left beyond a partition point, that point is deleted from the model and cannot be considered by a future linear-quadratic case.

Thus, though a single linear-quadratic case may require O(n) time, partition points removed from the model will not be considered by future linear-quadratic cases, so O(n) work is done by all linear-quadratic cases combined. The other three cases always progress toward either i = i_f or N = 1, so overall, Algorithm 1 runs in linear time.

The function f begins with O(n) pieces, and each iteration adds at most one piece (a tangent line between two quadratics) or deletes at most one piece, hence Algorithm 1 requires linear space as well.

3.2 Convex Duality Approach

An alternate method that we present to compute the convex hull is based on dividing a PLQ function into individual pieces, each of which is convex. Consider a PLQ function f with n pieces. Assuming co f ≢ −∞, this method also assumes that each individual piece is convex (convexifying individual pieces may initially be necessary, as with the direct algorithm). We name this algorithm plq_coSplit.

Define the functions (g_i)_{i=1}^{n} by

g_i(x) = f_i(x) if x ∈ [x_{i−1}, x_i], and g_i(x) = +∞ otherwise.

Each g_i is the extended-value function of f_i restricted to [x_{i−1}, x_i]. We can write f = min_i g_i. Also, as each g_i is convex, co g_i = g_i. Hence we can write co f as

co f = co min_i g_i = co min_i co g_i.

From here, we apply the following theorem.

Theorem 3.2.1. [14, Theorem X.2.4.4] Let g_1, …, g_n be proper, lower semi-continuous functions. Then

(co max(g_1, …, g_n))* = co min(g_1*, …, g_n*).

This holds for an arbitrary finite number of functions. By applying Theorem 3.2.1 with [14, Proposition X.2.6.1], we rewrite the convex hull of f as

co f = co min(co g_1, …, co g_n) = (max(g_1*, …, g_n*))*.   (3.6)

This formulation provides a basis for computing the convex hull of f. Doing so requires the ability to compute the conjugate of already-convex functions, and to construct the pointwise maximum of n convex functions. The conjugate of each g_i is easily found via an explicit formula in [2, Proposition 5.2].
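Formula (3.6) can be checked numerically on sampled data. If f is known only at sample points x_i, we may take g_i = ι_{x_i} + f(x_i), for which the definition of the conjugate gives g_i*(s) = s·x_i − f(x_i); max_i g_i* is then a brute-force discrete conjugation, and conjugating twice yields co f on the grid. The Python sketch below is a numerical illustration of (3.6) only (the PLQ algorithms of this chapter work on exact models); it uses the nonconvex function of Example 2.2.2.

    import numpy as np

    def discrete_conjugate(u, fu, v):
        """f*(v_j) = max_i (v_j * u_i - f(u_i)): a brute-force O(n^2) conjugate."""
        return np.max(np.outer(v, u) - fu[np.newaxis, :], axis=1)

    x = np.linspace(-2.0, 2.0, 401)
    fx = np.abs(np.abs(x - 1.0) - 1.0)      # Example 2.2.2, restricted to [-2, 2]
    s = np.linspace(-4.0, 4.0, 401)         # slope grid covering the slopes of co f
    fstar = discrete_conjugate(x, fx, s)    # the maximum of the pieces' conjugates
    hull = discrete_conjugate(s, fstar, x)  # the biconjugate: co f on the grid

Each brute-force conjugation above costs O(n²); the Linear-time Legendre Transform of [16] performs it in O(n) on sorted input.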
The complete method is presented in Algorithm 2. As the difference in line counts suggests, Algorithm 2 requires less work to implement, provided that algorithms are already available for computing the maximum and conjugate of convex PLQ functions.

Algorithm 2: plq_coSplit
Input: f (a PLQ function)
Output: co f (a convex PLQ function)
 1  begin
 2      check if co f ≡ −∞, and return if necessary
 3      convexify individual nonconvex pieces of f
 4      g ← −∞
 5      for i ← 1 to n do
 6          construct g_i as above
 7          g ← max(g, g_i*)
 8      end
 9      co f ← g*
10  end

Proposition 3.2.2. Algorithm 2 has linear space complexity, worst-case quadratic time complexity, and best-case linear time complexity.

Proof. Each g_i contains a single piece, hence g_i* will contain O(1) pieces. In every iteration, g has O(1) partition points added to it, in the case where the partition points of g_i* are not partition points of g. After n iterations, g will contain O(n) pieces. As every call to max is a linear operation, across all calls to max we take O(n²) time. If every g_i after the first introduces no new pieces into g, then each call to max takes constant time, making the algorithm run in linear time. This is the case, e.g., when computing the convex hull of a piecewise affine interpolation of the function f(x) = −(1/2)|x|².

3.3 Performance Comparison

To compare the plq_coDirect and plq_coSplit algorithms, we compute the convex hull of a piecewise affine function with an increasing number of pieces. The results may be seen in Figures 3.5 and 3.6.

Figure 3.5 confirms that the worst-case complexity is attained for both algorithms when computing co((1/2)‖·‖²), with plq_coDirect running in linear time and plq_coSplit showing quadratic performance. As the model of the convex hull grows to contain n pieces, quadratic growth is expected. In contrast, when computing co(−(1/2)‖·‖²) in Figure 3.6, plq_coSplit's model does not grow, and ultimately a model with a single piece is produced (a linear function between the bounds of the domain).

Figure 3.5: Run time (seconds) of the convex hull algorithms computing the convex hull of an n-piece piecewise linear approximation of f(x) = (1/2)x² on [−n/2, n/2].

In both graphs, plq_coDirect performs significantly better. Besides its quadratic worst-case complexity, plq_coSplit also invokes entire lower-level algorithms, so the overhead of these subroutine calls certainly contributes to its poorer performance.

All experiments were run on a quad-core 64-bit 2.0GHz HP HDX 18 laptop with 4GB RAM, running Gentoo Linux 10.0. Implementation was done under Scilab 5.1.1, extending the Computational Convex Analysis toolbox freely available from [18].

Figure 3.6: Run time (seconds) of the convex hull algorithms computing the convex hull of an n-piece piecewise linear approximation of f(x) = −(1/2)x² bounded on [−n/2, n/2].
Chapter 4
Graph-Matrix Calculus

We will now present a new class of algorithms for computing transforms of convex functions, based on unary "calculus rules" [12] for the subdifferential of a convex function. We recall these rules, then provide formulas for additional unary operators, and two binary operators. We then apply these rules to PLQ functions, and describe how to recover an exact model of a function from a transformed subdifferential.

4.1 Representation of PLQ Functions

Ultimately, we wish to be able to perform a transform of a convex PLQ function f, g = Tf, by looking at the graph of the subdifferential of f,

gph ∂f = {(x, y) : x ∈ dom f, y ∈ ∂f(x)},

and computing the subdifferential of the transform via a matrix multiplication,

gph ∂(Tf) = A gph ∂f,

for some matrix A ∈ R^{2d×2d}. For univariate PLQ functions, A ∈ R^{2×2}, and gph ∂f must be represented in a way suitable for multiplying by A. The data structure used for this graph must also contain a constant of integration, so that integration can recover the PLQ matrix model and the function can be evaluated at points of its domain. Conversely, differentiation converts a PLQ matrix model into what we call a GPH matrix,

[ x_0 x_1 x_2 … x_n x_{n+1} ]
[ s_0 s_1 s_2 … s_n s_{n+1} ]
[ f_0 f_1 f_2 … f_n f_{n+1} ].

The subdifferential of a PLQ function, when single-valued, is a piecewise affine function, and the subdifferential of a convex PLQ function is made up of line segments connected end-to-end that extend to infinity in both directions. We denote by x_i points in dom f, by s_i subgradients of f at x_i, and by f_i constants of integration (f_i = f(x_i)). The matrix above stores enough points (x_i, s_i) on this graph to recover the entire graph of the subdifferential via piecewise linear interpolation of (x_i, s_i) for all i ∈ {0, …, n + 1}.

The columns of the GPH matrix are stored in lexicographically increasing order. That is, for i ≤ j, we have x_i ≤ x_j, and also s_i ≤ s_j. If x_i = x_j then f_i = f_j. The values {x_1, …, x_n} are divisions between the pieces of f, though they are not necessarily distinct: if f is nondifferentiable at a break x̄, then there will be two columns i < j with x_i = x_j = x̄, and s_i < s_j will be the slopes of f approaching x̄ from the left and right, respectively. The values x_0 and x_{n+1} are in the matrix to store the subdifferential of f on (−∞, x_1] and [x_n, +∞), respectively. The line segment from (x_1, s_1) to (x_0, s_0) is extrapolated to infinity, and correspondingly with (x_n, s_n) and (x_{n+1}, s_{n+1}). To represent f being bounded on the left (resp. right), x_0 = x_1 (resp. x_{n+1} = x_n) indicates a ray extending vertically to −∞ (resp. +∞). In this case, we set f_0 (resp. f_{n+1}) to +∞. Except for f_0 and f_{n+1}, all other f_i, and all x_i, s_i, must be finite.

Example 4.1.1. The quadratic energy function may be represented by the GPH matrix

[ 0 2 ]
[ 0 2 ]
[ 0 2 ].

Example 4.1.2. The absolute value function may be encoded as the GPH matrix

[ −1  0 0 1 ]
[ −1 −1 1 1 ]
[  1  0 0 1 ].

Example 4.1.3. The function where f(x) = 0 on x ∈ [−1, 1] and f(x) = x² − 1 elsewhere may be stored as

[ −2 −1 −1 1 1 2 ]
[ −4 −2  0 0 2 4 ]
[  3  0  0 0 0 3 ].

Example 4.1.4. The function where f(x) = x² for x < 0, f(x) = 0 for x ∈ [0, 1], and f(x) = x − 1 for x > 1 may be represented by the matrix

[ −1 0 1 1 2 ]
[ −2 0 0 1 1 ]
[  1 0 0 0 1 ].

There is no unique GPH matrix representation for a PLQ function. Redundant points may be stored in the middle of the matrix, and the two endpoints (x_0, s_0) and (x_{n+1}, s_{n+1}) can be slid along their respective rays. While normalization rules for GPH matrices could be set, this would require normalization after many transformations, as the transformation matrix A can change gph ∂f drastically. Frequent cleaning could incur significant overhead, so we instead suggest that a separate algorithm be provided to "clean" the GPH matrix when desired, removing duplicate points.

To represent the special case of the point-indicator function f(x) = ι_{x̄}(x) + c, we use the matrix

[ x̄   x̄  ]
[ s_0 s_1 ]
[ c   c   ]

(with any s_0 < s_1). This matrix is the vertical line at x = x̄. An affine function f(x) = ax + b is stored as a matrix whose graph is a horizontal line,

[ x_0        x_1        ]
[ a          a          ]
[ a·x_0 + b  a·x_1 + b  ]

(with any x_0 < x_1). Likewise, a quadratic f(x) = ax² + bx + c is encoded as

[ x_0                 x_1                 ]
[ 2a·x_0 + b          2a·x_1 + b          ]
[ a·x_0² + b·x_0 + c  a·x_1² + b·x_1 + c  ]

(with any x_0 < x_1). Given a matrix

[ x_0 x_1 ]
[ s_0 s_1 ]
[ f_0 f_1 ],

if x_0 < x_1 and s_0 < s_1, then f is a quadratic function that can be recovered with the following formulas:

f(x) = ax² + bx + c, where
a = (s_1 − s_0) / (2(x_1 − x_0)),
b = s_0 − 2a·x_0,
c = f_0 − a·x_0² − b·x_0.

The case where x_0 = x_1 and s_0 = s_1 does not form a valid GPH matrix, as two distinct points are required to extrapolate to a maximal graph. Considering that gph ∂f is stored in the first two rows of the GPH matrix, we are now able to calculate A gph ∂f with a simple matrix multiplication.
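In code, a GPH matrix is naturally a 3 × (n + 2) array, and the recovery formulas translate directly. The Python sketch below is our illustration (the toolbox itself is written in Scilab); it stores Example 4.1.2 and implements both the piece recovery and the suggested cleaning pass.

    import numpy as np

    # Example 4.1.2: the absolute value function as a GPH matrix (rows x, s, f).
    absval = np.array([[-1.0,  0.0, 0.0, 1.0],
                       [-1.0, -1.0, 1.0, 1.0],
                       [ 1.0,  0.0, 0.0, 1.0]])

    def gph_segment_to_quadratic(G, j):
        """Recover f(x) = a*x**2 + b*x + c between columns j and j+1.
        Requires x0 < x1 (skip vertical segments); s0 == s1 gives an
        affine piece with a = 0."""
        (x0, s0, f0), (x1, s1, f1) = G[:, j], G[:, j + 1]
        a = (s1 - s0) / (2.0 * (x1 - x0))
        b = s0 - 2.0 * a * x0
        c = f0 - a * x0**2 - b * x0
        return a, b, c

    def gph_clean(G):
        """Drop columns equal to their left neighbour, keeping the
        lexicographic ordering intact."""
        keep = [0] + [j for j in range(1, G.shape[1])
                      if not np.array_equal(G[:, j], G[:, j - 1])]
        return G[:, keep]

    print(gph_segment_to_quadratic(absval, 0))  # (0.0, -1.0, 0.0): the piece -x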
4.2 Goebel's Graph-Matrix Calculus

Numerous calculus rules were originally published in [12], then slightly extended in [11], showing how the graph of the subdifferential of a convex function is affected under core convex transformations operating on a single function. Specifically, (i), (ii), (iii), (iv), (vii), and (viii) were given in [12], and (v), (vi), and (ix) were introduced in [11].

Theorem 4.2.1 (Goebel's Graph-Matrix Calculus for unary operators). Let f : R^d → R̄ be a proper, lower semi-continuous, convex function. Then the following rules hold, for all α > 0, β ≥ 0, and λ ∈ [0, 1]:

(i) gph ∂(f*) = [ 0, Id ; Id, 0 ] gph ∂f,
(ii) gph ∂(f + βq) = [ Id, 0 ; β Id, Id ] gph ∂f,
(iii) gph ∂(αf) = [ Id, 0 ; 0, α Id ] gph ∂f,
(iv) gph ∂(e_βf) = [ Id, β Id ; 0, Id ] gph ∂f,
(v) gph ∂(α ⋆ f) = [ α Id, 0 ; 0, Id ] gph ∂f,
(vi) gph ∂f(α·) = [ α⁻¹ Id, 0 ; 0, α Id ] gph ∂f,
(vii) gph Prox_β(f) = [ Id, β Id ; Id, 0 ] gph ∂f,
(viii) gph ∂(s_λf) = [ Id, λ Id ; λ Id, Id ] gph ∂f,
(ix) gph ∂(T_λf) = [ (1 − λ/2) Id, (λ/2) Id ; (λ/2) Id, (1 − λ/2) Id ] gph ∂f.

(Here [ A, B ; C, D ] denotes the 2 × 2 block matrix with rows (A, B) and (C, D).)

Proof. Let J = [ 0, Id ; Id, 0 ]. Formula (i) follows from

(x, y) ∈ gph ∂f ⇔ y ∈ ∂f(x) ⇔ x ∈ ∂f*(y) ⇔ (y, x) ∈ gph ∂f*.

For (ii), we note that ∂(f + βq) = ∂f + β∂q = ∂f + β Id, thus (x, y) ∈ gph ∂f ⇔ (x, y + βx) ∈ gph ∂(f + βq). We get (iii) using ∂(αf) = α(∂f):

y ∈ ∂f(x) ⇔ αy ∈ α∂f(x) ⇔ αy ∈ ∂(αf)(x) ⇔ (x, αy) ∈ gph ∂(αf).

Similarly, ∂(e_βf) = ∂(f □ β⁻¹q) = ∂((f* + βq)*), and using the previous rules yields (iv). To obtain (v), we use the identity α ⋆ f = (αf*)*; applying the two previous rules gives

gph ∂((αf*)*) = J gph ∂(αf*) = J [ Id, 0 ; 0, α Id ] gph ∂f* = J [ Id, 0 ; 0, α Id ] J gph ∂f = [ α Id, 0 ; 0, Id ] gph ∂f.

Using both multiplication rules (iii) and (v), and rewriting f(α·) as α(α⁻¹ ⋆ f), leads to

gph ∂f(α·) = gph ∂(α(α⁻¹ ⋆ f)) = [ Id, 0 ; 0, α Id ][ α⁻¹ Id, 0 ; 0, Id ] gph ∂f = [ α⁻¹ Id, 0 ; 0, α Id ] gph ∂f,

thus showing (vi). From [5, Section 2.2], we have Prox_β f = (Id + β∂f)⁻¹, so gph(Id + β∂f) = [ Id, 0 ; Id, β Id ] gph ∂f, and inverting the graph (swapping the block rows) gives (vii). By writing out the definition of s_λf and expanding based on previous rules,

gph ∂(s_λf) = gph ∂((1 − λ²)e_λf + λq) = [ Id, 0 ; λ Id, Id ][ Id, 0 ; 0, (1 − λ²) Id ][ Id, λ Id ; 0, Id ] gph ∂f,

and matrix multiplication proves (viii). The proof of (ix) is postponed until after the next theorem.
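As a quick sanity check of these rules, take f = q, so that gph ∂q = {(x, x) : x ∈ R}. The row swap J of rule (i) maps this graph to itself, matching the self-conjugacy q* = q of Fact 2.1.10, while rule (iv) maps (x, x) to ((1 + β)x, x), which is the graph of ∂(e_βq) since e_βq = q/(1 + β).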
We also recall rules from [11] for the calculation of the transformed subdifferential under the binary operations of addition, infimal convolution, and the proximal average.

Theorem 4.2.2 (Graph-Matrix Calculus for binary operators). Let f_1 and f_2 be two proper, lower semi-continuous, convex functions. Then the following rules hold (for i ∈ {1, 2}).

(i) Addition rule: if ri dom f_1 ∩ ri dom f_2 ≠ ∅, then (x, s) ∈ gph ∂(f_1 + f_2) iff there exist (x_i, s_i) ∈ gph ∂f_i such that x = x_1 = x_2 and s = s_1 + s_2.

(ii) Inf-convolution rule: if ∂f_1(x_1) ∩ ∂f_2(x_2) ≠ ∅, then (x_1 + x_2, s) ∈ gph ∂(f_1 □ f_2) iff (x_i, s) ∈ gph ∂f_i for i ∈ {1, 2}.

(iii) Proximal average rule: let λ ∈ (0, 1) and µ = 1. Assume that P_λ(f_1, f_2) is exact at x = (1 − λ)x_1 + λx_2, where x_i ∈ dom f_i for i ∈ {1, 2}. Then (x, s) ∈ gph ∂P_λ(f_1, f_2) iff there are s_1 ∈ ∂f_1(x_1) and s_2 ∈ ∂f_2(x_2) with s = x_1 + s_1 − x = x_2 + s_2 − x.

Proof. As f_1, f_2 are convex, we have by [14, Corollary XI.3.1.2] that ∂f_1(x) + ∂f_2(x) = ∂(f_1 + f_2)(x). Hence s ∈ ∂(f_1 + f_2)(x) ⇔ s = s_1 + s_2 for some s_i ∈ ∂f_i(x). Similarly, [14, Corollary XI.3.4.2] gives ∂f_1(x_1) ∩ ∂f_2(x_2) = ∂(f_1 □ f_2)(x_1 + x_2), so s ∈ ∂(f_1 □ f_2)(x) ⇔ there exist x_1, x_2 with x = x_1 + x_2, s ∈ ∂f_1(x_1), and s ∈ ∂f_2(x_2). Using

∂P_λ(f_1, f_2)(x) = −x + ⋂_{i : λ_i > 0} (∂f_i(x_i) + x_i)

from [1, Theorem 7.1],

s ∈ ∂P_λ(f_1, f_2)(x) ⇔ s = x_1 + s_1 − x for some s_1 ∈ ∂f_1(x_1), and s = x_2 + s_2 − x for some s_2 ∈ ∂f_2(x_2).

We now include the proof of Theorem 4.2.1(ix).

Proof. Using the definition T_λf = P_λ(f, q), and the fact that s_2 ∈ ∂q(x_2) ⇔ s_2 = x_2, we have

(x, s) ∈ gph ∂(T_λf) ⇔ x = (1 − λ)x_1 + λx_2, s = x_1 + s_1 − x, and s = x_2 + s_2 − x.

Solving for x and s yields

x = (1 − λ/2)x_1 + (λ/2)s_1,
s = (λ/2)x_1 + (1 − λ/2)s_1,

which are the coefficients in Theorem 4.2.1(ix).

4.3 Transforms of PLQ Functions

Given a convex PLQ function f in GPH matrix format, Theorem 4.2.1 shows how to compute many of the fundamental unary convex transforms. Having computed A gph ∂f, we still need to recover at least one constant of integration to be able to make use of this new graph. We present ways to transform the f_i values from the graph of ∂f to new values on the transformed graph. Unlike the matrices presented by Theorem 4.2.1, the method to compute the new f_i's depends highly on the transform being performed.

We set our notation first. We will assume we are transforming a convex function f given by the GPH matrix [xf, sf, yf]ᵀ, and that we have calculated xg, sg for the transformed function g = Tf given by [xg, sg, yg]ᵀ. The notation x.*y denotes elementwise multiplication, and x.^n denotes elementwise exponentiation.

Recovery of yg when g = αf or g = f + βq is straightforward: yg = α·yf in the first case, and yg = yf + (β/2)·xf.^2 in the second. The following two propositions show how to compute the transforms of other unary operators.

Proposition 4.3.1. To compute the Legendre-Fenchel transform g = f*, yg can be calculated via yg = sf.*xf − yf.

Proof. As f is convex, looking for a maximizer of g(s) = f*(s) = max_x (⟨s, x⟩ − f(x)) means finding x, s such that s ∈ ∂f(x). For all i ∈ {0, …, n + 1}, we know that (sf)_i ∈ ∂f((xf)_i), hence yg = g(xg) = g(sf) = sf.*xf − yf.

Proposition 4.3.2. Under the Moreau envelope g = e_βf, we have that yg = yf + (β/2)·sf.^2.

Proof. The Moreau envelope can be written as e_βf = (f* + βq)*. By the previous proposition, together with Theorem 4.2.1(i),

f* := [x1; s1; y1] = [sf; xf; sf.*xf − yf].

Then, if we consider the function f* + βq,

f* + βq := [x2; s2; y2] = [x1; s1 + β·x1; y1 + (β/2)·x1.^2].

Conjugating again,

g = (f* + βq)* := [xg; sg; yg] = [s2; x2; s2.*x2 − y2].

Expanding yg yields yg = yf + (β/2)·sf.^2.
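Propositions 4.3.1 and 4.3.2 combine with the matrices of Theorem 4.2.1 into one-line transforms of a full GPH matrix with finite entries. A minimal numpy sketch (ours, for illustration; the toolbox versions are written in Scilab):

    import numpy as np

    def gph_conjugate(G):
        """Rule (i) swaps the x- and s-rows; Proposition 4.3.1 updates the f-row."""
        x, s, y = G
        return np.vstack([s, x, s * x - y])

    def gph_moreau(G, beta):
        """Rule (iv) maps (x, s) to (x + beta*s, s); Proposition 4.3.2 updates the f-row."""
        x, s, y = G
        return np.vstack([x + beta * s, s, y + 0.5 * beta * s**2])

Applied to the GPH matrix of Example 4.1.1, gph_conjugate returns the same matrix (q is self-conjugate), and gph_moreau represents e_βq = q/(1 + β).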
To compute the sum of two convex PLQ functions f and g that are finite at more than a single point, the x-axis must be repartitioned to include all of the distinct elements of xf and xg. Pseudocode is presented in Algorithm 3. The algorithm runs in time and space linear in the number of pieces of f and g, as line 3 merges two sorted sequences, after which each partition is considered only once, in O(1) time. Note that additional (linear) work may be required before entering the for loop to determine in which pieces of f and g the entries of x fall.

Algorithm 3: gph_add
Input: f, g (convex PLQ functions, in GPH matrix format)
Output: h (a convex PLQ function, in GPH matrix format)
 1  begin
 2      h ← []
 3      x ← the ascending distinct partition points in xf and xg
 4      for each interval [x_i, x_{i+1}] produced by the partition x do
 5          append to h a column for the right slope at x_i, by adding the corresponding right slopes and function values of f and g at x_i
 6          append to h another column for the left slopes and function values at x_{i+1}
 7      end
 8      if f(x) = ∞ or g(x) = ∞ for x < x_1 then
 9          bound h on the left with a column [x_1, s_1 − 1, ∞]ᵀ
10      end
11      if f(x) = ∞ or g(x) = ∞ for x > x_n then
12          bound h on the right with a column [x_n, s_n + 1, ∞]ᵀ
13      end
14      h ← UniqueColumns(h)
15  end

The proximal average also requires more work to implement than the unary operators we have shown.

Proposition 4.3.3. Let f1 = [x1, s1, y1] and f2 = [x2, s2, y2] be convex PLQ functions in GPH matrix format, and let λ ∈ [0, 1]. Define the operator P by P = (Id + ∂f2)⁻¹(Id + ∂f1). Then the proximal average P_λ(f1, f2) = [x, s, y] is given by

x = (1 − λ)·x1 + λ·Px1,
s = x1 + s1 − x = x2 + s2 − x,
y = (1 − λ)·(y1 + (1/2)·x1.^2) + λ·(f2(Px1) + (1/2)·(Px1).^2) − (1/2)·x.^2.

Proof. In order to calculate the proximal average at x, we must find the x1, x2 that make the proximal average exact. To do this, we parametrize the graph with respect to x1 and find the corresponding x2. The formula for s was stated in Theorem 4.2.2. From s, we can see that x1 + s1 = x2 + s2, so (Id + ∂f1)(x1) = (Id + ∂f2)(x2). Thus x2 = Px1 gives the points that correspond to x1, i.e. x2 = (Id + ∂f2)⁻¹(Id + ∂f1)(x1). By using a reformulation of the proximal average [1, Proposition 4.3],

P_λ(f1, f2)(x) = inf_{x = (1−λ)x1 + λx2} ( (1 − λ)(f1 + q)(x1) + λ(f2 + q)(x2) − q(x) ),

and substituting x2, we arrive at the formula for y.

It is important to note that (y1 + (1/2)·x1.^2), (f2(Px1) + (1/2)·(Px1).^2), and ((1/2)·x.^2) are independent of λ. These values can be precomputed to reduce the time needed to compute the proximal average for various λ and fixed f1, f2, x. The proximal average PLQ algorithm does not use such a precomputation scheme. This technique was demonstrated in [11] to compute P_λ((1/2)‖·‖², ι_{0} + 1) for many values of λ (Figure 4.1), and significant performance increases were observed: the PLQ algorithm required 117 seconds, while the GPH algorithm for multiple λ took only 33 seconds.

4.4 Comparison of Algorithms

To compare the performance of GPH algorithms with existing algorithms, we implemented many of these GPH algorithms alongside their PLQ counterparts in [18]. The results may be seen in Figures 4.2 and 4.3.

We included four algorithms in our experiments. PLQ is the existing algorithm that works with PLQ functions directly. GPH is our new algorithm that works with GPH matrices. In Figure 4.2, LLT is the Linear-time Legendre Transform [16].
Finally, OPT is an algorithm using the n1fc1 nonsmooth bundle method solver built into Scilab 5.1.1 [6], with a warm start from the previously computed point.

Figure 4.1: The proximal average of f1 = (1/2)‖·‖² and f2 = ι_{0} + 1 as λ ranges from 0 (blue) to 1 (red).

First, we look at computing the conjugate, (x⁴)*, in Figure 4.2. The general OPT algorithm does not perform as well as the specialized algorithms (Figure 4.2(a)), and removing OPT from the picture makes it clear that, although both algorithms clearly require linear time, GPH outperforms PLQ significantly (Figure 4.2(b)). The PLQ conjugate algorithm makes use of a "clean" function to normalize its matrix, whereas this is not necessary for the GPH matrix. The GPH algorithm, which produces an exact transformed model, shows performance comparable to the inexact LLT algorithm.

We also used the proximal average algorithm for comparison (Figure 4.3). The results were much the same: OPT was the slowest algorithm; PLQ performed much better; and GPH showed significant improvement over both algorithms. All experiments were performed on the same computer as in Section 3.3.

Figure 4.2: Comparison of conjugate algorithms computing (x⁴)* approximated by n-piece piecewise linear functions on [−10, 10]. (a) All conjugate algorithms. (b) With OPT removed.

Figure 4.3: Comparison of proximal average algorithms computing P_{1/2}(x⁴, eˣ) approximated by n-piece piecewise linear functions on [−10, 10].

Chapter 5
Conclusion

We have provided two means to compute the convex hulls of continuous piecewise linear-quadratic functions. The first algorithm requires treating a number of special cases at each iteration, but runs in optimal linear time. The second requires worst-case quadratic time and runs slower than the first even when it achieves its best-case linear time, but requires less effort to implement.

We also showed that by storing the graph of the subdifferential of a PLQ function in matrix form, it is possible to build a class of algorithms that computes many convex transforms of convex PLQ functions via matrix multiplication, followed by a linear-time calculation of the values of the transformed function. We included complete algorithms for scalar multiplication, addition of the scaled energy function, addition of two functions, the Legendre-Fenchel transform, the Moreau envelope, and the proximal average. Because these calculations are matrix and elementwise vector operations, and because no subalgorithms are necessary, we experienced significant performance gains over algorithms working with PLQ functions directly. For both sets of algorithms, we experimentally validated our results with implementations in the Computational Convex Analysis toolbox [18].

Directions for future work include extending the univariate PLQ framework to model bivariate PLQ functions. Representation of the domain of a function becomes critical, and it is necessary to find a representation that aids the computation of the conjugate, from which other operators may be built.
Computing the convex hull of a bivariate PLQ function is also important; however, certain techniques can be ruled out: the duality approach of Section 3.2 cannot be directly applied, as the pointwise maximum of two convex bivariate PLQ functions, while convex, may fail to be PLQ (see Figure 5.1).

Figure 5.1: f(x, y) = (1/2)(x² + y²) and g(x, y) = 4. max(f, g) is not a PLQ function, as one piece of its domain is a disk about (0, 0), which is not polyhedral as the PLQ class requires.

Another direction to take would be to extend the Graph-Matrix Calculus algorithms to work with nonconvex functions. Additionally, instead of relying on convex hull algorithms to be able to use convex algorithms on nonconvex functions, individual algorithms may be extended to handle nonconvex cases as well.

Bibliography

[1] H. H. Bauschke, R. Goebel, Y. Lucet, and X. Wang. The proximal average: Basic theory. SIAM Journal on Optimization, 19(2):766–785, July 2008.

[2] H. H. Bauschke, Y. Lucet, and M. Trienis. The piecewise linear-quadratic model for computational convex analysis. Computational Optimization and Applications, 43(1), May 2009.

[3] H. H. Bauschke, E. Matoušková, and S. Reich. Projection and proximal point methods: convergence results and counterexamples. Nonlinear Analysis: Theory, Methods & Applications, 56(5):715–738, 2004.

[4] K. P. Bennett and E. Parrado-Hernández. The interplay of optimization and machine learning research. J. Mach. Learn. Res., 7:1265–1281, 2006.

[5] P. L. Combettes and J.-C. Pesquet. A proximal decomposition method for solving convex variational inverse problems. Inverse Problems, 24(6), 2008.

[6] Scilab Consortium. Scilab. http://www.scilab.org/, 2010.

[7] H. Edelsbrunner. Algorithms in Combinatorial Geometry. Springer-Verlag, 1987.

[8] P. F. Felzenszwalb and D. P. Huttenlocher. Distance transforms of sampled functions. Technical Report TR2004-1963, Cornell Computing and Information Science, September 2004.

[9] B. Gardiner and Y. Lucet. Numerical computation of Fitzpatrick functions. Journal of Convex Analysis, 16(4), January 2009.

[10] B. Gardiner and Y. Lucet. Convex hull algorithms for piecewise linear-quadratic functions in computational convex analysis. Submitted 2010.

[11] B. Gardiner and Y. Lucet. Graph-matrix calculus for computational convex analysis. Submitted 2010.

[12] R. Goebel. Self-dual smoothing of convex and saddle functions. Journal of Convex Analysis, 15:179–190, 2008.

[13] W. L. Hare. A proximal average for nonconvex functions: A proximal stability perspective. SIAM Journal on Optimization, 20(2):650–666, May 2009.

[14] J.-B. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Minimization Algorithms, volume 305–306 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, 1993.

[15] V. Koch, J. Johnstone, and Y. Lucet. Convexity of the proximal average. Submitted June 29, 2009.

[16] Y. Lucet. Faster than the fast Legendre transform, the linear-time Legendre transform. Numerical Algorithms, 16(2):171–185, 1997.

[17] Y. Lucet. Fast Moreau envelope computation I: Numerical algorithms. Numerical Algorithms, 43(3):235–249, November 2006.

[18] Y. Lucet. Computational Convex Analysis. http://people.ok.ubc.ca/ylucet/cca, 2010.

[19] S. M. Moffat. On the kernel average for n functions. MSc thesis, University of British Columbia Okanagan, 2009.
[20] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, NJ, 1970.

[21] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer-Verlag, Berlin, 1998.

[22] M. Trienis. Computational convex analysis: From continuous deformation to finite convex integration. MSc thesis, University of British Columbia Okanagan, January 2007.

[23] J. Yu, S. V. N. Vishwanathan, S. Günter, and N. N. Schraudolph. A quasi-Newton approach to nonsmooth convex optimization problems in machine learning. J. Mach. Learn. Res., 11:1145–1200, 2010.
