Computation of Convex Conjugates inLinear Time using Graph-MatrixCalculusbyTasnuva HaqueB.Sc., Rajshahi University of Engineering and Technology (RUET), 2012A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF SCIENCEinTHE COLLEGE OF GRADUATE STUDIES(Computer Science)THE UNIVERSITY OF BRITISH COLUMBIA(Okanagan)March 29, 2017c© Tasnuva Haque, 2017The undersigned certify that they have read, and recommend to the Col-lege of Graduate Studies for acceptance, a thesis entitled:Computation of Convex Conjugates in Linear Time using Graph-Matrix Calculussubmitted by Tasnuva Haque in partial fulfilment of the requirements ofthe degree of Master of Science.Dr. Yves Lucet, Irvin K.Barber School of Arts and Sciences, Unit 5, Computer ScienceSupervisor, Professor (please print name and faculty/school above the line)Dr. Heinz Bauschke, Irvin K.Barber School of Arts and Sciences, Unit 5, MathematicsSupervisory Committee Member, Professor (please print name and faculty/school abovethe line)Dr. Shawn Wang, Irvin K.Barber School of Arts and Sciences, Unit 5, MathematicsSupervisory Committee Member, Professor (please print name and faculty/school abovethe line)Dr. Abbas S. Milani, Faculty of Applied Science, School of EngineeringUniversity Examiner, Professor (please print name and faculty/school above the line)March 29, 2017(Date Submitted to Grad Studies)iiAbstractComputational Convex Analysis focuses on computing the convex oper-ators which are used very often in convex analysis. The Fenchel conjugateis one of the most frequently used convex operator. The objective of thisthesis is to develop an algorithm for computing the conjugate of a bivariatePiecewise Linear-Quadratic (PLQ) function. Although some algorithms al-ready exist for computing the conjugate of a bivariate PLQ function, theirworst-case time complexity is not linear. Our challenge is to improve theworst case time complexity to linear.We use a planar graph to represent the entities of a PLQ function. Eachnode of the graph contains a GPH matrix which represents an entity of thePLQ function . In addition, we store the adjacency information and type ofall entities. We traverse the graph using breadth first search and computethe conjugate of each entity. Moreover we store the information of visitedentities using a binary flag. So we never need loop through all entities tocheck whether it is already visited. As a result we get linear computationtime in the worst case.We perform numerical experiment which confirms that our algorithmis running in linear-time. We provide a comparison of the performance ofthis algorithm with the previous algorithms. Finally we explained how toextend this algorithm in higher dimensions while keeping the worst-case timecomplexity linear.iiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiAcknowledgment . . . . . . . . . . . . . . . . . . . . . . . . . . . ixDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xChapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . 11.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2 Algorithms for computing the conjugate of an univariate con-vex function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.3 Algorithms for computing the conjugate of a bivariate convexfunction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5Chapter 2: Preliminaries . . . . . . . . . . . . . . . . . . . . . . 82.1 Polyhedral set . . . . . . . . . . . . . . . . . . . . . . . . . . . 92.2 Piecewise Linear-Quadratic function . . . . . . . . . . . . . . 102.3 Subdifferential of a function . . . . . . . . . . . . . . . . . . . 132.4 Planar graph . . . . . . . . . . . . . . . . . . . . . . . . . . . 17Chapter 3: Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 203.1 Data structure . . . . . . . . . . . . . . . . . . . . . . . . . . 203.1.1 GPH matrix . . . . . . . . . . . . . . . . . . . . . . . 203.1.2 Adjacent entities . . . . . . . . . . . . . . . . . . . . . 313.1.3 Entity Type . . . . . . . . . . . . . . . . . . . . . . . . 323.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32ivTABLE OF CONTENTS3.3 Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353.3.1 Space complexity . . . . . . . . . . . . . . . . . . . . . 403.3.2 Time complexity . . . . . . . . . . . . . . . . . . . . . 41Chapter 4: Numerical experiments . . . . . . . . . . . . . . . . 434.1 Example 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434.2 Example 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504.3 Partition of domain using a grid of points . . . . . . . . . . . 554.3.1 Performance Comparison . . . . . . . . . . . . . . . . 57Chapter 5: Conclusion . . . . . . . . . . . . . . . . . . . . . . . . 60Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62vList of TablesTable 1.1 The worst-case performance of the algorithms devel-oped for computing the conjugate of a bivariate PLQfunction. . . . . . . . . . . . . . . . . . . . . . . . . . 7Table 3.1 Flag for x and s. . . . . . . . . . . . . . . . . . . . . . 22Table 3.2 Entity type. . . . . . . . . . . . . . . . . . . . . . . . 32Table 3.3 Iterations of Algorithm 1. . . . . . . . . . . . . . . . . 37Table 3.4 Mapping of the primal entity to the dual entity for thel1 norm function. . . . . . . . . . . . . . . . . . . . . 39Table 4.1 Mapping of the primal entity to the dual entity. . . . 50Table 4.2 Comparison of computation time using the algorithmfrom [Kha13] and Algorithm Compute PLQ Conjugate.All the times are in seconds. . . . . . . . . . . . . . . . 58viList of FiguresFigure 2.1 A convex function. . . . . . . . . . . . . . . . . . . . 9Figure 2.2 Examples of a polyhedral decomposition and polyhe-dral subdivision. . . . . . . . . . . . . . . . . . . . . . 11Figure 2.3 Example of a bivariate PLQ function - the l1 normfunction. . . . . . . . . . . . . . . . . . . . . . . . . . 12Figure 2.4 Vertex in R2. . . . . . . . . . . . . . . . . . . . . . . . 13Figure 2.5 Red points are the extreme points for the set S =coB(−1, 0) ∪ B(1, 0) where B(1, 0) is the closed ball ofradius 1 centered at (1, 0). . . . . . . . . . . . . . . . 14Figure 2.6 The subgradient of a function. . . . . . . . . . . . . 15Figure 2.7 The subgradient of a convex function. . . . . . . . . . 16Figure 2.8 (a)The absolute value function and (b) the subdiffer-ential ∂f(x) of the absolute value function. . . . . . . 17Figure 2.9 Geometric interpretation of a conjugate function . . 18Figure 2.10 Example of planar and non-planar graph. . . . . . . . 19Figure 3.1 The domain of the l1 norm function . . . . . . . . . . 21Figure 3.2 Entity graph for the l1 norm function . . . . . . . . . 21Figure 3.3 (a)The domain of a PLQ function f , we use blue dotto indicate an extreme point in the primal (b)Thesubdifferential of f , green dots are used to representthe extreme points in the dual. . . . . . . . . . . . . . 24Figure 3.4 Example of no inequality. We use dotted lines to indi-cate the artificial polyhedral decomposition. We rep-resent extreme points by using blue and non-extremepoints by using orange color. . . . . . . . . . . . . . . 25viiLIST OF FIGURESFigure 3.5 Example of polyhedral sets with one inequality. (a)A half space. To represent it we assume a ray inthe half space and then select the extreme and thenon-extreme points. (b) A ray which contains an ex-treme point. To represent it completely we pick anon-extreme point. (c) A line. We represent a lineusing two rays. . . . . . . . . . . . . . . . . . . . . . 26Figure 3.6 Example of the polyhedral sets with two hyperplanes.(a) The polyhedral set if the hyperplanes are parallel(b) The polyhedral set if two hyperplanes intersect. . 27Figure 3.7 Example of polyhedral sets with three inequalities. . 27Figure 3.8 Visualization of the data structure . . . . . . . . . . . 34Figure 3.9 Traversing the entity graph of the l1 norm functionusing Algorithm 1. . . . . . . . . . . . . . . . . . . . . 38Figure 3.10 The partition of domain of the l1 norm function andits conjugate. . . . . . . . . . . . . . . . . . . . . . . . 39Figure 4.1 The function f(x1, x2) from Equation 4.1 . . . . . . 44Figure 4.2 Partition of dom f which is not a polyhedral subdi-vision. . . . . . . . . . . . . . . . . . . . . . . . . . . 44Figure 4.3 Partition of dom f which is a polyhedral subdivision. 45Figure 4.4 The conjugate f∗(s1, s2). . . . . . . . . . . . . . . . . 48Figure 4.5 Partition of the domain of f∗. . . . . . . . . . . . . . 48Figure 4.6 The 2D energy function. . . . . . . . . . . . . . . . . 51Figure 4.7 The domain of 2D energy function. The dotted linesindicate that the partitions are not real partitions. . . 52Figure 4.8 The approximation of univariate PLQ function f1(x1). 56Figure 4.9 The approximation of function f(x1, x2) = x41 + x42. . 56Figure 4.10 The time complexity for computing the conjugate ofthe function from Example 4.3 where dom f is parti-tioned into a grid. . . . . . . . . . . . . . . . . . . . . 57Figure 4.11 The time complexity for computing the conjugate ofthe function from Example 4.3 using the algorithmfrom [Kha13] and the proposed algorithm . . . . . . 59viiiAcknowledgmentIn the name of Allah, the Most Gracious and the Most Merciful.Alhamdulillah, all praises to Allah for giving me the opportunity, the strengthand the patience to complete this thesis and overcome all the challenges anddifficulties.I would like to express my sincere gratitude to my supervisor, Dr. YvesLucet, for the guidance, encouragement and advice he has provided through-out the progression of my graduate studies. I have been extremely lucky tohave a supervisor who cared so much about my work, and who responded tomy questions and queries so promptly. His positive outlook and confidencein my research inspired me and gave me confidence.I would also like to extend my enduring gratitude to the members ofmy examination committee: Dr. Heinz Bauschke and Dr. Shawn Wang fortheir insightful comments on my thesis. I am also thankful to Dr. AbbasS. Milani, my university examiner, for spending his valuable time readingmy thesis and providing constructive feedback. Thanks to the committeemembers for letting my defense be an enjoyable moment.I truly appreciate the support of all amazing colleagues and friends ofmy research group. I am really proud to be a part of this research group.I would also like to thank the Bangladesh community in Kelowna for theirefforts and selfless care.I would like to take this opportunity to acknowledge and express mygratitude towards my parents (Md. Manowarul Haque and Fouzia Jesmin),parents-in-law, sister (Tasnima Haque) and other family members for theircontinuous prayers, inspiration and support. I owe thanks to a very specialperson, my husband, Md. Nobinur Rahman, for his unconditional supportand understanding during this journey that made the completion of thethesis possible. Thanks to my baby, Nuayma, for being one of my sourcesof inspiration in completing this thesis.Finally, I am also thankful to the University of British Columbia, Okana-gan and Natural Sciences and Engineering Research Council of Canada(NSERC) for providing financial support.ixDedicationTo my husbandMd. Nobinur Rahman&my daughterNuayma Mahreen Rahman.xChapter 1IntroductionComputational Convex Analysis (CCA) introduces several algorithmsfor studying operators commonly encountered in Convex Analysis. Amongthem the addition, the scalar multiplication, the Moreau envelope (alsoknown as Moreau-Yosida approximate or Moreau-Yosida regularization),the (Legendre-Fenchel) conjugate are considered as the core convex oper-ators [GL13, Luc10]. These operators are used to develop new theoriesand results in convex optimization [Luc13]. One application of CCA isComputer-Aided Convex Analysis, which focuses on visualizing these oper-ators when applied to functions of typically one or two variables.Computational Convex Analysis has numerous applications in the field ofscience and engineering such as network flow, electrical networks, robot nav-igations, image processing, computer vision, partial differential equations,network communications etc. For example, in image processing, convex op-erators are used to compute the Euclidian distance transform, generalizeddistance transform and morphological operators [Luc10]. The algorithmsfor computing the Euclidian distance transform are developed based on theconcept of the Legendre-fenchel transform and the Moreau envelope. Thesealgorithms achieve linear-time complexity by combining separability of theEuclidian distance transform with convex properties [Luc09]. See [Luc10]for more details on how CCA facilitates all of these applications.Convex analysis has several advantages over nonconvex analysis. One ofthe advantages of a convex function is that the global optimality is guar-anteed by local optimality. Nonsmooth functions can be manipulated usingconvex calculus. Moreover several efficient numerical algorithms are avail-able.To calculate the convex operators and perform several transforms byimplementing convex algorithms, a Computational Convex Analysis (CCA)toolbox has been developed. The current version of the CCA toolbox hasbeen built by using Scilab [SCI], a free and open source software. The CCAtoolbox is almost complete for univariate functions.11.1. Motivation1.1 MotivationIn this thesis we compute the convex conjugate which is one of the mostfrequently used convex operator. Convex conjugate functions play a signifi-cant role in duality. Consider the primal optimization problemp = infx∈Rd [f(x) + g(Ax)].The dual optimization problem isd = supy∈Rd [−f∗(AT y)− g∗(−y)].Here f∗ is the Fenchel conjugate of f . Fenchel’s duality theorem statesthat under some constraint qualification conditions, p=d and the solutionof one problem can be computer from the solution of the other. Computingthe conjugate of a function is one of the key step to solve a dual optimizationproblem [Her16].There is a close relationship between the conjugate and the Moreauenvelope [Luc06, HUL07]. The Moreau envelope of a convex function isMλ(s) = infx∈Rd [f(x) +||x− s||22λ].One of the important properties of the Moreau envelope is regularizationproperty [Luc10]. The Moreau envelope regularize a nonsmooth functionwhile keeping the same local minima [PW16]. The set of point at which theinfimum is found is called the proximal mapping which is the basis of manyconvex and nonconvex optimization techniques e.g. the bundle method.The application of conjugate function discussed above inspired us todevelop an algorithm to compute the convex conjugate. Although somealgorithms are available for computing the conjugate of the univariate orbivariate convex functions, we are interested in improving their performance.In Section 1.2 we discuss the performance of available algorithms for theunivariate function.1.2 Algorithms for computing the conjugate ofan univariate convex functionThe field of Computational Convex Analysis was born with developingthe algorithms to compute the Legendre-Fenchel transform, see also refer-ences in [Luc10] for a more complete history. The Fenchel conjugate op-erator, one of the most fundamental convex operators, is used to calculate21.2. Algorithms for computing the conjugate of an univariate convex functionthe dual properties of a convex function [Luc10]. Symbolic computationalgorithms have been developed for computing the conjugate operator ofunivariate or multivariate convex functions. These algorithms involve calcu-lating the Fenchel conjugate by differentiating a function under the supre-mum and computing an equation which is satisfied by all the critical points.However, the problem is that sometimes the computations of the criticalpoints involves a nonlinear equation that is difficult or impossible to solvesymbolically e.g. polynomial of degree 5 or higher degree.The fast Legendre transform (FLT) algorithms were developed for com-puting the conjugate using piecewise-linear approximations [Luc97, Luc13].The FLT algorithms are also used to compute the Moreau envelope and theproximal average. The Moreau envelope can be computed from the con-jugate operator and it converts a nonsmooth optimization problem into asmooth problem without changing the minimum of the problem [Luc06].The worst-case time complexity of FLT algorithms was log-linear. Thetime complexity of the FLT algorithms was later improved by the Linear-time Legendre transform (LLT) algorithm, which runs in linear time [Luc97].The LLT algorithms have a wide range of applications in network com-munication, robotics, numeric simulation of multiphasic flows, pattern recog-nition, and computation of distance transforms [Luc10]. Unfortunately, be-cause of the curse of dimensionality the application of FLT and LLT algo-rithms are limited to a small number of variables [GL13].The Parabolic Envelope (PE) algorithm was developed as an alternativeto the LLT algorithms to compute the conjugate and the Moreau enve-lope [FH12, Luc10]. However the efficiency of the PE and the LLT algo-rithms are same.A new class of algorithm based on Piecewise Linear-Quadratic (PLQ)functions was developed to remove the restrictions of the symbolic algorithmand the fast algorithms [LBT09]. A function is called a PLQ function if itsdomain can be partitioned into a finite number of polyhedral sets, on each ofwhich the function is either linear or quadratic [RW09]. PLQ functions havesome interesting properties. Consider a convex univariate PLQ function f(x)with k pieces,f(x) =Q0x2 + q0x+ α0, if x ≤ x0,Q1x2 + q1x+ α1, if x0 ≤ x ≤ x1,......Qk−1x2 + qk−1x+ αk−1, if xk−2 ≤ x ≤ xk−1,Qkx2 + qkx+ αk, otherwise,31.2. Algorithms for computing the conjugate of an univariate convex functionwhere, Q is the quadratic term, q is the linear term and α is the constantterm. It is stored using a PLQ matrix as followsPLQ matrix of f(x) =x0 Q0 q0 α0x1 Q1 q1 α1............xk−1 Qk−1 qk−1 αk−1+∞ Qk qk αk .If convex operators like the conjugate, the Moreau envelope or the prox-imal average are applied to a PLQ function then the resulting function isalso a PLQ function [Kha13]. PLQ functions are closed under all convexoperators because they allow quadratic pieces [Luc13, Luc10, GKL14]. An-other advantage of the PLQ functions is that the representation and theevaluation of a PLQ function is simple. PLQ algorithms are as efficientas the fast algorithms but provide greater precision. Usually PLQ basedalgorithms have less numeric errors than fast algorithms especially with un-bounded domain. In addition, efficient PLQ algorithms are available for thestandard convex univariate operators.Another class of algorithms for PLQ functions, called GPH algorithms, isbased on graph-matrix calculus [Goe08]. The idea of GPH algorithms comesfrom the linear relation between the graph of the subdifferential and thegraph of the function of a convex operator [GL11, Kha13]. GPH algorithmsare used to compute the graph of a transform. These algorithms storethe graph of the subdifferential and the function value at a finite set ofpoints [Luc13].The GPH representation of f(x) is,GPH matrix of f(x) =x0 x1 x2 · · · xk xk+1s0 s1 s2 · · · sk sk+1y0 y1 y2 · · · yk yk+1where xi are the points in the domain of f , si are the subgradients of f atxi and yi is the function value at xi.The advantage of using the GPH matrix over the PLQ matrix is that aGPH matrix stores the graph of the subdifferential of a PLQ function whichsimplifies the computation of the most common convex operators [GL11].The performance of GPH algorithms is similar to PLQ algorithms (linear-time) but they are simpler. The GPH algorithms become especially efficientwhen implemented in matrix-based mathematical software like MATLABor Scilab. For example, the computation of the convex conjugate operator41.3. Algorithms for computing the conjugate of a bivariate convex functionbecomes a matrix multiplication operation. Consider the following GPHmatrix representation of f(x),GPH matrix of f(x) =xsy . (1.1)According to Goebel’s Graph-Matrix calculus rules [Goe08], the graph ofthe subdifferential of the conjugate of f is defined by the following formula,gph ∂(f∗) =[0 IdId 0]gph ∂(f).Applying this formula, the GPH matrix of the conjugate of f , denoted byf∗, is,GPH matrix of f∗(x) =[0 11 0]∗[xs]sTx− y ,= sxsTx− y ,(1.2)where ∗ is used for the standard matrix multiplication.The performance of the CCA toolbox can be extended by including thealgorithms for computing convex operators in higher dimension. For exam-ple, algorithms for computing the conjugate of a bivariate convex functionare already available . In the next section we briefly discuss their perfor-mance and the limitations.1.3 Algorithms for computing the conjugate of abivariate convex functionThe first algorithm to compute the conjugate of a convex bivariate PLQfunction was developed in [GL13]. The algorithm is implemented in Scilabusing the Computational Geometry Library CGAL [CGA]. In this algo-rithm, a planar arrangement is used to store the entities (vertices, edgesand faces) of a PLQ function. The dual arrangement is computed by loop-ing through the entities of the planar arrangement.The key idea of the algorithm proposed in [GL13] is to loop through thevertices and compute the conjugate of the edges associated with a vertex.51.3. Algorithms for computing the conjugate of a bivariate convex functionFor every primal edge, the algorithm loops through all the gradients asso-ciated with an edge and compute its dual vertex with a vertex index. Thealgorithm never creates duplicate vertices and edges. However one of thechallenges is to compute whether a vertex is duplicate or not and assign anindex if it is unique.A binary search tree is used to detect the duplicate points which hasO(N log N) worst-case time complexity. This algorithm can achieve alinear-time expected-case complexity if it is implemented using a hash tableor a linear-time worst-case time complexity for a worst-case trie (assumingbounded bit length).For a convex bivariate PLQ function, the full conjugate can be computedusing the partial conjugate. The partial conjugate is the conjugate withrespect to one of the variables. Partial conjugates are also PLQ functionsbut not necessarily convex [Roc70] e.g. the partial conjugate of the l1 norml1(x1, x2) = |x1| + |x2| is (l1)∗1(s1, x2) = ι[−1,1](s1) − |x2|. In [GKL14], thefull conjugate of a bivariate PLQ function is determined more easily byusing two partial conjugates. However, the complexity of the full conjugatecomputation algorithm and the partial conjugate computation algorithm isthe same.Another algorithm, based on parametric programming, is developedin [Kha13] to compute the conjugate of convex bivariate PLQ functions.Parametric programming is a method for computing the effect of pertur-bations in mathematical programs. Optimization problems that depend onone scalar parameter are called parametric programs while problems thatdepend on several parameters are called multi-parametric programs. Solvingthem involves computing the solution of an optimization problem for everyvalue in the parameter space [BBM03, PS11]. A parametric programmingproblem is categorized as a parametric linear program (pLP), a paramet-ric quadratic program (pQP), a parametric non-Linear program (pNLP), aparametric linear complementary problem (pLCP), etc.The algorithm in [Kha13] combines a Parametric Quadratic program-ming (pQP) approach with computational convex analysis. The input andoutput PLQ function is stored using a list representation which is internallyconverted into a face-constraints adjacency representation. In this algorithma planar graph is used to store the entities of a PLQ function and adjacencyinformation of the entities is not stored.The algorithm computes the vertices using Mulmuley’s segment inter-section algorithm [Mul88] which requires a log-linear time complexity in theworst-case. The time required to compute the remaining entities i.e. raysand segment is log-quadratic because for every vertex, the algorithm loops61.3. Algorithms for computing the conjugate of a bivariate convex functionthrough all adjacent edges which requires log-quadratic time in the worst-case. However, according to [Kha13, Theorem 5.7], the worst-case timecomplexity can be improved by using a half edge data structure as this datacontains the ordering information of the vertices. This algorithm still needsto detect the duplicate entities, which takes log-linear time.In total, the time required to compute the conjugate in [Kha13] withoutbreaking the similarity of input and output data structure is log-linear.A summary of the algorithms for computing the conjugate of a bivariatePLQ function is presented in Table 1.1.Table 1.1: The worst-case performance of the algorithms developed for com-puting the conjugate of a bivariate PLQ function.Algorithm Source Time Data Structure SpaceFull conjugate [GL13] Log-linear Red-Black Tree Linear(geometric alg.) Linear Trie (bit length Linearis bounded)Expected Linear Hash table LinearFull conjugate [Kha13] Log-linear List Linear(parametric opt.)Partial conjugate [GKL14] Log-linear Red-Black Tree Linear(geometric alg.)From Table 1.1, we can conclude that no algorithm is known so far tocompute the conjugate of a bivariate PLQ function in linear-time.In this thesis, we present a new algorithm to compute the conjugate ofa bivariate PLQ function using graph-matrix calculus and fully leveragingthe neighbour graph. To our knowledge, this algorithm is the first linear-time algorithm to compute the conjugate of a bivariate PLQ function. Weimplement our algorithm in Scilab [SCI] for bivariate PLQ functions.This thesis is organized as follows. We discuss the background and mo-tivation of computing the convex conjugate operator in Chapter 1. All thebasic notations and definitions required to explain our algorithm are in-cluded in Chapter 2. Our proposed algorithm is explained with an examplein Chapter 3. In this chapter, we also include a detailed explanation of thedata structure we used and the complexity of the algorithm. Results of thenumerical experiments are included in Chapter 4. Chapter 5 concludes thethesis.7Chapter 2PreliminariesIn this chapter, we discuss some basic definitions, properties and exam-ples required to explain our algorithm.Definition 2.1. [Effective domain] Consider f : Rd → R ∪ {+∞}. Theeffective domain of a function f , denoted by dom f is,dom f = {x ∈ Rd : f(x) < +∞}.Definition 2.2. [Proper function] A function f is said to be a proper func-tion if domf 6= ∅ and f(x) > −∞ for all x ∈ Rd.Definition 2.3. [Convex function] A proper function f : Rd → R ∪ {+∞}is convex if its domain is a convex set and for any two points x1, x2 ∈ dom fand θ ∈ [0, 1]f(θx1 + (1− θ)x2) ≤ θf(x1) + (1− θ)f(x2).Figure 2.1 illustrates an example of a convex function in which the linesegment passing through any two points of dom f lies above or on the graphof the function.Definition 2.4. [Convex Hull] Consider a set C ⊂ Rd. The closed convexhull of C is the smallest closed convex set that contains C.The closed convex hull of C is denoted by co (C) and can be computedas the closure ofco (C) = {n∑i=0λixi : xi ∈ C, λi ≥ 0,n∑i=0λi = 1, n ≥ 0}.Definition 2.5. [Lower semi-continuous function] A function f : Rd →R ∪ {−∞,+∞} is said to be lower semi-continuous (l.s.c) if,limx→x¯ inf f(x) ≥ f(x¯) for all x¯ ∈ Rd.82.1. Polyhedral setFigure 2.1: A convex function.Definition 2.6. [Additively separable function] A function f : Rd → R ∪{+∞} of n variables is called an additively separable function if for somesingle variable functions f1(x1), f2(x2), · · · , fn(xn) it can be represented asf(x1, x2, · · · , xn) =f1(x1) + f2(x2) + · · ·+ fn(xn).Otherwise the function f is said to be an additively non-separable function.2.1 Polyhedral setDefinition 2.7. [Polyhedral set] A polyhedral set in Rd is the intersectionof finitely many half-spaces.It is usually denoted as C = {x ∈ Rd : aiTx ≤ bi} where ai ∈ Rd,bi ∈ R, i = 1, · · · ,m. We have the following definitions regarding C.Definition 2.8. [Relative interior] The relative interior of a set C, denotedby ri C, is the interior of the set C with respect to the smallest affine setcontaining C.Definition 2.9. [Affinely independent] A set X ⊆ Rd, X 6= ∅ is said to beaffinely independent if no vector x ∈ X is expressible as an affine combina-tion of the vectors in X \ {x} and can not be represented as ∑ni=1 αixi = 0where the constants αi are not all zeros.Definition 2.10. [Dimension of a polyhedral set] The dimension of a poly-hedral set C, denoted by dimC, is the maximum number of affinely inde-pendent points in C minus 1.92.2. Piecewise Linear-Quadratic functionFor example, if C consists of a single point then dimC = 0 and if C isfull dimensional then dimC = d.Definition 2.11. [Proper face, Improper face] A face of a convex set C isa nonempty subset, F , of C with the property that if x1, x2 ∈ C, θ ∈ (0, 1)and θx1 + (1− θ)x2 ∈ F then x1, x2 ∈ F . A face, F , that is strictly smallerthan C is called a proper face. Otherwise F is said to be an improper face.Definition 2.12. [Polyhedral decomposition] The set C = {Ck : k ∈ K},where K is a finite index set, is called a polyhedral decomposition of D ⊆ Rdif it satisfies the following conditions(i) all of its members Ck are polyhedral sets,(ii)⋃k∈K Ck = D,(iii) for all k ∈ K, dimCk = dimD,(iv) ri Ck1 ∩ ri Ck2 = ∅, where k1, k2 ∈ K, k1 6= k2.Definition 2.13. [Polyhedral subdivision] The set C is a polyhedral sub-division if C is a polyhedral decomposition and the intersection of any twomembers of C is either empty or a common proper face of both.Example 2.14. Let C = {D1,D2,D3,D4} with D1∪D2∪D3∪D4 = D be acollection of polyhedral sets. In Figure 2.2(a), the collection C is a polyhedraldecomposition but not a polyhedral subdivision becauseD1∩D3 or D2∩D3 isnot empty or a common proper face of both. In Figure 2.2(b), the collectionC is a polyhedral decomposition and a polyhedral subdivision of D.2.2 Piecewise Linear-Quadratic functionDefinition 2.15. [Piecewise Linear-Quadratic (PLQ) function] A functionf : Rd → R ∪ {+∞} is a Piecewise Linear-Quadratic (PLQ) function if itsdomain can be represented as the union of finitely many polyhedral sets andon each polyhedral set the function is either linear or quadratic [RW09].The domain of a PLQ function can be decomposed into a polyhedraldecomposition and on each piece, f(x) = fk(x) if x ∈ Ck, k ∈ K, wherefk(x) =12xTQkx+ qkTx+ αk (2.1)102.2. Piecewise Linear-Quadratic function(a) A polyhedral decompositionbut not a polyhedral subdivi-sion.(b) A polyhedral subdivision.Figure 2.2: Examples of a polyhedral decomposition and polyhedral subdi-vision.with Qk ∈ Rd×d a symmetric matrix, qk ∈ Rd and αk ∈ R. For each pieceof the PLQ function f , we associate the function f˜k = fk + δCk where δCkis the indicator function defined asδCk ={0, if x ∈ Ck;∞, if x 6∈ Ck.Example 2.16. Consider the l1 norm function,f(x1, x2) =|x1|+ |x2|=x1 + x2, if − x1 ≤ 0,−x2 ≤ 0;−x1 + x2, if x1 ≤ 0,−x2 ≤ 0;−x1 − x2, if x1 ≤ 0, x2 ≤ 0;x1 − x2, if − x1 ≤ 0, x2 ≤ 0;This example is illustrated in Figure 2.3. The domain of the l1 norm functionis partitioned into four polyhedral set and on each polyhedral set the functionis linear.Definition 2.17. [Partition] A family of sets P = {Ck : k ∈ K} is a partitionof D if(i) ∀k ∈ K, Ck 6= ∅,(ii) D = ∪k∈KCk,112.2. Piecewise Linear-Quadratic functionFigure 2.3: Example of a bivariate PLQ function - the l1 norm function.(iii) Ck ∩ Cl = ∅ for k 6= l.Fact 2.18. (Partition of a convex PLQ function [Roc70, Theorem 18.2])Let C be a non-empty convex set and F(C) be the collection of all non-emptyfaces of C and RI(C) = {ri F : F ∈ F(C)} be the collection of all relativeinteriors of non empty faces of C. Then RI(C) is a partition of C.Definition 2.19. [Entity] Assume f is a PLQ function and ∪kCk= dom fis a polyhedral subdivision that induce a partition of dom f . An entity is anelement of the partition of the domain. For f : R2 → R ∪ {+∞}, an entityis either a vertex, an edge or a face [GL13].Definition 2.20. [Vertex] If the dimension of an entity is 0 then it is calleda vertex. Figure 2.4 is the example of vertex .Definition 2.21. [Edge] If the dimension of an entity is 1 then it is calledan edge. An edge for any x1 6= x2 is defined asE = {x ∈ Rd : x = λ1x1 + λ2x2, λ1 + λ2 = 1, λ = (λ1, λ2) ∈ Λ}. (2.2)In R2, an edge is classified as follows(i) Line: when Λ = R2.122.3. Subdifferential of a functionFigure 2.4: Vertex in R2.(ii) Ray: when Λ = R+ × R where R+ = {x ∈ R : x ≥ 0}.(iii) Segment: when Λ = R+ × R+.Definition 2.22. [Face] If an entity has a nonempty interior then it is calleda face [Kha13]. For a PLQ function, a face Qk is categorized as(i) a QQ face if the associated function is quadratic with Qk a symmetricpositive definite matrix.(ii) a QL face if the associated function is quadratic with Qk a symmetricpositive semi-definite matrix with exactly one positive and one zeroeigenvalues.(iii) a LL face if the associated function is linear.Definition 2.23. [Extreme point, [Fan63]] An extreme point of a convexset C, is a point x ∈ C, which has a property that if x = θx1 + (1 − θ)x2with x1, x2 ∈ C and θ ∈ [0, 1], then x1 = x and/or x2 = x.An extreme point of C is not in the relative interior of any line segmentlying entirely in C. An example of extreme point is showed in Figure 2.5.2.3 Subdifferential of a functionDefinition 2.24. [Subgradient] The subgradient of a function f at a pointx¯ ∈ dom f , is a vector s such thatf(x) ≥ f(x¯) + 〈s, x− x¯〉,132.3. Subdifferential of a function(-1,0) (1,0)(-1,1) (1,1)Figure 2.5: Red points are the extreme points for the set S =co B(−1, 0)∪B(1, 0) where B(1, 0) is the closed ball of radius 1 centered at (1, 0).142.3. Subdifferential of a functionf(x)(s,−1)(x¯, f(x¯))Figure 2.6: The subgradient of a function.for all x ∈ Rd. The vector s is said to be a subgradient of f at x¯. Itcreates a supporting hyperplane which has normal (s,−1) and goes throughthe point (x¯, f(x¯)) of the epigraph of f [RW09, Theorem 8.9].If the function f is convex and differentiable then f has a unique subgra-dient at a point x¯ and it is the gradient . If f is convex but nondifferentiableat x¯ ∈ int dom f then f has multiple subgradients at x¯. This is illustratedin Figure 2.7. A subgradient of the PLQ function f , defined by Equation(2.1), at a point x iss = ∇fk(x) = Qkx+ qk. (2.3)Definition 2.25. [Subdifferential[Roc70, RW09, MN15]] The subdifferentialof a function f at a point x¯ is a closed convex set which is the collectionof all subgradients of f at x¯ . It is represented by the notation ∂f(x) anddefined as∂f(x) ={s ∈ Rd : f(x) ≥ f(x¯) + 〈s, x− x¯〉,∀x ∈ Rd},when x ∈ dom f . If x /∈ dom f by convention we set ∂f(x) = ∅ [Bal10].If the function is convex and differentiable at x ∈ int domf , then ∂f(x) ={∇f(x)}. If a PLQ function f is not differentiable at x then its subdiffer-152.3. Subdifferential of a function(a) The function f(x) is differen-tiable at a point x and has a sub-gradient s.(b) The function is not differentiableat x and has many subgradients atx. Two subgradients e.g. s1 and s2are shown.Figure 2.7: The subgradient of a convex function.ential is∂f(x) = co {∇fk(x) : Ck 3 x}.For example, in Figure 2.7(a), the subdifferential is single valued i.e.∂f(x) = {s} but in Figure 2.7(b), it is multi-valued i.e. ∂f(x) = [s1, s2].Consider the example of the absolute value functionf(x) =|x|={−x, if x ≤ 0;x, if x > 0;The absolute value function and its subdifferential is presented in Figure 2.8The subdifferential at any x > 0 is ∂f(x) = {∇f(x)} = {1} and thesubdifferential at any x < 0 is, ∂f(x) = {∇f(x)} = {−1}. The functionis not differentiable at x = 0. The subdifferential at x = 0 is ∂f(0) =co {−1, 1} = [−1, 1].The graph of the subdifferential of f is,gph ∂f = {(x, s) : x ∈ dom f, s ∈ ∂f(x)}.Definition 2.26. [Conjugate function] Consider a proper convex lower semi-continuous function f : Rd → R ∪ {+∞}. The conjugate of f , denoted by162.4. Planar graph(a) (b)Figure 2.8: (a)The absolute value function and (b) the subdifferential ∂f(x)of the absolute value function.f∗, is defined asf∗(s) = supx(〈s, x〉 − f(x)). (2.4)Consider the geometric interpretation of the conjugate function which isillustrated in Figure 2.9. We want to compute a point x such that the slopeof the line segment passing through the point (x, f(x)) has the maximumintercept on the s axis.The affine functions that are less than f(x) are represented by the fol-lowing inequalities,〈s, x〉 − α ≤ f(x),⇔ 〈s, x〉 − f(x) ≤ α.The smallest value of α is found at f∗(s) which is the conjugate of f(x).2.4 Planar graphDefinition 2.27. [Planar graph] A graph G = (V, E) is said to be a planargraph if it can be drawn in a plane with no two edges crossing each otherexcept at a vertex to which they are incident.Examples of a planar and a non planar graph are presented in Figure2.10.Definition 2.28. [Degree] In a graph G = (V, E), the degree of a node v ∈ Vis the number of edges e ∈ E that are connected to v.172.4. Planar graphxf(x)f(x)− 〈s, x〉−f∗(s)〈s, x〉Figure 2.9: Geometric interpretation of a conjugate function182.4. Planar graph(a) A planar graph. (b) A non planar graph.Figure 2.10: Example of planar and non-planar graph.For example, in Figure 2.10(a) the degree of the node V is 4.Fact 2.29. (Euler’s Formula) For a connected planar graph G with nv ver-tices, ne edges and nf faces, the following must hold:nv − ne + nf = 2.19Chapter 3AlgorithmIn this chapter, we explain our algorithm for computing the conjugateof a proper convex l.s.c. bivariate PLQ function in linear-time using graph-matrix calculus.3.1 Data structureThe input for our algorithm is a graph, called the entity graph, andspecific information on each entity. The entity graph is built from a properconvex l.s.c PLQ function. Each node of the entity graph represents anentity of the PLQ function. From now we always assume the PLQ functionis a polyhedral subdivision.Example 3.1. Consider the example of the l1 norm, f(x1, x2) = |x1|+ |x2|.The partition of its domain is illustrated in Figure 3.1. This function hasnine entities: four faces, four rays and one vertex.Let G = (V, E) be the entity graph where V is the set of nodes and Eis the set of edges. Each node of the set V represents an entity. When twoentities are adjacent then we connect them using an edge. Figure 3.2 is theentity graph corresponding to the l1 norm function.In addition, associated with each node of G we store the following in-formation about an entity: the GPH matrix, the adjacent entities and theentity type.3.1.1 GPH matrixWe use GPH matrices to represent a PLQ function. For each entity,we need to store enough information so that we can completely recover itfrom the GPH matrix. We select some points from dom f , compute thefull subdifferential and the function value at those points. Points in theGPH matrix representation are stored in order. Suppose we pick n pointsto completely represent an entity. We represent the GPH matrix as203.1. Data structureFigure 3.1: The domain of the l1 norm functionFigure 3.2: Entity graph for the l1 norm function213.1. Data structureGPH matrix =x1 x2 . . . xns1 s2 . . . sny1 y2 . . . ynb1 b2 . . . bnb∗1 b∗2 . . . b∗nwhere xi is the coordinate of a point, si is a subgradient of f at the point xiand yi is the value of f at xi. The points xi represents a polyhedral set in theplane and are given in clockwise order. Note that we need to compute thefull subdifferential of f at xi. The binary flag bi is used to identify whetherxi is an extreme point. We classify a point using the flag from Table 3.1.Table 3.1: Flag for x and s.Flag Type0 extreme point1 non-extreme pointSimilarly, using another binary flag b∗i we indicate whether si is an ex-treme point. Table 3.1 contains the flags used for b∗i .In Rd, the dimension of the GPH matrix is (2d + 3)n, where n is thenumber of points. For example, in R2, the dimension of a GPH matrix is7n.Example 3.2. Consider the following examplef(x1, x2) ={12x12 + 12x22, if x1 ≥ 0, x2 ≥ 0;∞, otherwise;whose conjugate isf∗(s1, s2) =12(s21 + s22), if s1 ≥ 0, s2 ≥ 0;12s22, if s1 ≤ 0, s2 ≥ 0;0, if s1 ≤ 0, s2 ≤ 0;12s21, if s1 ≥ 0, s2 ≤ 0;The domain of the PLQ function f is shown in Figure 3.3(a). Suppose wewant to compute the GPH matrix for the entity E1 = {(x1, 0) : x1 ≥ 0}.The entity E1 is a ray which is adjacent to F1 = {x : x1, x2 ≥ 0} associatedwith function f1(x1, x2) =12x12 + 12x22 and entity F2 = {x : x1 ≥ 0, x2 ≤ 0}223.1. Data structureassociated with function f2(x1, x2) =∞. The domain of f2(x1, x2) is emptyand the subdifferential is ∂f2(x1, x2) = ∅.The point x = (0, 0) is an extreme point of E1 and we represent it usingthe flag b = 0. The subdifferential of the function f˜1 defined on R+ × R+and equal to f1 is∂f˜1(x1, x2) = {0} × (−∞, 0].The point (0, 0) is an extreme point of f˜1 and we store it twice in theGPH matrix. For x = (0, 0) a subgradient of f˜1 is s = (0, 0) which is anextreme point of ∂f˜1(0, 0) and we indicate it by using b∗ = 0. Any othersubgradient at x = (0, 0) e.g. s = (0,−1) is a non-extreme point and hasthe flag b∗ = 1.Consider another point x′ = (1, 0) (or any other point) from E1. We usethe flag b = 1 to indicate that x′ is a non-extreme point in the primal. Thesubdifferential of the function f1, whose domain is R+×R+, at x′ is {(1, 0)}which is an extreme point in the dual. The GPH matrix for E1 isGPH1 =1 0 00 0 01 0 00 0 −11 0 ∞1 0 00 0 1.In R2, we can recover the polyhedral set by computing the convex hull of xiwhere i = 1, · · · , n (si in the dual). Note that we need at least one extremepoint to recover the polyhedral set and all non-extreme points are connectedwith their adjacent extreme points.Fact 3.3. (Representation of a polyhedral set [Roc70, Theorem 19.1],[BJS11,Theorem 2.1]) Let C = {x ∈ R2 : aTi x ≤ bi, x ≥ 0, i = 1, · · · ,m} be a poly-hedral set. Assume the set of extreme points is nonempty; then it contains afinite number of elements e.g. x1, x2, · · · , xk. If C is bounded then the set ofextreme direction is empty. If C is not bounded then the set of extreme di-rections is nonempty and has a finite number of elements e.g. d1, d2, · · · , dl.Moreover, x¯ ∈ C if and only if it is represented as a convex combination ofthe extreme points and the nonnegative linear combination of the extremedirections, that is,x¯ =k∑i=1λixi +l∑i=1µidi,233.1. Data structure(a) (b)Figure 3.3: (a)The domain of a PLQ function f , we use blue dot to indicatean extreme point in the primal (b)The subdifferential of f , green dots areused to represent the extreme points in the dual.where∑ki=1 λi = 1, λi ≥ 0, i = 1, · · · , k, µi ≥ 0, i = 1, · · · , l.If we store enough points then it is possible to represent any polyhedralset using a GPH matrix. We store the polyhedral set having different numberof inequalities as follows,(i) No inequality: If no inequality is found then the domain is the fullspace. To represent the full space we create an artificial polyhedraldecomposition. Note that the polyhedral decomposition is not unique.However the decomposition should be a polyhedral subdivision. Figure3.4 shows one possible representation of the full space.(ii) Polyhedral sets with one inequality: If the inequality represents a halfspace then we use the same representation as the full space i.e. wemake an artificial polyhedral decomposition and then store the ex-treme and the non-extreme points. If the inequality represents a ray,which already contains an extreme point, we pick a non-extreme pointto represent it. We divide a line into two rays and store it with one ex-treme and two non-extreme points. Examples are presented in Figure3.5.243.1. Data structureFigure 3.4: Example of no inequality. We use dotted lines to indicate theartificial polyhedral decomposition. We represent extreme points by usingblue and non-extreme points by using orange color.(iii) Polyhedral sets with two or more inequalities: Consider the followingtwo cases,(a) If the hyperplanes associated with inequalities intersect, we pickthe extreme points of the intersection. If the extreme points arenot enough to represent the polyhedral set then we add somenon-extreme points.(b) If the hyperplanes do not intersect e.g. if they are parallel thenwe represent each of them individually using extreme and non-extreme points and recover the polyhedral set by computing theconvex hull.The representation of the polyhedral set with two or more inequalitiesis illustrated in Figure 3.6 and 3.7.Consider the domain of the l1 norm function which is illustrated in Fig-ure 3.1. Suppose we want to compute the GPH matrix of the face E1. Theentity E1 contains all the points with nonnegative coordinates. There isno unique representation of the GPH matrix. For efficiency we should pickminimal number of points to completely represent an entity.Example 3.4. The entity E1 can be represented using different GPH ma-253.1. Data structure(a) (b)(c)Figure 3.5: Example of polyhedral sets with one inequality. (a) A half space.To represent it we assume a ray in the half space and then select the extremeand the non-extreme points. (b) A ray which contains an extreme point.To represent it completely we pick a non-extreme point. (c) A line. Werepresent a line using two rays.263.1. Data structure(a) (b)Figure 3.6: Example of the polyhedral sets with two hyperplanes. (a) Thepolyhedral set if the hyperplanes are parallel (b) The polyhedral set if twohyperplanes intersect.(a) (b)Figure 3.7: Example of polyhedral sets with three inequalities.273.1. Data structuretrices. For example either of the following would work.1 0 00 0 11 1 11 1 11 0 11 0 10 0 0,5 1 0 0 00 0 0 1 101 1 1 1 11 1 1 1 112.5 1 0 1 501 1 0 1 10 0 0 0 0.Now we show how to recover the function from the GPH matrix. Theprocedure is general so we explain it for the PLQ function f : Rd →Rd ∪ {+∞}. Recall that, on each piece of the domain the function f isquadratic and represented by Equation (2.1). The subgradient of f at apoint x is represented by Equation (2.3). For each x we get d equationsfrom (2.3)(gradient information) and 1 equation from knowing the functionvalue. So for each x we have a total of d+ 1 equations.The number of unknowns for Qk is d+ d− 1 + · · ·+ 1 = d(d+1)2 since Qkis a symmetric matrix. The number of unknowns for qk is d and for αk is 1.The total number of unknowns ared(d+ 1)2+ d+ 1 =(d+ 1)(d2+ 1).So the minimum number of points required to determine the function fk is(dd/2e+ 1).In R2, if we want to recover a function from a GPH matrix, we need tocompute the value of 6 unknowns. For each point we get 3 equations. Sowe need to store at least two points to represent an entity. However, if twopoints are not enough to completely represent the polyhedral set then weneed to store more points.For every x we denote the function as followsfk(x) =y =12[x1 x2] [a bb c] [x1x2]+[q1 q2] [x1x2]+ α. (3.1)So the gradient can be computed ass1 = ax1 + bx2 + q1, (3.2)s2 = bx1 + cx2 + q2. (3.3)283.1. Data structureConsider the entity E1 of Example 3.1, which is a face. The GPH matrixfor E1 is,GPH1 =1 0 00 0 11 1 11 1 11 0 11 0 10 0 0.Here we pick three points from dom f to represent E1.The 6th row of the GPH of E1 is the flag for these points and the lastrow is the flag for the subdifferentials. We have an extreme point (0, 0) andtwo non-extreme points ((1, 0) and (0, 1)). We use a 0 as a flag to indicatethat the point (0, 0) is an extreme point. Similarly, we use 1 to indicate thatthe points (1, 0) and (0, 1) are non-extreme points.To get the function that defines the entity E1 we need to compute thevalue of 6 unknowns (a, b, c, q1, q2, α). From Equation (3.2) we get the fol-lowing linear system, 111 =1 0 10 0 10 1 1 abq1 .The solution of the system is abq1 =001 .Similarly using the Equation (3.3) we compute bcq2 =001 .If we substitute the value of a, b, c, q1 and q2 in Equation (3.1) we get α = 0.We deducef1(x1, x2) = x1 + x2. (3.4)In the GPH matrix representation, we can store a point x multiple timesif the subdifferential is multi-valued i.e. if we have multiple subgradientat x [GL10, GL11]. For example, consider the entity E5 of Example 3.1.293.1. Data structureThe entity E5 is a ray which is adjacent to two faces: E1 and E4. Considerx = (0, 0). The face E1 is associated with f1(x) = x1 + x2 and the onlysubgradient of f1 at x = (0, 0) is (1, 1). Similarly, the face E4 is associatedwithf4(x) = x1 − x2,and at x = (0, 0), the subgradient is (1,−1). At any point on E5 (except(0, 0)) the subdifferential is∂f(x1, x2) ={1} × [1,−1],= co {(1, 1), (1,−1)}.and has two extreme points (1, 1) and (1,−1). We represents the GPHmatrix associated with E5 asGPH5 =1 0 0 10 0 0 01 1 1 11 1 −1 −11 0 0 11 0 0 10 0 0 0.Similarly the vertex E9 is adjacent to four faces. The subdifferential is∂f(0, 0) =[−1, 1]× [−1, 1],= co {(1, 1), (1,−1), (−1, 1), (−1,−1)}.This set is representable using 4 points and a GPH matrix associated withE9 isGPH9 =0 0 0 00 0 0 01 −1 −1 11 1 −1 −10 0 0 00 0 0 00 0 0 0.Note that the subdifferential of every x inside of a face is single valued. How-ever, the subdifferential of a vertex or at any point of an edge (line/ray/segment)is usually multi-valued.In our implementation, we use two hypermatrices Hp and Hd to storethe GPH matrices of all entities in the domain of f and the domain of303.1. Data structuref∗ respectively. A hypermatrix is a generalization of a matrix to a multi-dimensional array. In Rd, the dimension of the hypermatrix is N×M×nmaxwhere N is the total number of entities, M = (2d+ 3) is the number of rowsof the GPH matrix and nmax = maxi=1,...,N ni where ni is the number ofcolumns in the GPH matrix of entity i. For example, the function l1 normhas nine entities and the maximum value of ni is 4 which is found from theGPH matrix of the vertex E9. So the dimension of the hypermatrix H usedto store the entities of the l1 norm is 9× 7× 4.Recall that with our data structure, for any point x, we can recover notjust one subgradient of f at x but the full subdifferential of f at x.3.1.2 Adjacent entitiesEach node of G contains information of its adjacent entities. We storethe adjacency information of all entities using a matrix. We call this matrixa Neighbour Matrix and denote it by NM . The dimension of NM is N ×mwhere N is the number of entities and m is the maximum degree of allvertices in the entity graph.Consider Example 3.1 which has nine entities. The vertex E9 has eightadjacent entities which is the maximum. So we use a Neighbour Matrix ofdimension 9× 8. The entity E1 is adjacent to E5, E6 and E9. In the entitygraph G, the node V1 which contains E1, stores the indices of the adjacententities i.e. 5, 6, 9. We represent the Neighbour Matrix of Example 3.1 asNM =5 6 9 0 0 0 0 06 7 9 0 0 0 0 07 8 9 0 0 0 0 05 8 9 0 0 0 0 01 4 9 0 0 0 0 01 2 9 0 0 0 0 02 3 9 0 0 0 0 03 4 9 0 0 0 0 01 2 3 4 5 6 7 8.Each row of this matrix contains the adjacency information of the corre-sponding entity. For example, the index of all adjacent entities of E5 isfound in the 5th row of NM .However, we never read the full NM matrix. We extract the non-zeroelements from each row which gives the full adjacency information of anentity. As a result the time complexity is not increased.313.2. Algorithm3.1.3 Entity TypeThe type of each entity is defined by using a flag from Table 3.2. We usean array, called T , of dimension 1×N to store the type of all entities.Table 3.2: Entity type.Entity Type FlagVertex 1Face 2Line 3Ray 4Segment 5For example, the array which stores the entity type of the l1 norm func-tion isT =[2 2 2 2 4 4 4 4 1]and means that entities E1 to E4 are faces, entities E5 to E8 are rays andE9 is a vertex.We visualize the data structure using Figure 3.8. Consider a PLQ func-tion with N entities and each entity has a GPH matrix, adjacency infor-mation and entity type. We store all GPH matrices in a hypermatrix, alladjacency information in neighbour matrix NM and all entity types in anarray T .3.2 AlgorithmConsider the conjugate computation problem as a graph traversal prob-lem. In Algorithm 1, we use breadth-first search to traverse the entity graphG.Consider f : R2 → R ∪ {+∞} where dom f has N entities. In thisalgorithm, when we traverse an entity i, we store the index of all adjacententities in an array denoted D. The dimension of D is 1×N . We traverseG according to the index stored in D. Note that we will not store any indexin D which is already stored in D. To check that the index of an entityis already stored in D or not we use a binary array I of dimension 1 × N .In the binary array, if the value of an element is 1 then the index of thecorresponding entity is already stored in D and if the value is 0 then it isnot stored yet.323.2. AlgorithmAlgorithm 1: Compute PLQ Conjugateinput : Hp (a hypermatrix which contains the GPH matrices of allentities), NMp (contains the adjacency information of allentities), Tp (contains the type of all entities).output: Hd (a hypermatrix which contains the GPH matrices of alldual entities), NMd (contains the adjacency information ofall dual entities), Td (contains the type of all dual entities).Initialize I with zero and set I(1) = 1 ;Initialize D with zero and set D(1) = 1 and N¯ = 1;for i← 1 to N doj ← D(i) ;Gp ← Hp(j,:,:) ;[Gd, t]← Conjugate GPH(Gp) ;Hd(j,:,:) ← Gd ;Td(j)← t ;if (N¯ < N) thenEadj = NMp(j, :) ;Compute Index(Eadj , D, I, N¯) ;endendNMd ← NMp ;333.2. AlgorithmFigure 3.8: Visualization of the data structureSuppose we want to traverse the entity graph G of Figure 3.2 usingAlgorithm 1. We start from E1 which is a face. We have the followinginformations about E1,Gp =1 0 00 0 11 1 11 1 11 0 11 0 10 0 0,Indices of adjacent entities =[5 6 9],Entity type =2.We start to traverse the entity graph G from the entity E1 and initializeD asD =[1 0 0 0 0 0 0 0],the index array I asI =[1 0 0 0 0 0 0 0],343.3. Complexityand N¯ = 1. We apply Algorithm 2 to compute the conjugate of E1. Thematrix Gd is the GPH matrix which contains the conjugate of E1, we haveGd =1 1 11 1 11 0 00 0 10 0 01 0 10 0 0.The next step is to identify the type of entity of Gd, i.e. if it is a vertex,an edge, or a face. We compute the total number of unique subdifferentialsof Gd, which is 1 and deduce that the conjugate of the face E1 is a vertex.Next we check the adjacent entities of E1 using Algorithm 3. We findthat the entity E1 is adjacent to the entities E5, E6, E9. Then we check thecorresponding index of I to see whether these entities are already includedin D or not. In the index array I, the 5th, 6th and 9th elements are zero i.e.these elements are not included in D. So we update D, I and N¯ asI =[1 0 0 0 1 1 0 0 1],D =[1 5 6 9 0 0 0 0 0],N¯ =4.We check the index of the next entity from D which is 5. Now we moveto E5 and do the same computation. Some steps of traversing the entitygraph(G) of the l1 norm function are illustrated in Figure 3.9.We traverse all nodes of G and update the array D and I accordingly.The changes of D and I after each iteration of Algorithm 1 are presented inTable 3.3.In Table 3.3, we use red color for a currently visiting vertex and bluecolor for already visited vertex.Figure 3.10 shows the partition of the primal and the dual domain of thel1 norm function. The mapping of the entities from the primal to the dualdomain is presented in Table 3.4. According to Table 3.4, the mapping is aone-to-one entity mapping.3.3 ComplexityThe space and time complexity of our algorithm is computed next.353.3. ComplexityAlgorithm 2: Compute GPHinput : Gp (the GPH matrix in the primal).output: Gd (the GPH matrix in the dual which contains theconjugate of Gp), t (type of the dual entity).//Compute Gd ;x = Gp(1 : 2, :), s = Gp(3 : 4, :) ;y = Gp(5, :), b = Gp(6, :) ;b∗ = Gp(7, :) ;The conjugate is Gd ←sxsTx− ybb∗;//Compute t ;Pu = number of unique vectors in {Gd(1, k), GdC(2, k) : k};Pt = number of columns of Gd ;if Pu = 1 and Pt ≥ 1 then// Gd is a vertex;t = 1 ;else if Pu = 2 then//Gd is an edge;Pb0 = number of elements in {Gd(6, k) : Gd(6, k) = 0} ;Pb1 = number of elements in {Gd(6, k) : Gd(6, k) = 1} ;if Pb1 = 2 then// Gd is a line;t = 3 ;else if ( Pb0 = 1 and Pb1 = 1 ) then// Gd is a ray;t = 4 ;else if Pb0 = 2 then// Gd is a segment;t = 5 ;endelse if Pu ≥ 2 then// Gd is a face;t = 2 ;end363.3. ComplexityAlgorithm 3: Compute Indexinput : Eadj (contains the indices of all adjacent entities of anentity), D (contains the indices of the entities to traverse), I(indicate that which entities are already included in D), N¯(number of nonzero elements in D).output: D, I, N¯ .Adj ← [extract the non zero elements from Eadj ] ;for i← 1 to size of(Adj) doindex← Adj(i) ;if (I(index) == 0) thenI(index) = 1 ;N¯ ← N¯ + 1;D(N¯) = index ;endendTable 3.3: Iterations of Algorithm 1.Iterations D IInitialization 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 01 1 5 6 9 0 0 0 0 0 1 0 0 0 1 1 0 0 12 1 5 6 9 4 0 0 0 0 1 0 0 1 1 1 0 0 13 1 5 6 9 4 2 0 0 0 1 1 0 1 1 1 0 0 14 1 5 6 9 4 2 3 7 8 1 1 1 1 1 1 1 1 15 1 5 6 9 4 2 3 7 8 1 1 1 1 1 1 1 1 16 1 5 6 9 4 2 3 7 8 1 1 1 1 1 1 1 1 17 1 5 6 9 4 2 3 7 8 1 1 1 1 1 1 1 1 18 1 5 6 9 4 2 3 7 8 1 1 1 1 1 1 1 1 19 1 5 6 9 4 2 3 7 8 1 1 1 1 1 1 1 1 1373.3. Complexity(a) Traversing E1.(b) Traversing E5.(c) Traversing E6.(d) Traversing E9.Figure 3.9: Traversing the entity graph of the l1 norm function using Algo-rithm 1.383.3. ComplexityTable 3.4: Mapping of the primal entity to the dual entity for the l1 normfunction.Primal entity Type Dual entity TypeE1 face 1 E1′ vertex 1E2 face 2 E2′ vertex 2E3 face 3 E3′ vertex 3E4 face 4 E4′ vertex 4E5 ray 1 E5′ segment 1E6 ray 2 E6′ segment 2E7 ray 3 E7′ segment 3E8 ray 4 E8′ segment 4E9 vertex 1 E9′ face 1(a) Partition of dom f . (b) Partition of dom f∗.Figure 3.10: The partition of domain of the l1 norm function and its conju-gate.393.3. Complexity3.3.1 Space complexityProposition 3.5. Consider a proper convex l.s.c bivariate PLQ functionf : R2 → R2 ∪ {+∞} with N entities. The worst-case space complexity forcomputing the conjugate of f using Algorithm 1 is O(N2).Proof. In Algorithm 1, we use two hypermatrices (Hp and Hd) to storethe input and output GPH matrices, a Neighbour matrix NM to store theadjacency information, two arrays (Tp and Td) to store the type of entities,an array D to store the indices to traverse and a binary array I.The space complexity of our algorithm mainly depends on the size ofthe hypermatrix and the Neighbour matrix. Recall that the size of thehypermatrix H is N ×M × nmax where N is the total number of entities,M = (2d + 3) is the number of rows of the GPH matrix and nmax is themaximum number of columns in the GPH matrix. If we assume d is constant,the worst-case space complexity of H is O(N).In addition, we use four arrays: Tp, Td, D and I. In the worst-case, thespace required to store each array is O(N).In Algorithm 1, we use a planar graph as the entity graph. Recall that thesum of the degrees of the vertices equals twice the number of edges and thedimension of the Neighbour matrix NM is N ×m where m is the maximumdegree of all vertices. For N entities, in the worst-case the maximum degreeof the vertices is (N − 1). So the worst-case space complexity of NM isO(N2).Remark 3.6. A sparse matrix data structure would lower that space com-plexity to linear as would an adjacency list data structure i.e. the quadraticspace complexity of our implementation is a Scilab limitation.In addition, the quadratic space complexity does not arise when themaximum degree of vertices is bounded. For example, the worst-case spacecomplexity of a grid domain in R2 is linear where the maximum degree ofthe vertices is 8 and the dimension of NM is N × 8. The following resultshows that even for nongrid domains, the matrix NM is sparse with at mostO(N) nonzero entries.Corollary 3.7. Let G = (V, E) be a connected planar graph with ne edges andnv vertices. Assume nv ≥ 3. Then the total number of edges is ne ≤ 3nv−6.Proof. In a planar graph G, the edges divide the plane into different regionsand each region is called a face of the graph G. The total number of edgesbordering a face Fi is called the degree of Fi.403.3. ComplexityNote that, in any planar graph G, since an edge separates 2 faces, thesum of the degree of all faces is equal to twice the number of edges i.e.m∑i=1deg Fi = 2ne.Any face has degree deg Fi ≥ 3. So 2ne =m∑i=1deg Fi ≥ 3nf , hencenf ≤ 23ne. According to Euler’s formula from Fact 2.29, nv−ne+nf = 2. Sonf = 2+ne−nv ≤ 23ne, which gives 13ne ≤ nv−2, consequently ne ≤ 3nv−6.Unfortunately, we have to use the Neighbour matrix NM to store theadjacency information due to the limitation of suitable data type in Scilab.Linear space complexity can be achieved if this algorithm is implementedin another language like Java or C. Then we can represent each node of theentity graph by using a structure containing the GPH matrix, the adjacentlist and the entity type. Although Scilab supports the use of structure,its structure implementation is list-based while other languages use a moreefficient array-based implementation.3.3.2 Time complexityProposition 3.8. The worst-case time complexity for computing the conju-gate of a proper convex l.s.c bivariate PLQ function f with N entities usingAlgorithm 1 is O(N).Proof. The time complexity of our algorithm comes from the following parts(i) Extracting N GPH matrices from the hypermatrix Hp according tothe index stored in D and store it to E,(ii) Computing the conjugate of E,(iii) Checking the adjacency information of E and updating the arrays Dand I,(iv) Storing the GPH matrix, containing the conjugate of E, in Hd in theindex which is found from the array D.Task (i) amounts to accessing an index of a hypermatrix Hp which is a multidimensional array. The advantage of using an array over other data types isthat the time for accessing any element of an array is always constant. Thisis the main reason for using a hypermatrix to store the GPH matrices. Sothe time complexity for accessing any element of Hp, where the dimension of413.3. ComplexityHp is N ×M × ni, is O(1). So the total time required to access N elementsof Hp is O(N).Task (ii) involves computing the conjugate. The time complexity forcomputing the conjugate of N entities is linear i.e. O(N).In addition, task (iii) involves storing the index of adjacent entities in thearray D. Recall that the indices stored in D are used to traverse the entitygraph. In this part, when storing an index in D we need to check whetherthe index is already stored in D or not. In the worse-case, searching theindex of an entity in an array takes O(N) time. For N entities, the timefor storing all indices in D is O(N2). To improve from this quadratic timecomplexity, we use a binary array I. Every time we want to insert the indexof an entity in D we check the corresponding position of I. If we find 0, thatmeans the index is not stored in D and we can insert it in D. If we find 1, itindicates this index is already stored in D. Accessing a position of I takesconstant time. So the time complexity for N entities reduces to linear i.e.O(N).Finally, task (iv) requires accessing a position of a hypermatrix Hd andstoring the output GPH matrix in Hd. Similar to the first part, this alsotakes linear time i.e. the time complexity is O(N).In total, the time complexity isO(N +N +N +N) = O(N).Therefore, the overall time complexity for our algorithm is linear.We proved that it is possible to compute the conjugate of a bivariatePLQ function in linear time while using the same input and output datastructure.42Chapter 4Numerical experimentsIn this chapter, we will present some examples and the result of a nu-merical experiment. Our algorithm is implemented in Scilab [SCI].4.1 Example 1The PLQ (the l1 norm) function explained in Chapter 4 was an exampleof a polyhedral subdivision. Now we explain an example of a PLQ functionthat is not a polyhedral subdivision.Example 4.1. Consider the PLQ function f which is defined as follows,f(x1, x2) =12(x21 + x22), if x1 ≥ 0, x2 ≥ 0;−2x1 + 12x22, if x1 ≤ 0, x2 ≥ 0;∞, otherwise.(4.1)The domain of f has three pieces: a strictly convex piece, a convex pieceand a piece with infinite function value. The dom f is decomposed into sixentities: two finite faces, three rays and one vertex. This is presented inFigure 4.2.The polyhedral set for the PLQ function f is C = {C1, C2, C3} which isnot a polyhedral subdivision because C1 ∩ C3 or C2 ∩ C3 is neither emptynor a common proper face of both. We need to modify the domain to makeit a polyhedral subdivision.Note that there is no unique way to make a polyhedral subdivision. Wedivide C3 into two pieces where on each piece the function value is infinite.Now the domain of f becomes a polyhedral subdivision as C1 ∩ C3 andC2 ∩ C4 is a common proper face of both. The modified domain of f ispresented in Figure 4.3.The input for Algorithm 1 consists of a hypermatrix Hp, a Neighbourmatrix NM and an array Tp. The hypermatrix Hp has the following six434.1. Example 1Figure 4.1: The function f(x1, x2) from Equation 4.1E4E5 E3E1E2E612(x12 + x22)−2x1 + 12x22∞Figure 4.2: Partition of dom f which is not a polyhedral subdivision.444.1. Example 1E4E5 E3E1E2E612(x12 + x22)−2x1 + 12x22∞∞Figure 4.3: Partition of dom f which is a polyhedral subdivision.GPH matrices,Hp(1, :, :) =1 0 00 0 11 0 00 0 112 0121 0 10 0 0,Hp(2, :, :) =−1 0 00 0 1−2 −2 −20 0 12 0 121 0 10 0 0.The entity E3 is a ray which is adjacent to two faces: F1 and F3. We picktwo points to build a GPH matrix for E3. The function associated with F1isf1(x1, x2) =12(x21 + x22),454.1. Example 1and the function associated with F3 isf3(x1, x2) =∞.Since f3 =∞, dom f3 = ∅ and the subdifferential is ∂f3 = ∅.Assume we pick a point (0, 0) which is an extreme point of E3. Thesubdifferential of the function f˜1 defined on R+ × R+ and equal to f1 is∂f˜1(0, 0) ={0} × (−∞, 0].In the GPH matrix we store the point x = (0, 0) twice because it is theextreme point of the function f˜1. For s = (0, 0), we set b∗ = 0 as it isthe extreme point of ∂f˜1(0, 0). When we choose any other subgradient e.g.s = (0,−1) then we set b∗ = 1 as it is a non-extreme point.Similarly, if we pick another point (1, 0), the subdifferential of f1 at (1, 0)is {(1, 0)} which is an extreme point. The GPH matrix of E3 isHp(3, :, :) =1 0 00 0 01 0 00 0 −112 0 ∞1 0 00 0 1.A similar computation is required to compute the subdifferential of E5and E6 because in both entities we need to represent the infinity in thedomain.Hp(4, :, :) =0 0 0 01 0 0 10 0 −2 −21 0 0 112 0 0121 0 0 10 0 0 0,Hp(5, :, :) =−1 0 00 0 0−2 −2 −20 0 −22 0 ∞1 0 00 0 1,464.1. Example 1Hp(6, :, :) =0 0 0 00 0 0 00 −2 −2 00 0 −2 −10 0 ∞ ∞0 0 0 00 0 1 1.The maximum degree of entities is 5. So the Neighbour matrix NM off isNM =3 4 6 0 04 5 6 0 01 6 0 0 01 2 6 0 02 6 0 0 01 2 3 4 5 .The type of all input entities are stored in Tp asTp =[2 2 4 4 4 1]which indicates that the domain of f has two faces, three rays and a vertex.We apply Algorithm 1 and compute the conjugate of ff∗(s1, s2) =12(s21 + s22), if s1 ≥ 0, s2 ≥ 0;12s22, if −2 ≤ s1 ≤ 0, s2 ≥ 0;0, if −2 ≤ s1 ≤ 0, s2 ≤ 0;12s21, if s1 ≥ 0, s2 ≤ 0;∞, otherwise.The conjugate is illustrated in Figure 4.4. The domain of f∗ has twelveentities: four faces, five rays, one segment and two vertices. The partitionof dom f∗ is illustrated in Figure 4.5.We store all the entities making a partition of dom f∗ in a hypermatrix474.1. Example 1Figure 4.4: The conjugate f∗(s1, s2).E1E2E4 E3E5E6E7E9 E8E11E12(0, 0)(−2, 0)12s2212s12012(s12 + s22)E10Figure 4.5: Partition of the domain of f∗.484.1. Example 1Hd as follows,Hd(1, :, :) =1 0 00 0 11 0 00 0 112 0121 0 10 0 0,Hd(2, :, :) =−2 −2 −20 0 1−1 0 00 0 10 0 121 0 10 0 0,Hd(3, :, :) =1 0 00 0 −11 0 00 0 012 0 −∞1 0 00 0 1,Hd(4, :, :) =0 0 −2 −21 0 0 10 0 0 01 0 0 112 0 0121 0 0 10 0 0 0,Hd(5, :, :) =−2 −2 −20 0 −2−1 0 00 0 00 0 −∞1 0 00 0 1,494.2. Example 2Hd(6, :, :) =0 −2 −2 00 0 −2 −10 0 0 00 0 0 00 0 −∞ −∞0 0 0 00 0 1 1.The adjacency information of the entities are not changed in the dual i.e.NMp = NMd whileTd =[2 4 2 2 4 2].Table 4.1 is the mapping of the entities from the primal to the dualdomain.Table 4.1: Mapping of the primal entity to the dual entity.Primal entity Type Dual entity TypeE1 face 1 E1′ face 1E2 face 2 E7′ ray 3E3 ray 1 E3′ face 3E4 ray 2 E2′, E6′, E7′, face 2, ray 2, ray 3,E10′, E11′, E12′ segment 1, vertex 1,vertex 2E5 ray 3 E9′ ray 5E6 vertex 1 E4′,E8′ face 4,ray 4According to Table 4.1, the mapping is a one-to-many entity mapping.4.2 Example 2Example 4.2. Consider the example of 2D energy function,f(x1, x2) =12(x12 + x22)which is presented in Figure 4.6.This function does not have any inequality constraint. The domain ofthis function is the full two dimensional space and it has only one entity.To represent the full space we have to make some artificial partitions. Notethat the number of partitions is not unique. However the domain should504.2. Example 2Figure 4.6: The 2D energy function.be divided in a way that it becomes a polyhedral subdivision. Suppose wedivide the domain of the energy function as followsf(x1, x2) =12(x12 + x22), if − x1 ≤ 0,−x2 ≤ 0;12(x12 + x22), if x1 ≤ 0,−x2 ≤ 0;12(x12 + x22), if x1 ≤ 0, x2 ≤ 0;12(x12 + x22), if − x1 ≤ 0, x2 ≤ 0.Now the domain of f(x1, x2) has four pieces which is illustrated in Figure4.7. This is a polyhedral subdivision and dom f has nine entities: four faces,four rays and a vertex.The next step is to compute the GPH matrices for all entities. We build514.2. Example 212(x12 + x22)12(x12 + x22)12(x12 + x22)12(x12 + x22)(1,0)(0,0)(-1,0)(0,1)(0,-1)E5E6E7E8E2 E1E3 E4E9Figure 4.7: The domain of 2D energy function. The dotted lines indicatethat the partitions are not real partitions.524.2. Example 2the GPH matrices Hp as follows,Hp(1, :, :) =1 0 00 0 11 0 00 0 112 0121 0 10 0 0,Hp(2, :, :) =−1 0 00 0 1−1 0 00 0 112 0121 0 10 0 0,Hp(3, :, :) =−1 0 00 0 −1−1 0 00 0 −112 0121 0 10 0 0,Hp(4, :, :) =1 0 00 0 −11 0 00 0 −112 0121 0 10 0 0,Hp(5, :, :) =1 0 0 10 0 0 01 0 0 10 0 0 012 0 0121 0 0 10 0 0 0,534.2. Example 2Hp(6, :, :) =0 0 0 01 0 0 10 0 0 01 0 0 112 0 0121 0 0 10 0 0 0,Hp(7, :, :) =−1 0 0 −10 0 0 0−1 0 0 −10 0 0 012 0 0121 0 0 10 0 0 0,Hp(8, :, :) =0 0 0 0−1 0 0 −10 0 0 0−1 0 0 −112 0 0121 0 0 10 0 0 0,Hp(9, :, :) =0 0 0 00 0 0 00 0 0 00 0 0 00 0 0 00 0 0 0 .The energy function is the only self conjugate function [Roc70]. So the sameGPH matrices is found in the dual i.e. Hp = Hd. The neighbour matrixNM and the entity type T for the energy function is the same as the l1norm function.All the PLQ functions discussed above have a small number of entities.In the next section we give an example a of PLQ function with a largenumber of entities.544.3. Partition of domain using a grid of points4.3 Partition of domain using a grid of pointsExample 4.3. Consider the following additively separable PLQ functionf(x1, x2) = x41 + x42.We set f1(x1) = x41 and f2(x2) = x42.We use plq build function from CCA package [CCA] to approximate anunivariate PLQ function from f1. The plq build function is used for thequadratic approximation of f1. We use a grid of points to approximate f1into an univariate PLQ function. From the univariate PLQ function weapproximate a bivariate PLQ function.For example, if we use the points 0, 1, 2, 3 and 4 then we get an univariatePLQ function with ten pieces. For preserving the shape plq build functiontakes one additional point within every interval and divide the domain intoten quadratic pieces. We get the following PLQ matrix from f1,PLQ matrix of f1(x1) =0.00 0.00 0.00 ∞0.75 0.67 0.00 0.001.00 6.00 −8.00 3.001.61 9.06 −14.12 6.062.00 21.64 −54.55 38.552.57 29.16 −84.65 68.653.00 49.52 −189.09 202.643.55 61.21 −259.26 307.894.00 89.47 −459.70 663.40∞ 0.00 0.00 256.00.Figure 4.8 shows the PLQ function built from f1(x1) = x41. We considerthe pieces with finite function value and approximate the univariate PLQfunction into a bivariate PLQ function. The univariate PLQ function f1has eight finite pieces. So the bivariate PLQ function contains 8 ∗ 8 = 64pieces. Figure 4.9 illustrates the resulting bivariate PLQ function f(x1, x2)approximating x41 + x42.Next we compute all the entities and represents them using GPH matri-ces. We build the hypermatix Hp and compute the neighbour matrix NM .Now we apply our algorithm and compute the conjugate of f which isf∗(s1, s2) =f∗1 (s1) + f∗2 (s2),where f∗1 (s1) = (34)43 s143 and f∗2 = f∗1 .554.3. Partition of domain using a grid of pointsFigure 4.8: The approximation of univariate PLQ function f1(x1).Figure 4.9: The approximation of function f(x1, x2) = x41 + x42.564.3. Partition of domain using a grid of pointsy = 0.0034x - 0.5095R² = 0.999101020304050600 2000 4000 6000 8000 10000 12000 14000 16000 18000TimeTotal entityFigure 4.10: The time complexity for computing the conjugate of the func-tion from Example 4.3 where dom f is partitioned into a grid.We increase the number of pieces by increasing the size of the grid andmeasure the time for computing the conjugate of f . The time complexityof the function f is presented in Figure 4.10.4.3.1 Performance ComparisonWe compare the performance of our proposed Algorithm with the al-gorithm developed in [Kha13]. We compute the conjugate of the functionf(x1, x2) from Example 4.3 and measure the time for the computation usingboth algorithms. The times for different grid size using both algorithms areshown in Table 3.8.574.3. Partition of domain using a grid of pointsTable 4.2: Comparison of computation time using the algorithm from[Kha13] and Algorithm Compute PLQ Conjugate. All the times are in sec-onds.Total entity [Kha13] Proposed Algorithm81 4.1 0.3169 14.4 0.5289 20.4 1.2441 40.2 1.4625 67.9 2.3841 110.7 2.91,089 173.4 3.71,369 263.2 4.81,681 375.1 5.42,025 604.3 6.52,401 921.4 8.42,809 1,234.9 9.13,249 1,651.4 10.8The table shows that the computation time of the proposed algorithmis more than 100 times faster than the other algorithm.Figure 4.11 is the plot of the time complexity for these two algorithms.For the algorithm developed in [Kha13], when we fit a linear trendline thevalue of R2 is 0.91. However, if we fit a quadratic trendline then we getR2 = 0.99 which means this algorithm actually runs in quadratic time.We plot the time complexity of our proposed algorithm. We fit a lineartrendline and compute the R2 value which is 0.99. The value of R2 provesthat the proposed algorithm is a linear time algorithm.We run all numerical experiments on a Core(TM) i5 processor, 64 bitOS, 8.00 GB RAM, 2.40 GHz HP Pavilion x360 laptop, running Windows10. The implementation of the algorithm is done using Scilab version 5.5.2.We perform the numerical experiment several times and obtained similarresults each time.The implementation of the algorithm from [Kha13] was a pure Scilabcode that did not include the improvement of using the half-edge data struc-ture provided in an external library. At the price of considerable complexityand a loss in portability, the [Kha13] algorithm can be implemented in log-linear time. However, our new algorithm would still be faster (linear-time)and much simpler.584.3. Partition of domain using a grid of points0.00200.00400.00600.00800.001000.001200.001400.001600.001800.000 500 1000 1500 2000 2500 3000 3500TimeTotal EntityTotal Entity vs Time Previous AlgorithmProposed AlgorithmFigure 4.11: The time complexity for computing the conjugate of the func-tion from Example 4.3 using the algorithm from [Kha13] and the proposedalgorithm .59Chapter 5ConclusionThis chapter summarizes the work we have completed so far and ourfuture plans. Our goal was to remove the limitations and improve theworst-case complexity of the existing algorithms developed for computingthe conjugate of a bivariate PLQ function. The outcome of this thesis isthe first linear-time algorithm to compute the conjugate of a bivariate PLQfunction.In this thesis, we worked with a proper convex l.s.c PLQ function. Weassume that the domain of the function is a polyhedral subdivision. Otherwisewe need to create a polyhedral subdivision.We used GPH matrices to store the entities of a PLQ function andstored all GPH matrices using a hypermatrix. Moreover we stored the fulladjacency information of the entities. The adjacency information was notstored in the previous approaches. We traversed the entity graph, accessedeach entity and computed the conjugate. We also stored the informationabout the visited entity. So for each entity we never need to loop throughall the entities to check whether it is visited or not. As a result the overallcost for traversing the entity graph becomes linear. This is the improvementof our algorithm over the algorithms developed before (see Table 1.1).We computed only the convex conjugate operator. The improved com-plexity of this algorithm encourage us to develope GPH based algorithmsfor computing the other convex transforms like the proximal average, theMoreau envelope, the addition and the scalar multiplication. An interest-ing future work will be developing new algorithms using GPH matrix-baseddata structure for computing all the convex transforms of a bivariate PLQfunction.We implemented our algorithm in R2. We provided a detail explanationabout the implementation in Chapter 3. However there is the potential toimplement this algorithm in higher dimension. Directions for future workinclude computing the conjugate of a convex PLQ function of d variables.We presented some examples to show the performance of our algorithmfor different types of bivariate PLQ functions. The results are included inChapter 4. We showed that our algorithm can deal with a PLQ function60Chapter 5. Conclusionwhich has thousands of pieces and still computes the conjugate in linear-time. We included a graph in Chapter 4 to show the comparison of per-formance of this algorithm and a previous algorithm. The graph clearlyvisualize that our proposed algorithm is significantly faster than the previ-ous algorithms.61Bibliography[Bal10] E. J. Balder. On subdifferential calculus. Lecture notes, Univer-siteit Utrecht, 2010. → pages 15[BBM03] F. Borrelli, A. Bemporad, and M. Morari. Geometric algorithmfor multiparametric linear programming. Journal of OptimizationTheory and Applications, 118(3):515–540, 2003. → pages 6[BJS11] M. S. Bazaraa, J. J. Jarvis, and H. D. Sherali. Linear programmingand network flows. John Wiley & Sons, 2011. → pages 23[CCA] The computational convex analysis numerical library.http://atoms.scilab.org/toolboxes/CCA. → pages 55[CGA] Computational Geometry Algorithms Library.http://www.cgal.org. → pages 5[Fan63] K. Fan. On the Krein-Milman theorem. Convexity, 7:211–220,1963. → pages 13[FH12] P. F. Felzenszwalb and D. P. Huttenlocher. Distance transformsof sampled functions. Theory of Computing, 8(19), 2012. → pages3[GKL14] B. Gardiner, J. Khan, and Y. Lucet. Computing the partialconjugate of convex piecewise linear-quadratic bivariate func-tions. Computational Optimization and Applications, 58(1):249–272, 2014. → pages 4, 6, 7[GL10] B. Gardiner and Y. Lucet. Convex hull algorithms for piecewiselinear-quadratic functions in computational convex analysis. Set-Valued and Variational Analysis, 18(3-4):467–482, 2010. → pages29[GL11] B. Gardiner and Y. Lucet. Graph-matrix calculus for compu-tational convex analysis. In Fixed-Point Algorithms for Inverse62BibliographyProblems in Science and Engineering, volume 49 of Springer Opti-mization and Its Applications, pages 243–259. Springer New York,2011. → pages 4, 29[GL13] B. Gardiner and Y. Lucet. Computing the conjugate of convexpiecewise linear-quadratic bivariate functions. Mathematical Pro-gramming Series B, 139(1-2):161–184, 2013. → pages 1, 3, 5, 7,12[Goe08] R. Goebel. Self-dual smoothing of convex and saddle functions.Journal of Convex Analysis, 15(1):179–192, 2008. → pages 4, 5[Her16] C. Hermosilla. Legendre transform and applications to finiteand infinite optimization. Set-Valued and Variational Analysis,24(4):685–705, 2016. → pages 2[HUL07] J Hiriart-Urruty and Y. Lucet. Parametric computation of thelegendre-fenchel conjugate with application to the computationof the moreau envelope. Journal of Convex Analysis, 14(3):657,2007. → pages 2[Kha13] J. Khan. Computational convex analysis using parametricquadratic programming, 2013. → pages vi, viii, 4, 6, 7, 13, 57, 58,59[LBT09] Y. Lucet, H. Bauschke, and M. Trienis. The piecewise linear-quadratic model for computational convex analysis. Computa-tional Optimization and Applications, 43(1):95–118, May 2009.→ pages 3[Luc97] Y. Lucet. Faster than the Fast Legendre Transform, the Linear-time Legendre Transform. Numerical Algorithms, 16(2):171–185,1997. → pages 3[Luc06] Y. Lucet. Fast moreau envelope computation i: numerical algo-rithms. Numerical Algorithms, 43(3):235–249, 2006. → pages 2,3[Luc09] Y. Lucet. New sequential exact euclidean distance transform al-gorithms based on convex analysis. Image and Vision Computing,27(1):37–44, 2009. → pages 163Bibliography[Luc10] Y. Lucet. What shape is your conjugate? A survey of computa-tional convex analysis and its applications [reprint of mr2496900].SIAM Review, 52(3):505–542, 2010. → pages 1, 2, 3, 4[Luc13] Y. Lucet. Techniques and open questions in computational convexanalysis. In Computational and analytical mathematics, volume 50of Springer Proceedings in Mathematical & Statistics (PROMS),pages 485–500. Springer, 2013. → pages 1, 3, 4[MN15] B. S. Mordukhovich and N. M. Nam. Geometric approach toconvex subdifferential calculus. Optimization, 0(0):1–35, 2015. →pages 15[Mul88] K. Mulmuley. A fast planar partition algorithm. i. In 2013 IEEE54th Annual Symposium on Foundations of Computer Science(1988), volume 00, pages 580–589, Los Alamitos, CA, USA, 1988.IEEE Computer Society. → pages 6[PS11] P. Patrinos and H. Sarimveis. Convex parametric piecewisequadratic optimization: Theory and algorithms. Automatica,47(8):1770 – 1777, 2011. → pages 6[PW16] C. Planiden and X. Wang. Strongly convex functions, moreau en-velopes, and the generic nature of convex functions with strongminimizers. SIAM Journal on Optimization, 26(2):1341–1364,2016. → pages 2[Roc70] R. T. Rockafellar. Convex Analysis. Princeton University Press,Princeton, New york, 1970. → pages 6, 12, 15, 23, 54[RW09] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis.Springer-Verlag, Berlin, 2009. → pages 3, 10, 15[SCI] Scilab. http://www.scilab.org/. → pages 1, 7, 4364
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Computation of convex conjugates in linear time using...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Computation of convex conjugates in linear time using graph-matrix calculus Haque, Tasnuva 2017
pdf
Page Metadata
Item Metadata
Title | Computation of convex conjugates in linear time using graph-matrix calculus |
Creator |
Haque, Tasnuva |
Publisher | University of British Columbia |
Date Issued | 2017 |
Description | Computational Convex Analysis focuses on computing the convex operators which are used very often in convex analysis. The Fenchel conjugate is one of the most frequently used convex operator. The objective of this thesis is to develop an algorithm for computing the conjugate of a bivariate Piecewise Linear-Quadratic (PLQ) function. Although some algorithms already exist for computing the conjugate of a bivariate PLQ function, their worst-case time complexity is not linear. Our challenge is to improve the worst case time complexity to linear. We use a planar graph to represent the entities of a PLQ function. Each node of the graph contains a GPH matrix which represents an entity of the PLQ function . In addition, we store the adjacency information and type of all entities. We traverse the graph using breadth first search and compute the conjugate of each entity. Moreover we store the information of visited entities using a binary flag. So we never need loop through all entities to check whether it is already visited. As a result we get linear computation time in the worst case. We perform numerical experiment which confirms that our algorithm is running in linear-time. We provide a comparison of the performance of this algorithm with the previous algorithms. Finally we explained how to extend this algorithm in higher dimensions while keeping the worst-case time complexity linear. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2017-03-30 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0343417 |
URI | http://hdl.handle.net/2429/61061 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Irving K. Barber School of Arts and Sciences (Okanagan) Computer Science, Department of (Okanagan) |
Degree Grantor | University of British Columbia |
Graduation Date | 2017-05 |
Campus |
UBCO |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
Aggregated Source Repository | DSpace |
Download
- Media
- 24-ubc_2017_may_haque_tasnuva.pdf [ 1.57MB ]
- Metadata
- JSON: 24-1.0343417.json
- JSON-LD: 24-1.0343417-ld.json
- RDF/XML (Pretty): 24-1.0343417-rdf.xml
- RDF/JSON: 24-1.0343417-rdf.json
- Turtle: 24-1.0343417-turtle.txt
- N-Triples: 24-1.0343417-rdf-ntriples.txt
- Original Record: 24-1.0343417-source.json
- Full Text
- 24-1.0343417-fulltext.txt
- Citation
- 24-1.0343417.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0343417/manifest