UBC Graduate Research

An exploration of matrix equilibration Liu, Paul 2015-09

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.

Item Metadata


42591-Liu_Paul_CS517_An exploration of_2015.pdf [ 219.76kB ]
JSON: 42591-1.0103601.json
JSON-LD: 42591-1.0103601-ld.json
RDF/XML (Pretty): 42591-1.0103601-rdf.xml
RDF/JSON: 42591-1.0103601-rdf.json
Turtle: 42591-1.0103601-turtle.txt
N-Triples: 42591-1.0103601-rdf-ntriples.txt
Original Record: 42591-1.0103601-source.json
Full Text

Full Text

An exploration of matrix equilibrationPaul LiuAbstractWe review three algorithms that scale the infinity-norm of each row andcolumn in a matrix to 1. The first algorithm applies to unsymmetric ma-trices, and uses techniques from graph theory to scale the matrix. Thesecond algorithm applies to symmetric matrices, and is notable for be-ing asymptotically optimal in terms of operation counts. The third is aniterative algorithm that works for both symmetric and unsymmetric ma-trices. Experimental evidence of the effectiveness of all three algorithmsin improving the conditioning of the matrix is demonstrated.1 IntroductionIn many cases of practical interest, it is important to know if a given linear system is ill-conditioned. If the condition number of the linear system is high, then a computed solution willoften have little practical use, since the solution will be highly sensitive to a perturbation in theoriginal system. To handle ill-conditioned systems, practitioners often preprocess the matrix ofthe linear system by means of scaling and permutation. This preprocessing stage often bettersthe conditioning of the matrix and is referred to as equilibration.In the case of direct solvers, equilibration schemes have been devised as to make the solvesmore stable. For example, the scheme used in [3] computes a reordering and scaling of the matrixto eliminate the need for pivoting during gaussian elimination. In the case of iterative methods,equilibration is used to minimize the given matrix's condition number under some norm, as todecrease round-off errors in matrix vector products. For this paper, we focus on equilibrationalgorithms whose purpose is to minimize the condition number of a given matrix.One way to lower the condition number is to equilibrate the matrix, by scaling and permutingits rows and columns. For the condition number in the max-norm, optimal scalings have beenidentified for various classes of matrices [1]. However, there is currently no known algorithmto optimally equilibrate a general matrix and minimize its condition number, though severalheuristics have been identified that seem to work well. One such heuristic is to scale all rowsand columns of a given matrix to one under some given norm. This approach works well formost ill-conditioned systems, as shown by the numerical evidence provided in [2]. This paperwill present three algorithms  each interesting in its own right  that scale the max-norm ofeach row and column in a given matrix to one.2 Three algorithmsFor convenience, we will assume that the given matrix A is a real square matrix that only haspositive entries. Since we will only consider algorithms which scale and permute A, it is clearthat we can enforce positivity on A without loss of generality. We also assume that it has anon-zero diagonal, though all of the algorithms described in this section works without thatrestriction with minor or no modifications.112341'2'3'4'VR VC−1−1−1−101Figure 2.1: A bipartite graph on 8 nodes. Each edge (i, j′) is weighted with a weight wij .2.1 An algorithm for unsymmetric matricesThe first algorithm we present is due to Duff and Koster [5] and is implemented as algorithmMC64 in the widely used HSL Software Library. The algorithm is intended for unsymmetricmatrices, and attempts to make A as diagonally dominant as possible. To be precise, thealgorithm first finds a permutation matrix P such that the product of the diagonal entries ofPA is maximized. Then scaling matrices R and C are found so thatA˜ = PRAChas 1's on the diagonal and all other entries less than or equal to 1.This is achieved by modeling A as a bipartite graph, and then finding the minimum weightedmatching in that graph. Before we proceed further into the details of this algorithm, we introducethe reader to the weighted bipartite matching problem.2.1.1 Weighted bipartite matchingsDefine a graph G to be bipartite if it can be partitioned into two node sets VR = {1, 2, . . . , n}and VC = {1′, 2′, . . . ,m′}, where every edge of the graph connects a vertex in VR to a vertex inVC (an example is shown in Figure 2.1). Each edge (i, j′) of the graph is directed and weightedwith a real number wij . In Figure 2.1, the directions are shown by arrows and the weights areshown by the edge labels.The weighted bipartite matching problem then, is to choose a set M of edges in G, no two ofwhich are incident to the same vertex, such thatcost =∑(i,j′)∈Mwijis minimized. We refer to the optimal set M as a minimum matching.If |VR| = |VC |, one can also impose the restriction that each vertex in VR ∪ VC is incidentto an edge in M . This is called a minimum perfect matching. Note that M provides a 1-to-1and onto map between elements of VR and VC . In Figure 2.1, the minimum perfect matching isdenoted by the bolded edges.An interesting fact about weighted bipartite matchings is that if there is a minimum perfectmatchingM , then there exists a set of dual variables {ui} and {vj} such that for all edges (i, j′),{ui + vj = wij , if (i, j′) ∈Mui + vj ≤ wij , if (i, j′) /∈M(2.1)2This theorem, as well as a polynomial time algorithm of computing the minimum perfect match-ing along with the dual variables, is outlined in [6].2.1.2 Application to matrix equilibrationSuppose A has dimensions n× n, nnz non-zeros, and follows the restrictions we outlined at thestart of Section 2.Then MC64 solves the equilibration problem in the following manner:1. A bipartite graph G is created from A, with vertex i ∈ VR representing row i of A andvertex j′ ∈ VC representing column j of A. An edge (i, j′) is then added to the graph ifAij 6= 0. The weight of the edge is set to − log(Aij).2. The minimum perfect matching M is computed. From the matching, we determine apermutation matrix P by setting Pji = 1 for each edge (i, j′) in the matching, and Pij = 0everywhere else. Note that since the matching M is 1-to-1 and onto, P will have the effectof permuting row i to row j. Hence the diagonal of the permuted matrix PA will be exactlythe non-zeros of A that the edges in M correspond to.3. Using the dual variables {ui} and {vj} from computing M , the scaling matrices R andC are computed by setting R = diag(r1, r2, . . . , rn) and C = diag(c1, c2, . . . , cn), whereri = exp(ui) and ci = exp(vi).By weighing G in the manner described in step 1, we see that finding the minimum perfectmatching will be equivalent to minimizingcost =∑(i,j′)∈Mwij=∑(i,j′)∈M− log(Aij)= − log ∏(i,j′)∈MAijwhich is the same as maximizing the product∏(i,j′)∈M Aij .By the permutation computed in step 2, the product we maximized in step 1 will be permutedto the diagonal of PA, hence maximizing the product of the diagonal of PA over all possiblepermutations P .By setting R and C as described in step 3, we see from Equation 2.1 that for each non-zeroelement αij in RAC, we'll haveαij = exp(ui)Aij exp(vj)= exp(ui + vj + log(Aij))≤ exp(− log(Aij) + log(Aij))= 1with equality when (i, j′) ∈M .Hence A˜ = PRAC will have 1's on the diagonal and all other entries less than or equal to 1.Since the most practical algorithm to compute a minimum perfect matching runs in O(V E)time on a graph with V vertices and E edges, this scaling algorithm runs in O(n · nnz) timein the worst case. As noted in [5] however, the running time behaves more like O(n+ nnz) formost matrices.Note that the key insight to MC64 is the reduction of matrix equilibration to minimumbipartite matching. By choosing different weight functions for the edges of G, we open MC64 toa wide variety of modifications. There are variants of MC64 which maximizes different quantities,3such as the smallest element on the diagonal [5], or the ratio of the largest and smallest elementin A [7]. Hence MC64 is interesting in that it provides a framework on which further algorithmscan be built. Furthermore, it introduces numerical analysts to interesting algorithms from graphtheory. As we'll see in section 3, it also works well in practice.2.2 An algorithm for symmetric matricesIn this section we assume that A is symmetric.The second algorithm we present is due to Bunch [2] and is implemented in the soon-to-be-released software package SYM-ILDL [8]. The algorithm is intended for symmetric matrices,and computes a diagonal matrix D such thatA˜ = DADhas max-norm 1 on every row and column.Before we dive into the details of Bunch's algorithm, we first prove a helpful lemma.Lemma 1. Let L be the lower triangular part (not including the diagonal) of A and let 4 bethe diagonal of A. If there exists a diagonal matrix D such that every row in D(L +4)D hasmax-norm 1, then every row and column in DAD will have max-norm 1.Proof. If D(L+4)D has max-norm 1 in every row, then D(LT +4)D will have norm 1 in everycolumn. Note that every entry in DLTD and DLD will be less than or equal to 1. HenceDAD = D(L+4+ LT )D= D(L+4)D +DLTD= DLD +D(LT +4)D.The second equality shows that DAD has norm 1 in every row, and the third equality showsthat DAD has norm 1 in every column.Let T = L +4. In Bunch's algorithm, we greedily scale one row and column of T in order,starting from the first row to the last row of A. Let D = diag(d1, d2, . . . , dn) be our scalingmatrix. Bunch's algorithm is simply the following:• For 1 ≤ i ≤ n, setdi :=(max{√Tii, max1≤j≤i−1djTij})−1.The correctness of the algorithm is also easy to see: since di ≤ (max1≤j≤i−1 djTij)−1, scalingrow i by di ensures thatdiTijdj = (DTD)ij ≤ 1 (2.2)for every element on the i-th row of DTD except possibly (DTD)ii. The diagonal element istaken care of by ensuring di ≤(√Tii)−1, where the square root comes from the constraint(DTD)ii = d2iTii ≤ 1. (2.3)As di is chosen to be the maximum possible value bounded by constaints 2.2 and 2.3, there willalways be at least one entry of magnitude 1 on the i-th row after we are done. Since we scale allrows from 1 to n, DTD will have max-norm 1 in every row. Hence DAD will have max-norm1 in every row and column by our lemma.Though there doesn't seem to be any obvious extensions to Bunch's algorithm, it is interestingdue to its simplicity and its asymptotically minimal runtime of O(n + nnz), where n is thedimension of A and nnz is the number of non-zeros in A. As far as the author can tell, it isthe only equilibration algorithm for symmetric matrices that can be described by a single lineof code.42.3 An algorithm for general matricesThe final algorithm we present is the author's own creation, though it shares many similaritieswith the one presented in [9].1For convenience, we refer to the final algorithm as ALG3. Aswith Duff's algorithm, this algorithm attempts to find matrices R and C such thatA˜ = RAChas max-norm 1 in every row and column. When A is symmetric, the algorithm is designed togive R = C.Let r(A, i) and c(A, i) denote the i -th row and column of A respectively, and let D(i, α) tobe the diagonal matrix with Djj = 1 for all j 6= i and Dii = α. Using this notation, ALG3 isshown in 2.3.Algorithm 1 ALG3 for equilibrating general matrices in the max-norm.1: R := I2: C := I3: A˜ := A4: while R and C have not yet converged do:5: for i := 1 to n6: αr :=1√||r(A˜,i)||∞7: αc :=1√||c(A˜,i)||∞8: R := R ·D(i, αr)9: C := C ·D(i, αc)10: A˜ := D(i, αr)A˜D(i, αc)11: end for12: end whileGiven a tolerance , the convergence criterion for ALG3 is that |1− αr| <  and |1− αc| < for every row and every column. When this criterion is reached, we terminate the algorithm.As we can see above, each iteration of the for loop (line 5) attempts to fix a single row andcolumn of A˜. On iteration i, we take the maximum element αi in row i, scale it to√αi, andthen do the same for column i. If the maximum of both row and column i is on the diagonal,it is scaled to 1. Though we might undo some of the scaling we did for row and column i whenwe are scaling some other row and column j > i, it seems from experiments that this does notstop the algorithm from converging.Heuristically, the norm of each row and column converges to 1 since on each iteration of thewhile loop, we take the maximum element in each row and column to approximately square rootof their original value. Since the iteration Tn+1 =√Tn converges to 1 linearly for any initialT0, we expect to see this in the algorithm. In fact, as we can see in Figure 2.2, the maximumelement in A˜ seems to converge linearly as expected. On a test of 1000 random 100 × 100matrices, ALG3 successfully returned R and C for which RAC had max-norm 1 on every rowand column. Since each iteration of the while loop costs O(nnz) operations, this algorithmseems to take O(nnz · log ) time if we assume linear convergence.Furthermore, the algorithm seems to work for any norm, not just the infinity norm. That is, ifwe switched the norm on lines 6 and 7 to the p-norm, then ALG3 returns matrices R and C suchthat RAC has p-norm 1 in every row and column. This is also the behaviour of the algorithmgiven in [9], for which they prove convergence results for the infinity norm and p-norm for anyfinite p ≥ 1.1This algorithm can be seen as a variant of Ruiz's algorithm, as the same proof of correctness will apply withthe same convergence rate.5Figure 2.2: Convergence of ALG3, with iteration plotted against log log of the maximum of A.Each `iteration' in the graph above is one iteration of the while loop. The flat part ofthe plot past iteration 26 is the algorithm reaching the limits of machine precision.Another note of interest is that we did not need to use the square root function for thisalgorithm. In fact, any function f with the property that the two sequencesyn+1 = yn/f(yn) (2.4)xn+1 = f(xn) (2.5)both converge to 1 for any initial x0 and y0 seems to work in place of the square root. We donot even have to use the same f for different iterations of the for loop, though this seems abit extreme. Equation 2.4 comes from the constraint that we when scale a row or column ofA˜ during the course of ALG3, we want the elements A˜ij of that row or column to converge to1. For the max-norm, the largest element αi of row i will get scaled to αi/f(αi). So for thealgorithm to work in the max-norm, αi/f(αi) must approach 1. Equation 2.5 comes from theconstraint that αr and αc must converge to 1 for R and C to converge. Hence 1/f(xn) mustapproach 1, which implies that f(xn) approaches 1. Though the convergence of 2.4 and 2.5 to1 seem to be necessary conditions on f , they are actually sufficient. The proof of this followsstraightforwardly from the proof of the algorithm given in given in [9].3 Results3.1 For unsymmetric matricesFor the unsymmetric case, we took a set of real, ill-conditioned, non-singular matrices fromUniversity of Florida's sparse matrix collection [4], ranging from sizes 1157 to 8192 with a max-imum of 83842 non-zeros. In addition, we generated some smaller dense matrices in MATLAB(unsym-rand01 to unsym-rand5) using the built-in rand command (rand returns matriceswith entries picked uniformly from the open interval (0,1)). To make these random matricesill conditioned, half of its rows (randomly selected) were scaled by 10. The same was done forthe columns. The condition number in the 1-norm before and after equilibration with MC64and ALG3 are shown in In Figure 1. These are listed as condorig(A), condMC64(A) , andcondALG3(A)) respectively. The epsilon tolerance for ALG3 was set to 10−8.6Table 1: Condition numbers before and after equilibration for ALG3 and MC64matrix n nnz of A condorig(A) condMC64(A) condALG3(A)dw4096 8192 41746 1.50 · 107 9.63 · 105 5.90 · 105rajat13 7598 48762 1.46 · 1011 1.62 · 101 6.07 · 102utm5940 5940 83842 1.91 · 109 2.75 · 109 3.90 · 109tols2000 2000 5184 6.92 · 106 1.08 · 102 1.11 · 102rajat19 1157 3699 9.17 · 1010 5.87 · 1011 7.33 · 108unsym-rand05 1024 1048576 2.73 · 1013 1.77 · 105 2.27 · 105unsym-rand04 512 262144 2.07 · 1011 1.85 · 105 5.66 · 105unsym-rand03 256 65536 1.26 · 1011 1.80 · 104 2.78 · 104unsym-rand02 128 16384 1.03 · 109 1.18 · 104 1.26 · 104unsym-rand01 64 4096 1.59 · 107 1.40 · 103 2.10 · 103As we can see in Table 1, ALG3 and MC64 are comparable, though ALG3 is asymptoticallyfaster. For most matrices however, MC64 manages to obtain a lower condition number thanALG3.For the utm5940 and rajat19 matrix, both ALG3 and MC64 were unable to decrease thecondition number by much. These cases are to be expected, as MC64 and ALG3 are algorithmsbased on heuristics. An interesting extension of this project would be to work out specific worstcases for both algorithms, and see how to alter them as to make these worst cases less damaging.3.2 For symmetric matricesFor the symmetric case, we again took a set of real, symmetric, non-singular matrices fromUniversity of Florida's sparse matrix collection [4], ranging from sizes 1806 to 11948 with amaximum of 149090 non-zeros. In addition, we generated some smaller dense symmetric matricesin MATLAB. As in the unsymmetric case, the matrices were generated with rand and thenscaled to be ill-conditioned. They were then made symmetric by adding each matrix to theirtranspose. For each matrix, the condition numbers in the 1-norm before equilibration, afterequilibration with Bunch's Algorithm, and after equilibration with ALG3 were computed. Theseare listed as condorig(A), condBunch(A), and condALG3(A)) respectively. The results are shownin in Figure 2. As in the unsymmetric case, the epsilon tolerance for ALG3 was set to 10−8.Table 2: Condition numbers before and after equilibration for ALG3 and Bunch's algorithmmatrix n nnz of A condorig(A) condBunch(A) condALG3(A)bcsstk18 11948 149090 6.48 · 1011 1.21 · 105 1.21 · 105c-44 10728 85000 1.14 · 108 1.47 · 105 2.15 · 105meg4 5860 25258 4.19 · 1010 1.73 · 101 1.73 · 101sts4098 4098 72356 4.44 · 108 3.18 · 104 3.18 · 104bcsstk14 1806 63454 1.31 · 1010 9.91 · 103 9.91 · 103sym-rand05 1024 1048576 1.79 · 1011 2.81 · 105 2.24 · 105sym-rand04 512 262144 3.07 · 109 9.63 · 104 7.42 · 104sym-rand03 256 65536 1.93 · 109 3.32 · 104 4.20 · 104sym-rand02 128 16384 9.12 · 1011 2.19 · 104 2.12 · 104sym-rand01 64 4096 2.04 · 108 1.90 · 103 9.36 · 103As we can see in Table 2, the resulting condition numbers between ALG3 and Bunch's algo-rithm are quite comparable, with ALG3 slightly better in most test cases. In all tests cases, thecondition number was successfully decreased.74 ConclusionWe presented three algorithms that scale the max-norm of every row and column in a matrix to1. In the unsymmetric case, MC64 seemed to be slightly better, though both ALG3 and MC64were unable to decrease the condition number for some matrices. In the symmetric case, ALG3seemed to be slightly better, though Bunch's algorithm was much simpler and faster.All three algorithms have unique strengths:• MC64 handles unsymmetric matrices well, and is open to extension by reweighing the edgesof the bipartite graph described in Section 2.1.• Bunch's algorithm is simple to code, and achieves an asymptotically optimal runtime ofO(n+ nnz).• ALG3 is general purpose, and can be used for both symmetric and unsymmetric matrices.Furthermore, experimental evidence suggests that it can be extended to equilibrate thep-norm, not just the max-norm.Ultimately, the choice of an equilibration method will depend on the specific problem beingsolved. What we demonstrated here however, is that there are many ways to equilibrate amatrix: using techniques from graph theory (MC64), iterative methods (ALG3), and greedyalgorithm design (Bunch). The author hopes that the reader too will be inspired to create theirown equilibration technique, and looks forward to future techniques to come.5 AcknowledgmentWe thank Prof. Chen Greif, for his support and generous extension on the due date of thisproject.References[1] F. L. Bauer. Optimally scaled matrices. Numerische Mathematik, 5:7387, 1963.[2] James Bunch. Equilibration of symmetric matrices in the max-norm. Journal of the ACM,18:566572, 1971.[3] A.R. Curtis and J.K. Reid. On the automatic scaling of matrices for gaussian elimination.J. Inst. Maths. Applics., 10:118124, 1972.[4] T. A. Davis and Y. Hu. The university of florida sparse matrix collection.[5] Iain Duff and Jacko Koster. On algorithms for permuting large entries to the diagonal of asparse matrix. SIAM Journal on Matrix Analysis and Applications, 22:973996, 2001.[6] L. R. Ford and D.R. Fulkerson. Flows in Networks. Princeton University Press, 1962.[7] D. R. Fulkerson and P. Wolfe. An algorithm for scaling matrices. Siam Review, 4:142146,1962.[8] Chen Greif, Shiwen He, and Paul Liu. SYM-ILDL, a package for the incomplete LDLTfactorization of indefinite symmetric matrices.[9] Daniel Ruiz. A symmetry preserving algorithm for matrix scaling. Technical Report 7552,INRIA, 2001.8


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items