TIEBREAKING THE MINIMUM DEGREE ALGORITHM FOR ORDERING SPARSE SYMMETRIC POSITIVE DEFINITE MATRICES by IAN ALFRED CAVERS B.Sc. The University of British Columbia, 1984 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES DEPARTMENT OF COMPUTER SCIENCE We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA December 1987 © Ian Alfred Cavers, 1987 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of Computer Science The University of British Columbia 1956 Main Mall Vancouver, Canada V6T 1Y3 D a t e December 16, 1987 DE-6(3/81) Abstract The minimum degree algorithm is known as an effective scheme for identifying a fill reduced ordering for symmetric, positive definite, sparse linear systems, to be solved using a Cholesky factorization. Although the original algorithm has been enhanced to improve the efficiency of its implementation, ties between minimum degree elimination candidates are still arbitrarily broken. For many systems, the fill levels of orderings produced by the minimum degree algorithm are very sensitive to the precise manner in which these ties are resolved. This thesis introduces several tiebreaking enhancements of the minimum degree algorithm. Emphasis is placed upon a tiebreaking strategy based upon the deficiency of minium degree elimination candidates, and which can consistently identify low fill orderings for a wide spectrum of test problems. All tiebreaking strategies are fully integrated into implementations of the minimum degree algorithm based upon a quotient graph model, including indistinguishable sets represented by uneliminated supernodes. The resulting programs are tested on a wide variety of sparse systems in order to investigate the performance of the algorithm enhanced by the tiebreaking strategies and the quality of the orderings they produce. ii Contents Abstract ii Table of Contents iii List of Tables ix List of Figures xi Acknowledgements xii 1 Introduction 1 1.1 The Solution of Sparse Linear Systems 1 1.2 Thesis Overview 7 2 Evolution of the Minimum Degree Algorithm 8 2.1 The Cholesky Factorization of Symmetric Positive Definite Systems 8 2.2 Representation of Symmetric Matrices Using Graphs 10 2.3 Origin of the MDA 14 2.4 Elimination Graph Model 16 2.4.1 Elimination Graphs 17 iii 2.4.2 The MDA and the Elimination Graph Model 21 2.4.3 Implementation Limitations of the Elimination Graph Model 22 2.5 Reachable Set Model 25 2.5.1 Implicit Formation of Elimination Graphs 25 2.5.2 The MDA and the Reachable Set Model 27 2.5.3 Implementation Limitations of the Reachable Sets Model 28 2.6 The Quotient Graph Model 31 2.6.1 Eliminated Supernodes 32 2.6.2 Formalisms of Quotient Graphs 33 2.6.3 Successive Quotient Graph Formation 35 2.6.4 The MDA and the Quotient Graph Model 36 2.6.5 Storage Requirements of the Quotient Graph Model 38 2.7 Mass Elimination of Indistinguishable Nodes 42 2.7.1 Indistinguishable Nodes 43 2.7.2 Integration of Uneliminated Supernodes into the MDA . . 44 2.7.3 Identification of Indistinguishable Sets 46 2.7.4 The MDA With Uneliminated Supernodes 48 2.7.5 Supernode Efficiency Considerations 51 2.8 External Degree of Uneliminated Supernodes 51 2.9 Further Extensions 53 2.9.1 Incomplete Degree Updating 53 iv 2.9.2 Multiple Elimination 54 3 Primary Tiebreaking of the MDA 56 3.1 Tiebreaking Motivation 56 3.2 Tiebreaking Goals 58 3.3 Other Tiebreaking Research 59 3.4 Proposed Tiebreaking Strategy 60 3.5 Deficiency Calculations 64 3.6 Connectivity Lists 66 3.7 Deficiency and Uneliminated Supernodes 67 3.7.1 Normal Deficiency Calculations With Supernodes 67 3.7.2 The Deficiency, Degree and Elimination of Supernodes 70 3.8 Degree and Connectivity Updating 74 3.9 MDA With Deficiency Tiebreaking 81 3.10 Compatibility of Tiebreaking With Additional MDA Extensions 86 3.10.1 Tiebreaking and Incomplete Updating 86 3.10.2 Tiebreaking and Multiple Elimination 88 4 Secondary Tiebreaking of the M D A 90 4.1 Motivation and Goals 90 4.2 Proposed Secondary Tiebreaking Strategies 94 4.2.1 Strategy One 94 4.2.2 Strategy Two 101 v 4.2.3 Strategy Three 102 5 Implementation Considerations 107 5.1 Implementation Overview 107 5.2 Common Implementation Details of the Norm and Tie Modules 109 5.2.1 Common Data Structures 109 5.2.2 Successive Quotient Graph Formation 112 5.2.3 Implementation of Eliminated and Uneliminated Supern-odes 117 5.2.4 Implementation of Reachable Sets 127 5.2.5 Lower Adjacency Structure Formation 129 5.3 Distinct Features of the Norm Module 130 5.4 Distinct Features of the Tie Module 133 5.4.1 Connectivity Calculation and Updating 133 5.4.2 Connectivity Lists 134 5.4.3 The Elimination Node Selection Process 142 5.4.4 Reachable Set Storage 142 5.4.5 Degree and Reachable Set Updating 146 5.5 Implementation of Secondary Tiebreaking 148 5.6 Implementation of the Solve Module 149 6 Results and Analysis 152 6.1 Testing Environment 152 6.2 Test Problems 153 6.3 Primary Tiebreaking Test Results 160 vi 6.3.1 Ordering Test Results 160 6.4 Factor and Solve Timings 175 6.5 Secondary Tiebreaking Test Results 177 7 Concluding Remarks 182 7.1 Summary 182 7.2 Future Work 184 Bibliography 186 A Additional Secondary Tiebreaking Testing Results 189 vii List of Tables 3.1 Fill Fluctuations of Minimum Degree Orderings 58 4.1 Fill Fluctuations of Minimum Degree Orderings Produced With Deficiency Tiebreaking 93 5.1 Supernode Comparison 118 5.2 Node Selections With Degree < Maxcondeg 139 6.1 Harwell-Boeing Test Problems Selected 157 6.2 Test Problem Characteristics 159 6.3 Array Constants Used for Testing 161 6.4 Tie and Norm Ordering Testing 164 6.4 Continued 165 6.4 Continued 166 6.4 Continued 167 6.4 Continued 168 6.4 Continued 169 6.5 Fill Summary 170 6.6 Solution Time Summary 176 6.7 Testing of Secondary Tiebreaking 180 viii A . l Testing of Secondary Tiebreaking Criterion #3 190 A . l Continued 191 A . l Continued 192 A . l Continued 193 ix List of Figures 1.1 Effect of Preorderings on Fill 3 2.1 Representation of Symmetric Matrices Using Graphs 10 2.2 Permutations and the Relabelling of Graphs 11 2.3 Reachable Set Example 13 2.4 Elimination Graph Modelling of Cholesky Factorization 19 2.5 Fill Edge Production 20 2.6 Elimination Graph Model MDA Example 23 2.7 The Star Graph 25 2.8 Equivalence of Reachable Sets and Adjacency Sets 26 2.9 Reachable Set Model MDA Example 29 2.10 Reachable Set Inefficiences 30 2.11 Eliminated Supernode Formation 33 2.12 Example Comparing Elimination and Quotient Graph Modelling 37 2.13 Quotient Graph Model MDA Example 39 2.14 Storage Shortfall of a Supernode Representative 41 2.15 Eliminated Supernode Aadjacency Set Storage 42 2.16 An Example of Indistinguishable Nodes 44 2.17 Formation of an Uneliminated Supernode 46 x 2.18 Effects of Uneliminated Supernode Formation 50 3.1 Deficiency Example 61 3.2 Deficiency Tiebreaking Example 63 3.3 Supernode Expansion for a Connectivity Calculation 69 3.4 Elimination of a Supernode Representative 71 3.5 Connectivity and Degree Updating 76 3.6 MDA Example Using Deficiency Tiebreaking 84 3.6 Continued 85 4.1 Classification of Elimination Affected Nodes 96 4.2 Connectivity and Class Three Triangles 97 5.1 Adjacency Lists 110 5.2 Quotient Graph Transformation Example 116 5.3 Uneliminated Supernode Adjacency List Storage Example . . . . 120 5.4 Example of an Uneliminated Supernode Formation 124 5.5 Node Selection in the norm Module 131 5.6 Selection Degree 139 5.7 Connectivity List Length - nine point 1089 141 5.8 Connectivity Index Length - nine point 1089 141 6.1 The Initial Mesh of fullspider 154 6.2 The Initial Mesh of 6hole 155 6.3 The Initial Mesh of Shole 155 xi Acknowledgements First and foremost I would like to thank my supervisor, Dr. Jim Varah, for the advice, guidance and support he has given me while writing this thesis. Many thanks are also extended to Dr. Uri Ascher for reading the final draft. In addition, I would like to thank my fellow graduate students for answering my many questions and Roger Grimes of Boeing Computer Services who provided many of the test problems. Finally, I would like to thank my wife Diana for her endless support and patience despite the many lonely evenings. xii Chapter 1 Introduction 1.1 The Solution of Sparse Linear Systems This thesis discusses the solution of large systems of linear algebraic equations of the form: Ax = b (1.1) where A is an iVxJV sparse matrix, and 6 and x are both column vectors of dimension N. A precise definition of the term sparse matrix is very difficult and is often the subject of much debate. For the purposes of this thesis a matrix is considered sparse if few of its coefficients are nonzero. Acceptable nonzero percentages have been suggested but more practically Duff [DER86] suggests that "a matrix is sparse if there is an advantage in exploiting its zeros". Large sparse linear systems arise from a broad spectrum of application areas found in many fields of science and engineering. The finite difference discretiza-tion of partial differential equations does create many interesting problems, but large sparse systems also arise, for example, in linear programming, heat con-duction studies, structural analysis, power grid networking, astronomy, physics, economic modelling and chemical engineering. The structures of the correspond-ing matrices range from the highly regular banded matrices arising from the discretization of partial differential equations, to the matrices for certain chem-ical engineering applications, for which there is essentially no regularity to the 1 matrices' structures. (See [Duf86] .) This wide range of sparse systems is accompanied by a large number of so-lution methods. In the 1960's the study of the solution of sparse matrices dealt mainly with iterative methods. However, with the increased availability of com-puter storage and efficient computer codes, direct methods have become increas-ingly popular during the 1970's and early 1980's. There is no simple answer as to whether iterative, direct or some combination of both methods is the better overall approach. Different methods within each of the two classes are partic-ularly well suited to different problems and programming environments. From this broad selection of problems and solution methods, this thesis will concen-trate upon symmetric, positive definite systems and their solution by a direct method based upon the Cholesky decomposition method. One of the most important characteristics of the symmetric, positive definite matrices, is that unlike their indefinite cousins, pivotting is not necessary during the factorization process to ensure numerical stability [GvL83]. The matrix A of equation 1.1 can be factored with no pivotting into LLT, where L is lower triangular. The two triangular systems Ly = b and LTx — y can then be solved using forward and backward substitution to obtain the solution x. In fact, since for any permutation matrix P, PAPT is also symmetric and positive definite, any initial ordering of equations and variables for A can be handled in a similar fashion by solving the equivalent system PAPT Px = Pb. (1.2) For small, dense, symmetric positive definite matrices this is essentially the whole story. A straightforward Cholesky factorization and solution of the result-ing triangular systems produces x. However, keeping track of all elements in A and L is not practical for very large sparse systems. Attempts must be made to take advantage of the system's zero coefficients, to reduce storage requirements and solution effort. The general sparse matrix solution method considered in this thesis only manipulates those coefficients of A and L which are nonzero. Advantage is not taken of zero entries in L produced by cancellation, only of 2 * * * * * * * * * * * * * * * * Matrix 1 Suffers from complete fill-in. * * * * * * * * * * * * * * * * Matrix 2 No fill-in experienced. Figure 1.1: Effect of Preorderings on Fill those zero entries which may be predicted prior to the actual factorization. When the matrix A undergoes Cholesky factorization, the factor L normally suffers from some fill. A fill entry in L is a nonzero entry which corresponds to a zero entry in the same position of the original matrix. An observation important to the solution of sparse symmetric, positive definite systems (and other sparse symmetric systems for that matter) is that the amount of fill produced by the factorization process can be drastically affected by a preordering, P. A simple example in Figure 1.1 demonstrates the potential fill reduction effects of a good ordering. The stars identify nonzero entries in each matrix structure represented. If Cholesky factorization were applied to matrix 1, L would suffer from complete fill-in: assuming cancellation does not occur, no entries in L's lower triangular portion would be zero. However, if rows and columns 1 and 5 are exchanged to produce matrix 2, its factorization results in no fill entries. From the above discussion it is obvious that the judicious choice of an ordering is important to the solution of large sparse systems. 3 It is generally accepted that the complete solution process for sparse, sym-metric, positive definite systems using the Cholesky method can be broken into four distinct and independent steps. 1. (Ordering) Determination of a good ordering or permutation P for A. 2. (Symbolic Factorization) To exploit the general sparsity of the system, the nonzero structure of L is identified and an appropriate data structure initialized in preparation for the Cholesky factorization of PAPT = LLT. 3. (Numerical Factorization) Decompose PAPT into LLT using the Cholesky method. 4. (Solve) Solve LLTPx = Pb. By dividing the process into these four distinct phases, the flexibility of the solution process is increased. As will be subsequently discussed, steps 1 and 2 do not require the numerical values of the system, only its nonzero structure. If several systems have the same stucture but different numeric values, one ordering and one symbolic factorization step can be used in the solution of all systems. Similarly, if several linear systems have the same coefficient matrices but dif-ferent right hand sides, steps 1 through 3 need only be executed once. Finally, by separating the ordering phase from the remainder of the process, different ordering schemes can be selected without affecting the remaining three stages. Although limited discussion of all four steps outlined above is included, this thesis is primarily concerned with the ordering problem of step 1. Several criteria have been proposed in the literature upon which to base ordering algorithms. It is probably safe to assume that no single criteria has been proposed which can provide the basis for an optimal ordering algorithm for all problems and programming environments. A selection of these criterion are: minimum band width, minimum arithmetic for factor and solve, minimum storage, and minimum fill-in. Methods which attempt to minimize band width are commmon but do not generally take full advantage of a matrix's sparsity [GL81]. At first glance the 4 minimum arithmetic criterion appears to be very desirable, but Rose has shown [Ros72] that an ordering scheme based strictly upon this criterion is impractical. The minimum fill-in criterion, however, has received wide attention as a possible criterion for the optimal ordering. As a criterion, minimum fill-in has several advantages. As shown by Rose [Ros72], minimizing file-in would result in the minimization of the back solve operations required by step 4 of the solution process. In addition, although minimum fill-in does not necessarily guarantee a minimization of arithmetic op-erations for the entire process, it is a reasonable approximation of minimum arithmetic in practice [Ros72]. Finally, for methods which take full advantage of sparsity, the minimization of new nonzero entries in L results in a minimization of primary storage. Thus the minimum fill-in criterion is an attractive goal be-cause it encompasses, in at least an approximating fashion, several of the more desirable criteria. In Section 2.2 it will be demonstrated that the structure of a symmetric matrix can be represented by an undirected graph. In 1972 Rose [Ros72] showed that finding the minimum fill-in ordering of a symmetric matrix was equivalent to finding the minimum number of edges which could be added to affect the triangulation of the matrix's associated graph. (See Section 2.2 for a definition of triangulated graphs.) Much effort was devoted, with no success, to finding an efficient algorithm which could produce the minimum triangulation of a general graph. However, the minimum triangulation problem, and hence the minimum fill-in problem, was shown by Yannakakis to be NP-complete [Yan8l] and it was realized that only good heuristic algorithms were feasible for large sparse problems. Many different approaches to the minimum fill-in ordering problem have been suggested for large, symmetric positive definite systems. Nested dissection is an algorithm very successful at minimizing fill for problems arising from regular, planar grids, through recursive partitioning of the mesh. Unfortunately, the im-plementation of nested dissection algorithms is very domain dependent, although current research is attempting to refine the recursive partitioning process for 5 more realistic and irregular domains. (See for example [GL78a].)The minimum deficiency algorithm (or local minimum fill-in algorithm) proposed in its sym-metric form by Tinney and Walker [TW67] and later by Rose [Ros72] with a graphical interpretation, attempts to globally minimize fill-in by locally mini-mizing the fill-in experienced at each stage of the Cholesky factorization. (See Section 2.1 for a description of the Cholesky factorization process.) Unfortu-nately, high execution costs make the minimum deficiency algorithm impractical when implemented in its original form. Hsieh and Ghausi [HG72] presented a very interesting class of algorithms which rely upon a local minimization of fill as well but use a probabilistic prediction of fill at each stage of the Cholesky de-composition. Dembart and Erisman [DE73] showed that algorithms based upon this approach were not as practical as a more popular approach referred to as the minimum degree algorithm. The minimum degree algorithm (or MDA), introduced by Tinney and Walker [TW67], is the symmetric analog of a general ordering algorithm first suggested by Markowitz [Mar57]. This heuristic algorithm for the minimum fill-in ordering problem has received much attention in industry and is considered a practical approach, effective in reducing fill and arithmetic costs. The basic algorithm can be described rather simplistically as follows. (For those needing to refamiliarize themselves with the Cholesky method, see Sec-tion 2.1.) At each stage of the Cholesky factorization row and column permu-tations maintaining symmetry are performed on the unfactored portion of A so that the newest column of the growing factor L, contains the fewest off-diagonal nonzero elements. This algorithm is obviously not deterministic when two or more columns of the unfactored portion contain the same number of off-diagonal nonzero members. Such ties are arbitrarily broken in the original algorithm. This thesis includes a general discussion of the evolution of the MDA and its efficient implementation. However, a strong emphasis will be placed upon newly developed tiebreaking strategies which remove some of the arbitrary nature of the algorithm mentioned above and result in orderings with improved characteristics. 6 1.2 Thesis Overview Chapter 2 traces the development of the minimum degree ordering algorithm from its origins in the Markowitz algorithm, to one of its most successful ver-sions based upon the quotient graph model. The enhancement of the algorithm by indistinguishable sets is considered, as well as the multiple elimination and incomplete degree updating extensions. In Chapter 3 the deficiency tiebreaking extension of the MDA is introduced as an alternative to a random selection amongst a group of minimum degree elimination candidates. The modification of the node selection process is detailed for the MDA version based upon a quotient graph model, with indistinguishable sets in the form of uneliminated supernodes. Three additional tiebreaking schemes are introduced in Chapter 4. Each alternative scheme is intended to assist in the node selection process of the MDA when the deficiency tiebreaking strategy of Chapter 3 fails to isolate a single minimum degree candidate for elimination. Chapter 5 discusses the implementation details of a number of programs developed to test the tiebreaking strategies. Three ordering programs are con-sidered, as well as a program which executes the remaining three steps of the solution process. The first ordering program is an implementation of the MDA using a quotient graph model, while in the second program the MDA is enhanced by deficiency tiebreaking. The third ordering program actually represents three different programs, obtained when the MDA is enhanced by deficiency tiebreak-ing and one of the three secondary tiebreaking schemes of Chapter 4. Chapter 6 presents the results and analysis of the extensive testing conducted with the proposed tiebreaking strategies. The deficiency tiebreaking strategy is emphasized. In the final chapter the discussion and results of the thesis are summarized and topics for additional research are identified. 7 Chapter 2 Evolution of the Minimum Degree Algorithm This chapter traces the development of the MDA (minimum degree algorithm) from its origins in the Markowitz algorithm to its most effective forms using arbi-trary tiebreaking. The first two sections provide the necessary background infor-mation on the Cholesky factorization and the elementary aspects of graph the-ory used to model the Cholesky factorization process in different versions of the MDA. In subsequent sections, the MDA's evolution is traced from Markowitz's algorithm, through the elimination graph and reachable set models, to the quo-tient graph model used by SPARSPAK [GL78c]. Mass elimination based upon indistinguishable nodes is then discussed along with the use of external degrees. Finally, the last section discusses two further refinements, incomplete degree up-dating and multiple elimination, not implemented by the test programs devel-oped for this thesis. 2.1 The Cholesky Factorization of Symmetric Positive Definite Systems It is well known that the existence and uniqueness of the Cholesky factorization, L L T , of a symmetric, positive definite matrix A is guaranteed. (See [GvL83] for example.) However, there are a number of different forms of the Cholesky 8 method which can be used. For the purposes of studying the modelling of the factorization process (see Sections 2.4- 2.6), the outer product form is the most useful. This version is by no means the most conducive for an implementa-tion and a second form, mentioned in Section 5.6, was actually selected for the algorithmic basis of the factorization routines. Following the notation and formulations of George and Liu [GL81], for 1 < i < N, let di represent the pivot at the ith step of the factorization process, Vi be a column vector of length (N — t), Li be an NxN matrix from which the factor L is determined, Hi be an (N — i)x(N — i) matrix, and finally let represent an (N — i)x(N — i) identity matrix. The Cholesky factorization of A, an NxN matrix, can be described in terms of the decomposition step illustrated in equation 2.1. A = A 0 = H0 = di Vx Hl " 1 y/d[ 0 Vi/Vd~i h J L 0 Hi = L1A1LJ 0 h (2.1) where H^H^- V^/d^ It can easily be shown that Hi is also a symmetric positive definite matrix and the step shown in equation 2.1 can be recursively applied in the following manner. 1 0 0 A, -1 0 0 Hi 1 0 0 0 0 o v2/v/37 J 2 = L2A2L2 0 d2 V? 0 V2 H2 \ 1 0 0 0 1 0 . 0 0 H2 1 0 0 0 Vd~2 V2T/y/T2 0 0 J 2 (2.2) where H2 = H2- V2V2T/d2 In this fashion, the basic factorization step illustrated above is applied a total of N times to produce 9 x x 2 X X x x x 3 X X, X. X X X X 4 X x 5 X x x x 6 X © Figure 2.1: Representation of Symmetric Matrices Using Graphs A = L\Li... LN I LNLN_L Due to the special nature of each L,-, it can be shown quite simply that A = LLT where Thus each step of the factorization process computes a column of the factor L, with the computation of each column independent of all previously determined columns. 2.2 Representation of Symmetric Matrices Using Graphs Essential to particular versions of the MDA, is the manner in which the matrix A is modelled during a simulated Cholesky factorization. All three forms of the MDA discussed in Sections 2.4-2.6 rely upon a representation scheme of A's nonzero structure which is based on undirected graphs. An undirected graph, G = (X,E), can be defined by a set of nodes, X, and a set of edges, E. An edge is defined as an unordered pair of nodes. A picto-rial interpretation of each undirected graph, G, is available as demonstrated in L = Li + L2 + • • • + LN - (N - 1)I0. 10 Matrix PAP T Graph G' Figure 2.2: Permutations and the Relabelling of Graphs Figure 2.1. Ignoring for the moment the matrix A, the circles of the accompany-ing illustrations represent the graph's nodes, while the line segments joining two nodes represent its edges. (Braces are used to indicate unordered pairs.) For the remainder of this thesis, only undirected graphs will be discussed and they will be simply referred to as a graph. As previously mentioned, the sparsity structure of a symmetric matrix can be represented by a graph. Each diagonal entry of the matrix is represented by its own node in the corresponding graph. The positioning of the off-diagonal, nonzero entries of the matrix are recorded by the edges of the graph. For example, in Figure 2.1 an edge connecting nodes 2 and 3 corresponds to the nonzero entries found at row 2, column 3 and row 3, column 2 of the matrix. The ordering or labelling of the nodes shown in G is based upon the ordering of the matrix A. Unless stated otherwise, it will be assumed that the labelling of a graph is found by numbering the diagonal entries of the matrix from 1 to N. A more formal description of the correspondence between a symmetric matrix, A, and a labelled graph, G = (X, E), is as follows. ait G A i G X and If P (^ /) is an NxN permutation matrix, the symmetric matrices A and PAPT can be represented by graphs differing only in the specific labels assigned 11 to each graph's set of nodes. Thus a reordering of A corresponds to a relabelling of its graph G. Figure 2.2 demonstrates the new labelling required after the matrix A, in Figure 2.1, is reordered with a simple permutation matrix P. The correspondence between reordering and relabelling allows the problem of finding a good permutation for A to be thought of instead, as the problem of determining a good, alternative labelling for the graph G. Each of the three models of Cholesky factorization to be discussed work with such basic graphs to start. However, the models differ in the manner in which the changes to the matrix, during the modelled decomposition, are reflected in the graph. The sophistication of the representation used increases from the basic elimination graph model, through the reachable set model to the quotient graph model. These extended features and higher levels of abstraction will be introduced and discussed as needed. Several elementary definitions of graph theory, which are important in sub-sequent discussion, will now be outlined. Additional definitions and notation specific to a particular area will be introduced as needed in the appropriate sections of this thesis. Two nodes x and y are said to be adjacent in G if x, y G X and {x, y} E E. The adjacency set of y, adj(y), is the set of nodes in G which are adjacent to y. An alternate definition of the adjacency operator accepts an operand that is a set of nodes and can be defined as follows. Let Y C X. adj{y) = {x e X \ {x,y} E E} (2.3) adj(Y) = {x e X | 3y (y <E Y and {x, y} € E)} (2.4) The degree of a node x in G, deg(x), is simply defined as the number of nodes in its adjacency list. (2.5) For example, in Figure 2.1 the degree of node 2 is 3. 12 S= {4,5} reach(x.S) = {1,3,6 Figure 2.3: Reachable Set Example The reachable set of node x is closely related to its adjacency set. Let S represent a subset of nodes from the graph, S C X . The reachable set of x, reach(x,S), is made up firstly of all nodes which are members of adj{x) but not members of S. As well, all nodes which are not in the set S, but reachable through its members are included. A node, y, is reachable from x if there is a sequence of edges, connecting only members of S (or a member of S to x or y), providing a continuous path from x to y. For example, if x ^ x ^ s is the only continuous path connecting nonadjacent nodes X4 and X 5 , £ 5 is a member of reach{xi) if and only if x2 and x\ both belong to S. reach(x, S) = {y G S | y ^ x and (y G adj(x) A small example in Figure 2.3 illustrates the concept of a reachable set. When it is unclear to which particular graph one of the three operators reach, adj or deg is to be applied, the operator's name is subscripted by a symbol to indicate the intended graph. For example, reacha0 {x, S) refers to re's reachable set using graph G0. The operators may also be subscripted by a symbol indicating a set of nodes that are considered eliminated. As discussed in Chapter 1, fill is introduced to L during the Cholesky factor-ization process. A more precise definition of fill can be given as follows. Let the nonzero structure of matrix A be represented by GA — (XA, EA)- If the Cholesky or y is reachable from x through S)} (2.6) 13 factorization were performed on A to produce L, the sum L + LT can be thought of as the filled matrix, F. Let the nonzero structure of F be represented by the graph Gp = (Xp,Ep). Throughout this thesis it will be assumed that no cancellation occurs during the Cholesky process. This implies that C Ep. In other words, a nonzero coefficient of A will correspond to a nonzero coefficient in the same position of F. Finally, the fill of matrix A can be defined as fill{A) = E F - E A . (2.7) A graph Gi = (Xi,Ei) is a subgraph of G2 = (X2,E2) if Xi C X2 and Ei C E2. As mentioned briefly in the introductory chapter, the triangulation of a graph is an important concept for the discussion of minimal fill orderings. A cycle in a graph is a path consisting of more than one node, which ends at the first node of the path. n-in2 ... rikrik+i where ni = n/c+i If for every such cycle in a graph G, with k > 3, there is an edge connecting two nodes n,- and n ; such that rij ^ n, +i and rij ^ then the graph is triangulated. Finally, a clique is a set of nodes in a graph which are all pair-wise adjacent. Each node in the group is adjacent to all other members. 2.3 Origin of the M D A As mentioned in Chapter 1, the MDA originated as the symmetric analog of an algorithm first proposed in 1957 by Markowitz [Mar57]. Essentially, the algo-rithm was originally intended to reduce the amount of arithmetic necessary to produce the LU factorization of a sparse, nonsingular matrix. Local minimiza-tion of arithmetic was approximated by performing row and column exchanges at each step of the Gaussian elimination, so as to minimize the product of the 14 number of off-diagonal nonzero coefficients in the pivot row and column. In ad-dition, to maintain numerical stability, the chosen pivot could not be permitted to be too small. It should be noted that the column and row exchanges may be different because in general problems there is no attempt to maintain sym-metry. When more than one pivot was acceptable with regards to stability and arithmetic minimization, ties were arbitrarily broken. By observation of the Gaussian elimination process, one can see that the product of the number of off-diagonal nonzeros in the pivot row and pivot column is equal to the maximum number of fills in L+U, which could be produced by the modifications of a single elimination step. By choosing the smallest numerical product, the Markowitz strategy approximates pivot selection using a minimum fill strategy. Of course, this local strategy will not in general produce a global minimization of fill, know to be NP-complete, but in practice the algorithm has proven successful at reducing fill and arithmetic for a wide range of sparse problems. Tinney and Walker [TW67] introduced the use of a similar algorithm in the solution of large sparse symmetric, positive definite systems. Pivoting for stabil-ity was no longer required but any row (column) exchanges had to be matched by exchanging the corresponding pair of columns (rows) to maintain symmetry. The Markowitz algorithm can be reformulated in terms of the outer product form of the Cholesky method discussed in Section 2.1. At the ith stage of the process the next column [y/a^ V?/y/d~i\T is chosen so as to have the minimum number of nonzero entries of all columns in Hf. The chosen column is exchanged with column one of Hi, along with the corresponding exchange of rows. It is no longer necessary to consider the product of row and column nonzero totals because symmetry dictates they have the same nonzero structure. Once again ties between columns with the same number of nonzeros is arbitrarily broken and the order in which the columns are chosen gives the new ordering of A. Considering this modified description of the algorithm it is easier to under-stand why it is considered to result in a local minimization of fill. At each step 15 of the process the next column is chosen from those remaining in the unfactored portion, if,-. Since the chosen column becomes a column of L without further changes, choosing the column with the minimum number of nonzeros locally min-imizes L's fill. With this very restricted outlook, however, the number of new nonzero entries the choice produces in i f , + i is not considered by the algorithm. This aspect of the process is discussed in greater depth in Chapter 3. Rose [Ros72] developed a graph theoretic description of the Tinney and Walker algorithm, stimulating much renewed interest in the scheme. A detailed discussion of this proposal will be made in Section 2.4, but the key feature of this approach is the redirection of attention from matrices to the undirected graphs representing their nonzero structure. Rose renamed the scheme the min-imum degree algorithm because when translated into graphical terms, Tinney and Walker's algorithm dictates choosing the column in the ith step whose cor-responding node has minimum degree in the graph of Hi. Once again, of course, there may be several nodes of minimum degree, and the manner in which these ties are resolved can have a significant impact on the ordering produced. Until this point descriptions of the minimum degree algorithm have been based upon the underlying Cholesky factorization. As outlined in Chapter 1, it is desirable if the ordering phase can be isolated from the remainder of the process. Thus explicitly performing the Cholesky to determine a good ordering is obviously unacceptable. Instead, the factorization process will be simulated using graphs which model the system's nonzero structure during the decompostition. The remainder of this chapter discusses three such simulations, based on slightly different graph modelling schemes, and their efficient application to the MDA. 2.4 Elimination Graph Model As discussed previously, the key to a particular version of the minimum degree algorithm is the manner in which symbolic processing on the sparsity pattern of a matrix simulates the effects of Cholesky factorization. Although all versions of the MDA discussed in this thesis perform the simulation using graphs, the 16 modelling schemes differ in their degree of sophistication. The simplest approach, the elimination graph model or explicit model, is discussed in this section. 2.4.1 Elimination Graphs This model explicitly shows the changes in the graph of the unfactored portion of the matrix A as the Cholesky factorization takes place. One so called elimination graph, G,-, is used to model the nonzero structure of each matrix Hi discussed in Section 2.1. Each elimination graph is a simple graph as described in Section 2.2 and receives its name from the process used to transform one such graph into its successor. By modelling the nonzero structure of Hi at each stage of the factorization, all information essential to the MDA is available. Referring back to the basic step of the outer product form of the Cholesky method presented in Section 2.1, the nonzero structure of Hk is simply predicted from Hk-i. (Ho — A.) Assuming that desired permutations have already been performed, Hk is produced from Hk-i by removing its first row and column. In a second step Hk is corrected by —VkVkT/dk to produce Hk. If no cancellation is assumed to occur from this correction, the nonzero structure of Hk can be predicted as follows. Hk will have an off-diagonal element at (i, j), i ^ j, iff • Hk has a nonzero element at the same position or • (^*)« and (Vjt)y are nonzero and the outer product calculation pro-duced a new nonzero entry at position (i, j). Recall that it is the nonzero structure of Hk which ultimately dictates, through Vjb+i, the nonzero structure of the factor L. Thus if the outer product correction produces a new nonzero entry in Hk, this corresponds to a new nonzero entry in L. It is important to note that the no cancellation assumption allows the prediction of the new nonzero structure without reference to the numerical values. As previously mentioned, at each stage of the factorization process, the mod-elling scheme under consideration represents the nonzero structure of each matrix 17 Hk-i by a graph Gk-i- It is desirable that the transformation of one elimination graph, into its successor, Gk, be possible without reference to the underly-ing matrix structures. This would allow the simulated Cholesky factorization to predict the nonzero structure of L, for a given ordering, using only elimination graphs. Fortunately, there exists a special correspondence between the matrix transformations outlined above and the transformation of their corresponding graphs. Let the node in Gk-i corresponding to the column chosen for in the kth step, Vk, be referred to as ck. The formation of Hk from Hk-i corresponds to the elimination of cjt from Gk-i to form Gk- This special elimination process can be described in two stages. In the first stage cjt and all eminating edges are removed from Gk-i- This corresponds to the removal of the pivot column and. row from Hk-i to form Hk- In the second stage, the transformation to Gk is completed by adding edges to the graph so that the set of nodes adja^^Ck) form a clique. This second step is necessary because the outer product correction will ensure that for each pair of nonzero values in V*, (Vjt),- and (Vk)j (i 7^ j), nonzero values in Hk appear at positions (i,j) and Since there is a strict relationship between off-diagonal entries and edges, clique formation corresponds to the outer product updating of Hk- Any new fill edges added to form the clique correspond to new fill entries in the factor L. To summarize, the edges found in Gk can be easily predicted from the structure of Gk-i as follows. Gk will have edges between nodes x and y iff • x and y were joined by an edge in Gk-\ or • x and y were both joined by an edge to the node eliminated from Gk-i The elimination mechanism described above permits successive elimination graphs to be easily produced by a simple transformation of their predecessor. Once the initial elimination graph of A, G o , has been formed, the Cholesky factorization of A can be viewed as a sequence of elimination graphs. GQ , G j , G 2 , . . . , GTV-I 18 Figure 2.4: Elimination Graph Modelling of Cholesky Factorization 19 © © > © - © 7 © L a fill edge Figure 2.5: Fill Edge Production Figure 2.4 illustrates the decomposition process by comparing the nonzero structure of each matrix Hi and its corresponding graph at each stage of the process for a small system. The nodes are eliminated in the sequence given by A's original ordering. Ultimately, the nonzero structure of the factor L is desired so that full ad-vantage can be taken of its sparsity during the Cholesky factorization of A. The graph of the filled matrix, from which L's structure is immediately appar-ent, can be found from the graphs used to model the elimination process. Let GA = (XA,EA) be the graph of A and GF = (XF,EF) the graph of the corre-sponding filled matrix resulting from A's symmetric elimination in its original ordering. Obviously XF = XA but EF consists of all edges originally in A, EA, and all fill edges introduced during the simulated decomposition. George and Liu [GL81] described EF more formally as follows. The unordered pair {i, j} G EF iff {i, j} G EA or {i, k} G EA and {j, k} G EA and k < min(i,j). In other words, two nodes will be adjacent in GF if they were adjacent in GA or in some elimination graph they were both adjacent to a node eliminated before themselves. In the graph segment of Figure 2.5, if x is eliminated before y and z, y and z will be adjacent in GF, and a fill produced. (When the meaning is clear, fill will be used to refer to a new edge or a new nonzero.) As an aside, if the structure of L were actually desired as a by-product of a simulated Cholesky factorization the previous discussion does not provide the 20 most convenient vehicle for its determination. In practice the structure of L is more directly obtained by recording the adjacency set of each node just before it is eliminated. These sets correspond to the nonzero structure of L's columns as they are finalized in an explicit factorization process. 2.4.2 The M D A and the Elimination Graph Model In all previous discussion of the elimination graph model, the order of node elim-ination was the same as the original ordering given by A. When the ordering of columns in Hi is changed during an explicit Cholesky factorization, the columns and rows must be permuted. In Section 2.1, however, it was shown that the structure of a symmetric matrix's graph does not change when the matrix un-dergoes permutations which maintain symmetry. This fact makes the modelling of the factorization process very amenable to an overlying application of the MDA. The formation of successive elimination graphs is conducted exactly as described in Section 2.4.1 irrespective of the order in which nodes are selected for elimination by the MDA. The translation of Tinney's basic form of the MDA into graphical terms re-quires very little work. At each stage of the modelled factorization, instead of choosing the column from Hi with the minimum number of nonzero entries, the node of minimum degree in G, is chosen for elimination. (Recall that degGi(x) = |ad7Gt.(x)|.) The elimination graph of A, G 0 , is labelled using the numbering induced by the original ordering of A. This labelling is maintained throughout the series of elimination graphs subsequently formed and recording the ordering in which the nodes are eliminated provides the fill reducing permu-tation suggested by the MDA. It should be stressed, once again, that if several minimum degree nodes exist in the elimination graph, the selection of the elim-ination node from this group is completely arbitrary. The basic algorithm can be described more formally by the following pseu-docode description. The notation introduced in this chapter is used and it is assumed that the initial graph, G 0 , is labelled in the standard fashion. 21 MDA: Basic Algorithm Let D( = the degree of node i S = the set of eliminated nodes adjold = a saved adjacency set 1. Form G 0 , S := 0, i := 1, N := |X| 2. For all nodes k G X Dk:= \adjGo{k)\ 3. While i < N do (a) Choose a node xt- for which Dx. = minkex-s{Dk) (ties arbitrarily broken) (b) adjold := adjG{_, (x,-) (c) If i <> JV then form the new elimination graph G,- from G,_i by eliminating node x,-(d) For each y G adjold Dy := |adjG.(y)| (e) t* := t + 1 The minimum degree algorithm using the elimination graph model is demon-strated in Figure 2.6. It should be noted that only two fills were produced in comparison to the seven fills produced when the factorization was simulated using the original ordering as shown in Figure 2.4. 2.4.3 Implementation Limitations of the Elimination Graph Model Although the elimination graph model allows for a very intuitive description of the MDA, there are very definite disadvantages to its use for an MDA imple-mentation. There are several different schemes which could be used to represent 22 7^^ 0 2 Figure 2.6: Elimination Graph Model MDA Example 23 elimination graphs in an actual implementation. Probably the most common idea is to maintain a list of the adjacency set members for each node in the current elimination graph. With the elimination graph model, each time a node, x, is eliminated the adjacency structure of all nodes in adj(x) must be updated. Such detailed and potentially time consuming updating is not required for the reachable set model to be subsequently discussed. The high degree of updating required, is also reflected in the need for a very flexible data structure. Flexi-bility is required because the maximum size attained by a node's adjacency set (or list) during the factorization process can not be predicted. This of course precludes the easy use of a single large integer array to store all adjacency lists of a single elimination graph. Pointers could be used to provide the necessary dynamic nature to the data structure; however, this can result in costly increases in storage and execution time overhead. Finally, the main difficulty with the elimination graph, or explicit, model is that the size of the overall adjacency list data structure is also unpredictable. If, for instance, the star graph of N nodes was a subgraph of the overall elimination graph and node x was eliminated, the size of the adjacency list for this portion would increase from 2N to N(N — 1). Thus there is no way to know whether sufficient storage is available before attempting to apply a MDA implementation based upon this model. Although the above limitations reduce the usefulness of elimination graphs for actual implementations, the model remains very helpful during the design and analysis of the minimum degree algorithm and its modifications. This is because of the equivalence of elimination graphs to more complex models of the Cholesky decomposition. In fact the reachable sets and quotient graph models, to be subsequently discussed, will be shown to merely simulate the basic elimination graph model at different levels of abstraction. 24 |adj(x)| = N Figure 2.7: The Star Graph 2.5 Reachable Set Model The model discussed in this section attempts to remove many of the implemen-tation problems associated with the elimination graph model, by the implicit formation of elimination graphs at each step of the modelled factorization. The reachable set model leaves the original graph of A intact. No explicit manipula-tion and updating of the adjacency lists is required. 2.5.1 I m p l i c i t F o r m a t i o n o f E l i m i n a t i o n G r a p h s In the reachable set model, when a node is chosen for elimination it is not removed from the actual representation of the elimination graph. Instead the node is marked as eliminated and included in the set of all eliminated nodes, S. Reachable sets, as discussed in Section 2.2, are then used in place of adjacency sets to provide this model with power equivalent to the elimination graph model. In almost all respects the reach(x, S) operator replaces the adj(x) operator of the elimination graph model. Consideration of the following example, in which the factorization of a matrix, A, is being simulated by both the elimination graph and reachable set models, will help clarify this situation. Figure 2.8 shows a portion of A's original graph, G 4 , after xi, x2, x 3 and x 4 have been eliminated. There exists a path p = (x6, x2,xt, XT) in G0, meaning that x6 G reacha0(x7, S) and X 7 G reachc0(x6, S). The relationship between x7 and x6 can be shown 25 Reachable Sets Model Elimination Graph Model Figure 2.8: Equivalence of Reachable Sets and Adjacency Sets to correspond to being adjacent nodes in the appropriate elimination graph. Starting with the original graph, as nodes are eliminated, all members of their adjacency set become pairwise adjacent. Thus as the nodes X 3 , X 2 , and x± are eliminated from path p, in any order, z 6 and x7 must become adjacent nodes in the corresponding elimination graph. In fact, at any time during the simulation of a particular matrix's factorization using the reachable set model, all members of reach,G0{x, Si) must belong to adja{ in the corresponding elimination graph. In a similar manner it can be demonstrated that the converse statement is also true. If two nodes, x and y, are adjacent in an elimination graph, at the corresponding point of the implicit model's simulation, x and y must also be adjacent or connected by a path of strictly eliminated nodes. The previous discussion can be formalized into the following equivalence re-lation, which is very important for the successful modelling of Cholesky factor-ization using reachable sets. Let refer to the set of i eliminated nodes at the ith stage of the process. For all y, y G X - S{ adJG{{y) = reachGo(y,Si) 26 Figures 2.6 and 2.9 can be compared to provide further confirmation of this result. Using the definition of reachable sets and their special equivalence to adja-cency sets, the characterization of the filled matrix, F, found in Section 2.4.1, can be reformulated. Ep = {{xi,Xj}\xj G reachGo{xi,Si-i)} i = 1,2,... ,N - 1 (2.8) where is the set of nodes {xi,x2,.. .,x,_i} eliminated before x,- and x,- G {1,2,..., N} for i = 1 to N. Less formally, this states that the set of edges for the filled graph will consist of all pairs {x,-, x3} such that Xj is uneliminated and can be reached from x,, just before x,- is eliminated. It should be noted, however, that fills now appear as new members in a node's reachable set rather than a new edge in an adjacency structure. A formal proof of the above formulation is presented by George and Liu [GL81]. 2.5 .2 T h e M D A a n d t h e R e a c h a b l e S e t M o d e l The MDA relies upon the calculation and comparison of the degrees of unelim-inated nodes, during the factorization process. In the elimination graph model, the degree of a node was taken as the size of its adjacency set. However, it has been shown that adja.(x) = reacha0{x, 5,-) and it follows that the degree of an eliminated node in the reachable set model is the number of nodes in its reachable set. degSi {x) = \reachGo (x, S{) | (2.9) With this established the MDA can be easily reformulated in terms of reachable sets. It is assumed that G = (X,E) is the graph of the coefficient matrix of the symmetric system under consideration. 27 MDA: Reachable Set Model Let Dt = the degree of node i S — the set of eliminated nodes 1. Form G 0 , S := 0 2. For all nodes ke X Dk := \adjGo{k)\ 3. While S ^X do (a) Choose a node t £ X9, for which Dt — min t6x«_c(s)(-D«) (ties arbitrarily broken) (b) For each y 6 reacho0 {t, S) Dv := \reachGo(y,Sl>{t})\ (c) 5 : = 5 U {t} Figure 2.9 demonstrates the MDA using the reachable set model. Figure 2.6 demonstrated the MDA using elimination graphs on the same example, allowing for further comparison of the two models. 2 .5 .3 I m p l e m e n t a t i o n L i m i t a t i o n s o f t h e R e a c h a b l e S e t s M o d e l The main advantage of the implicit model over the explicit model discussed in Section 2.4, is the alleviation of the worst data structure difficulties encountered by elimination graph representations. The reachable set model works on the original graph of A throughout the elimination process. If adjacency lists are to be used to represent the graph, their size does not change during the simulation because the formation of successive elimination graphs is done implicitly. As a 28 min deg =1 eliminate node 3 min deg =2 eliminate node 4 min deg =2 eliminate node 6 min deg =2 eliminate node 7 min deg =2 eliminate node 1 and so on ... S - U reach(LS) = {3,4 ,5,7} reach(2,S) = {4,6} reach(3,S) = {1} reach(4,S) = {1.2} reach(5,S) = {1,6,7} reach(6,S) = {2,5} reach(7,S) = {1.5} S = {3} reach(1,S) = {4,5,7} reach(2,S) = {4,6} reach(4,S) = {1.2} reach(5,S) = {1,6,7} reach(6,S) = {2,5} reach(7,S) = {1,5} S = {3,4} reach(1,S) reach (2,S) reach(5,S) reach(6,S) reach(7,S) {2,5,7} {1,6} {1,6,7} {2,5} {1,5} S = {3,4,6} reach(1,S) = {2,5,7} reach(2,S) = {1,5} reach(5,S) = {1,2,7} reach(7,S) = {1,5} S = {3,4,6,7} reach(1,S) = {2,5} reach(2,S) = {1,5} reach(5,S) = {1,2} Figure 2.9: Reachable Set Model MDA Example 2 9 Figure 2.10: Reachable Set Inefnciences result, a single one dimensional array could be used to store the graph's adja-cency lists in a sequential fashion. More importantly, however, the total storage requirements for the process are known a priori and the original data structure used to represent a graph of the system remains intact. Unfortunately, the resolution of the representation and storage problems has introduced another difficult situation. Experience shows that the major propor-tion of effort required by an MDA implementation will be spent on reachable set determination. If reachable sets are inefficiently formed, the performance of the entire program will be seriously affected. For example, consider the sub-graph displayed in Figure 2.10. If node 9 is eliminated, 7 different paths and 12 nodes must be traversed to determine that the three nodes x, y and z are in reach(10, S U {9}). This represents a significant amount of work to recalculate the degree of one node. Figure 2.10 is only a very small scale example. When large graphs are considered, with hundreds or thousands of nodes, search paths can become unacceptably long. George and Liu [GL80] have shown that, al-though implementations based upon this implicit model have desirable storage characteristics, the inefficiencies of reachable set determination are unacceptable for a large selection of problems. They showed that more efficient implemen-tations could be based on the model discussed in the next section, which uses reachable sets but attempts to resolve the long search path problems. It should be emphasized, that like the explicit model, the reachable sets model 30 is a very useful tool for the study of the MDA and its extensions. It also helps in the understanding of more sophisticated models and their origins despite being unsuitable for implementations itself. 2.6 The Quotient Graph Model This section discusses the use of quotient graphs in a versatile model of the Cholesky factorization process. The quotient graph model, first proposed by George and Liu [GL78b,GL80], was chosen for implementing the MDA in the successful Waterloo Sparse Linear Equations Package or SPARSPAK [GL78c]. A great deal of work has been conducted by other researchers on this and very simi-lar representations. Essentially equivalent schemes were developed for modelling the factorization of systems arising from finite element problems but in these pa-pers reference is made to generalized elements instead of quotient graphs. (See [DR83] for example.) For the remainder of this thesis, essentially all discussion of the MDA and its extensions, established and proposed, will be based upon a quotient graph modelling of the factorization process. The application of quotient graphs results in a modelling of Cholesky fac-torization lying somewhere between the explicit and implicit models previously discussed. In a similar fashion to the explicit model, an initial graph represent-ing the nonzero structure of A is formed and is explicitly modified only after the elimination of some nodes during the course of the simulated factorization process. The result is a sequence of special graphs, quotient graphs, representing the nonzero structure of Hi at each stage of the symmetric elimination process. However, unlike the explicit model, the formation of successive quotient graphs does not need more storage than that originally allocated for the first quotient graph, GQ. It will be shown that the storage requirement of an implementation based on the quotient graph model is known a priori. Like the implicit model, the desirable storage characteristics of the quotient graph model are attained using the reachable set operator during the factorization simulation. However, to im-prove the efficiency of reachable set formation, the model guarantees that paths 31 searched will be a maximum of two nodes in length. As will be shown in sub-sequent discussion, the quotient graph model is a clever mixture of the explicit and implicit models. This combination of features allows it to capture the good characteristics of both schemes, without acquiring their extreme implementation difficulties. 2.6.1 E l i m i n a t e d S u p e r n o d e s In the reachable set model, when a node was chosen for elimination, it was nagged as an eliminated (or a deleted) node, added to the set of eliminated nodes, S, but remained a part of the graph. These steps are also followed when the node is a member of a quotient graph, unless it is adjacent to other previously eliminated nodes. In this case, a final step of the elimination process collapses the set of connected, eliminated nodes into an eliminated supernode. When the context is clear, eliminated supernodes are simply referred to as supernodes. Figure 2.11 illustrates the graphs of the reachable set and quotient graph models at the same stage of the factorization process for a simple problem. The formation of two supernodes is demonstrated but they would, of course, be formed separately during a quotient graph modelling of the factorization. There are two connected sets of eliminated nodes, {2,3,4} and {6,7}, shown in the reachable set model's graph. The corresponding quotient graph is formed by collapsing each set of nodes into a single, representative supernode. The adjacency set of each newly formed supernode consists of all uneliminated nodes adjacent to one or more members of the collapsed set of nodes, just prior to the supernode's formation. Once a supernode has been formed in a quotient graph, it can be treated as a single eliminated node despite its origins. In particular, further eliminations can result in supernodes being absorbed into new, larger supernodes. For example, the elimination of node 5 from the quotient graph of Figure 2.11 would result in the formation of a single supernode originating from supernodes {2,3,4} and {6,7} and node 5. Using quotient graphs to simulate the factorization process results in a model 32 S = {2,3,4,6,7} Reachable Sets o 0 Quotient Graph Model Figure 2.11: Eliminated Supernode Formation very similar to the implicit model previously discussed. The key observation is that the formation of supernodes does not change the reachable sets of une-liminated nodes. For example, the reachable sets of nodes 1,5,8,9 and 10 are unaffected by the switch to a quotient graph, as long as the reach operator is considered to treat an eliminated supernode in the same manner as any other eliminated node. In addition, the length of a path, through 5, between any two uneliminated nodes is never greater than two edges. If the supernodes of successive quotient graphs are continually updated during the elimination pro-cess, reachable sets can be economically determined irrespective of the size of the eliminated set, S. 2.6.2 Formalisms of Quotient Graphs The ideas presented in the previous section and introduction can be formalized in graph theoretic terms. Let G = (X, E) be a graph of a sparse system where: • X = the set of nodes in the system's original graph 33 • E = the set of edges of the system's original graph • S = the set of eliminated nodes at a particular stage of the elimination process • G(S) = a subgraph of G composed of nodes in the set S. Let P be a partitioning of the graph G on the set X. The partitioning consists of set Zi (i = 1,..., £), which are denned such that: t IJ Zk = X and Vi Vj, with i ^ j, Z{ n Zj = 0. k=l P = {Zi, Z2, • • •, Zi} By definition, the quotient graph of G with respect to P is G/P = {P,E") = {Xq,E") where Eq = {{Z{, Zj} I adj(Zi) n Z,- / 0}. In other words edges between the nodes, Zi, of G/P exist if at least one member from the set Z,- is adjacent to a member of Zj in the original graph. It should be emphasized that each node of a quotient graph represents one or more nodes of the original graph of the system. One particular partitioning of G leads to the specific sub-class of general quo-tient graphs used to model the Cholesky factorization process. The component partitioning can be defined as C(S) = {Y C S I G(Y) is a connected subgraph }. (2.10) In a connected subgraph there exists a path between any pair of nodes in the subgraph. Each member of the component partitioning is a connected subset 34 of e l i m i n a t e d nodes coalesced to f o r m a supernode as de s c r i bed i n the prev ious subsect ion . It s h o u l d be no ted t ha t i n p rac t i ce , fo r each supe rnode o r i g i n a t i n g f r o m a g roup of nodes, one node is chosen as a rep re senta t i ve a n d a l l o ther membe r s i gno red f o r the r e m a i n d e r of the process. I n t u r n , one c a n def ine the partitioning of X induced by S as Q{S) = {{y} \ y e x - s } u C(S). (2.11) T h e quo t i en t g r a p h of G w i t h respect to Q(S) p roduces quo t i en t graphs as d i scussed i n the p rev i ou s subsect ion. Q(S) c a n be t h o u g h t of as s i m p l y the set of nodes, X9, of the quo t i en t g r aph . F o r examp le , F i g u r e 2.11 shows the quo t i en t g r a p h of the s y s tem ' s o r i g i n a l g r aph w i t h respect t o t he f o l l o w i n g p a r t i t i o n . Q(S) = { { 1 } , {5}, {8}, {9}, {10}, {2,3,4}, { 6 , 7 } } 2.6.3 Successive Quotient Graph Formation A s p rev i ou s l y m e n t i o n e d , t he Cho l e s k y f a c t o r i z a t i o n process c a n b e m o d e l l e d by a sequence of quo t i en t g raphs. L e t 5,- be the set of i nodes e l i m i n a t e d b y the ith stage of the m o d e l l e d f a c t o r i z a t i on . T h e g r a p h c o r r e s pond i n g t o each stage of t he process, G,-, is t he quot i en t g r aph w i t h respect t o the p a r t i o n i n g of X i n d u c e d b y Si. T h u s the effects of a p a r t i c u l a r node e l i m i n a t i o n sequence o n the f a c t o r i z a t i o n c a n be reco rded b y the fo l l ow ing sequence of graphs. G i r*i c i r*i 0, U r l 5 « j r 2 , . . . , IJff-l where G] = G/Q{ST) GQ = G is t he g r a p h of the s y s t e m w i t h no nodes e l i m i n a t e d a n d the super sc r ip t q is used t o emphas i ze t h a t th i s is a sequence of quo t i en t g raphs. T h e e xamp l e i l l u s t r a t e d i n F i g u r e 2.11 suggested t h a t t he reachab le sets of u n e l i m i n a t e d nodes were no t affected by supernode f o r m a t i o n . It has, how-ever, been f o r m a l l y s h o w n [GL81] t h a t the reachab le sets o f u n e l i m i n a t e d nodes i n c o r r e spond i ng g raphs f r o m the reachab le set a n d quo t i en t g r a p h mode l s are equ iva lent . reachG{y,Si) = reachG^(y,C(Si)) 35 This result allows the modelling of the factorization process by the quotient graph model to be considered as essentially equivalent to the reachable set and elimination graph models. Figure 2.12 contrasts the elimination graph and quo-tient graph models on a very small problem. The elimination proceeds according to the original labelling. A quick check shows that the adjacency set of each node in an elimination graph is equal to the reachable set of the same node in the cor-responding quotient graph. Fill edges and eliminated nodes are shown bolded. 2.6.4 The M D A and the Quotient Graph Model Basing the MDA on a quotient graph model results in very few visible changes to the pseudocode form of the algorithm presented for the implicit model of Section 2.5. The main change is that successive quotient graphs are explicitly formed, while in the reachable set model all changes in the graph were implicit. As well, the reachable sets are formed using the current quotient graph rather than the original graph and the component partitioning, C(S), is used in place of S. MDA: Quotient Graph Model Let Di — the degree of node t S = the set of eliminated nodes G (— (X,E)) — the graph of the system's original coefficient matrix Gq = the current quotient graph save = variable for saving a reachable set 1. 5 := 0, Gq := G 2. For all nodes ke X D k := \adjG{k)\ 3. While S ^ X do 36 (a) Choose a node t , t £ S , for which Dt = min, ex-s(A) (ties arbitrarily broken) (b) save := reachaq{t,C{S)) (c) S :=SU {t} (d) Gq := G/Q(S) (e) For each y G save Dy := |reac/iG«(y>C(S'))l Step 3d of the algorithm implies that each quotient graph used in the mod-elling process is formed from the original graph, G, and the latest partitioning induced by S. This could require the storage of two complete graphs. In an actual implementation, however, each quotient graph can be simply transformed into its predecessor, as briefly discussed in Section 2.6.1 and illustrated in Fig-ure 2.12. A formal algorithm to effect these transformations will be presented in Chapter 5. Figure 2.13 illustrates the application of the MDA using quotient graphs on the same small problem used to demonstrate the elimination graph and reach-able set versions of the MDA. However, the elimination of the first node is not illustrated. When a supernode is formed it will be represented for the remainder of the process by one member chosen from the group of coalesced nodes. 2.6.5 Storage Requirements of the Quotient Graph Model The advantages of the quotient graph model over the elimination graph and reachable set model were briefly outlined in the introduction to Section 2.6. The relative efficiencies of the models depend to a certain extent upon the data structure scheme used to represent graphs. As mentioned previously, one of the most common methods is to store adjacency lists for each of the graph's nodes. Although other structures are available which are more efficient for certain classes of problems, using adjacency lists provides a general purpose storage 38 Figure 2.13: Quotient Graph Model MDA Example 39 scheme, which will be shown to be easily applied to quotient graphs. A detailed description of the scheme will be presented in Chapter 5 Reachable set formation in quotient graphs is certainly more efficient than in the reachable set model. However, having to form a reachable set is certainly more difficult than a simple scanning of an adjacency list required in the elimi-nation graph model. The distinct advantage of quotient graphs, over elimination graphs, is their storage requirements. When modelling the factorization process by a sequence of quotient graphs, the original block of memory allocated for the adjacency structure of G% can be used, never requiring additional storage. In the following discussion reference will be made to the sequence of quotients graphs G] = (Xq,Eq) i = 0,1,..., N — 1, where the system being modelled has an NxN coefficient matrix. It has been shown [GL80] that Each edge of a graph corresponds to two nodes stored in different adjacency lists. Thus equation 2.13 shows that the overall size of the adjacency structure representing any quotient graph formed during the course of the factorization cannot exceeed the size of the original structure. Thus at least theoretically, additional storage should not be required, but whether or not the modelling process and the MDA can be adapted to take advantage of this observation needs to be investigated. Fortunately the quotient graph modelling scheme can be formulated so as to take full advantage of the original storage. By observing the formation of succes-sive quotient graphs during a factorization sequence, it can be easily confirmed that the size of an uneliminated node's adjacency set can never increase while it remains uneliminated. | £ ' + i l < | £ , - | f o r i - = 0 , l , . . . , A T - 2 (2.12) and therefore (2.13) 40 Figure 2.14: Storage Shortfall of a Supernode Representative For all x € X - 5, \adjG<,{x)\ < \adjG«(x)\ (2.14) In fact in a quotient graph representation, it is not even possible for an unelim-inated node, x, to become adjacent to any entirely new nodes during the course of the elimination. Unlike elimination graphs, when nodes are eliminated at least one eliminated node is left in the graph to represent each coalesced supernode. Thus, although new nodes may be added to z's reachable set, a supernode is acting as a bridge and the two uneliminated nodes are not adjacent. As nodes adjacent to x are eliminated and form supernodes with other eliminated nodes, no additional supernodes can become adjacent to x since each is referenced in the structure by a single representative. In fact the formation of supernodes often reduces the size of adj(x), as pairs of adjacent supernodes in adj(x) are collapsed to form a single node. The representation of an eliminated supernode's adjcency list, however, will often require more storage than can be provided by the representative node of the coalesced set alone. As discussed in Section 2.6.1, when a group of eliminated nodes is coalesced, the adjacency set of the new supernode consists of all unelimi-nated nodes originally adjacent to one of the coalesced nodes. In Figure 2.14, if x is eliminated and designated as the representative of the new supernode {x, y, z}, its original adjacency list of two members does not provide enough room for an adjacency set of five members. 41 Original Storage 3 Storage for adj(supernode) 22 ^ 2 2 X --> end of list marker --> actual adj list supernode = {3,7,22} Figure 2.15: Eliminated Supernode Aadjacency Set Storage Fortunately, it has been shown [GL80] that if Z is a connected set of nodes in G\ to be combined into a single supernode in graph Gf X># G ? (x)| > \adjGr(Z)\ + 2{\Z\-l). (2.15) xez This result demonstrates that there is sufficient space for the adjacency list of the new supernode in the combined storage originally allocated to the set of coalesced nodes. In addition the 2(\Z\ — 1) term allows for some sort of pointer scheme to link up the blocks of storage into a unified adjacency list. Figure 16 demonstrates how this observation could be implemented for a supernode's adjacency list. When more than one block is needed to represent the adjacency list, the last entry is used to store a pointer to the next available block. A flag is used to terminate the list. The complete implementation details of such a scheme are given in Chapter 5. 2.7 Mass Elimination of Indistinguishable Nodes The efficiency of the MDA can be further enhanced by the introduction of une-liminated supernodes as described by George and Liu [GL80]. It was found that during the factorization process, connected sets of uneliminated nodes could be 42 identified as indistinguishable and be collapsed to form a single supernode. For the remainder of the process each supernode can be treated as a single unit, rep-resented by one of the coalesced set members. When the representative node is selected for elimination, it was found that the entire set of nodes could be elim-inated one after the other, a mass elimination, and still be in accordance with the MDA. This section will discuss the definition and identification of indistin-guishable nodes and the enhancement of the MDA by uneliminated supernodes. The symmetric code of the Yale Sparse Matrix Package, YSMP, also uses uneliminated supernodes but refers to them as prototype nodes. In a similar scheme developed as part of a solution method for systems arising from finite element problems, Duff and Reid [DR83] use supervariables instead of supern-odes. In their scheme identical rows in the coefficient matrix are identified and the variables of the corresponding grouped rows into supervariables. 2.7.1 Indistinguishable Nodes Consider the quotient graph Gq = (Xq,Eq) one of a sequence of quotient graphs modelling a factorization process and let S be the set of eliminated nodes. Two nodes, x,y G Xq — S are said to be indistinguishable with respect to elimination, if the following expression is true. reacfiGi{x,S) U {x} = reacho<i(y,S) U {y} (2.16) For example, in Figure 2.16 the set of nodes {x ,y ,z} can be classified as indis-tinguishable. The set reachai(x, S) U {x} is often called the neighborhood set of x. Although the concentration of this discussion is on quotient graphs, the same definition can be used to classify indistinguishable nodes in the reachable sets model if Gq is replaced by the graph of the original coefficient matrix. As well, the normal conversions from reachable sets to adjacency sets can be made to provide an equivalent definition in the elimination graph model. 43 S = {4,5} reach(x.S) U {x} = reach(y.S) U {y} = reach(z,S) U {z} Figure 2.16: An Example of Indistinguishable Nodes 2.7.2 I n t e g r a t i o n o f U n e l i m i n a t e d S u p e r n o d e s i n t o t h e M D A Two important theorems have been developed by George and. Liu [GL80] which allow the successful integration of the concept of indistinguishable nodes, in the form of supernodes, into the MDA. The important charcteristics of these two theorems are outlined below. For a more rigorous discussion and presentation consult [GL80]. Firstly, it was shown that once a group of nodes are identified as indistinguish-able, they will remain so throughout the elimination process until eliminated themselves. Secondly, it was shown that if one node of an indistinguishable set was chosen for elimination by the MDA, that all remaining members would be of minimum degree after the elimination of the first node. Thus once one node of an indistinguishable set is targetted for elimination, the entire set can be eliminated as a unit and still obey the constraints of the MDA. These two results, along with the definition of indistinguishable nodes itself, suggest that an identified indistinguishable set of nodes could be treated as a single unit, or supernode, throughout the course of the elimination process. Each node has the same neighborhood set and therefore the same degree. Once a group of indistinguishable nodes is identified, the unit will never have to be broken up and can even be treated as a single unit during the elimnation phase of the MDA. What needs to be investigated, however, is how other uneliminated nodes are 44 affected with respect to the MDA, by the formation of uneliminated supernodes. Theorem 1 Consider a set I C X of indistinguishable nodes in the graph G = (X, E) and the set of eliminated nodes S. If x S and x £ I but there exist a y G I such that y G reacha(x,S), then I C reacha[x, S). Proof: If y G reacha{x, S) then by the symmetry of reachable sets in quotient graphs x G reacha(y, S). But since / is an indistinguishable set, the definition of such sets requires that for all z G / , x G reacha[z, S). Thus once again by the symmetry of quotient graphs, all members z of I must be in reacha(x,S) and therefore J C reachc{x, S). • Theorem 1 shows that if a node, x, can reach one node in an indistinguishable set, all members of the set must be in x's reachable set. As a result, simple modifications of the MDA allow the formation of supernodes without affecting the remainder of the uneliminated nodes. All that is required is that when the degree of a node x is determined, the number of nodes making up each supernode in x's reachable set is taken into consideration. In this fashion the formation of uneliminated supernodes does not affect the degree of other uneliminated nodes. Together, the observations discussed above permit the formation of supern-odes representing indistinguishable sets without dramatically affecting the mod-elling process or the MDA itself. When an indistinguishable set is identified in a quotient graph, one member of the set is chosen as a representative. All other members of the indistinguishable set, and their associated edges, are removed from the quotient graph. For the remainder of the elimination process the sup-ernode is essentially treated as a single node, with one degree and one reachable set. Unlike their eliminated cousins, however, the membership of uneliminated supernodes is recorded so that the correct degree of other nodes can be calcu-lated and so that all nodes represented by the supernode can be identified when selected for elimination. If the original MDA is be to followed closely, the degree of a supernode representative must include the other members of the indistin-guishable set. In Section 2.8 one further enhancement of the MDA is discussed which takes a different approach to degree determination. A formal algorithm 45 x represents the supernode {x,y,z} Figure 2.17: Formation of an Uneliminated Supernode describing the necessary quotient graph transformations for uneliminated super-node creation will be presented in Chapter 5. Taking x as the representative of the new supernode, Figure 2.16 can be restructured to illustrate the two types of supernodes, eliminated and unelim-inated, now considered by the algorithm. The formation of the new graph in Figure 2.17 brings up one potentially unsettling observation. In the original graph of Figure 2.16, nodes y and z were not adjacent to eliminated nodes 5 and 4. In Figure 2.17, however, these nodes have been absorbed into the unelimi-nated supernode, which is adjacent to both 5 and 4. It is important to remember that the MDA and the simulated factorization only need the reachable sets and degrees of uneliminated nodes correctly represented. If the reach operator is considered to take into account all members of uneliminated supernodes, com-parison of Figures 2.16 and 2.17 show that the reachable sets and degrees of uneliminated nodes were unaffected by the supernode formation. 2.7.3 Identification of Indistinguishable Sets As shown in the previous section, the introduction of supernodes into the MDA is fairly straight forward. What is not clear at this point is how to find such indistinguishable sets. Obviously, some identification schemes are totally unrea-46 sonable. For example, if the definition of indistinguishable nodes was used in a search of all uneliminated nodes after each elimination step, the costs would be unacceptable. Fortunately, George and Liu [GL81] have presented an alternative approach which can be used to identify indistinguishable sets after each elimi-nation step and associated quotient graph transformation. Although the scheme does not identify all sets, in practice it has been shown to find a reasonable proportion. Let Qi and Q 2 be two eliminated supernodes, and y an uneliminated node in the quotient graph Gq = (Xq,Eq). In addition let S be the set of elimi-nated nodes. (An eliminated supernode may only consist of a single node.) The following observation was made by George and Liu [GL81]. If y E adjGi{Qi) H adjGq(Q2) and adjGq{y) C adjGq(Qi) U adjGq(Q2) U { Q u Q2} then reachG1(y, S) U {y} = adjGq(Q1) U adjGq{Q2). This observation provides a realistic vehicle for the identification of indistin-guishable nodes within adjGq(Qi) D adjGq(Qi). Q\ can be thought of as a newly eliminated node, which may have been coalesced with other nodes to form an eliminated supernode, and Q 2 a previously formed eliminated supernode, for which adj(Qi) C)adjGq(Q2) ^ 0. All nodes y of this intersection, whose adjacency set is a subset of adjGq(Qx) U adjGq(Qi) U {Qi,Q2}, will be classified as indis-tinguishable. The membership of the indistinguishable set, J, can be described more formally as follows. / = {y € adjGq(Qi) n adjGq(Q2) \ adjGq{y) C adjGq(Ql) Ua(frc«(<?2)u{Qi,Q 2}} (2.17) A search for indistinguishable nodes can be conducted after each elimination step using the newest eliminated supernode Qi and all appropriate, previously eliminated supernodes Q 2 . Further details of such a search are given in Chapter 5. 47 2.7.4 The M D A With Uneliminated Supernodes The incorporation of uneliminated supernodes into the MDA requires several changes and additions to the pseudocode algorithm presented in Section 2.6.4. The two new functions, expand and size, are included in the pseudocode al-gorithm. The function expand takes as an argument any uneliminated node in the current quotient graph and returns a set consisting of the argument and all other indistinguishable members of its coalesced set if it is a supernode. The function size is used to count the number of nodes in a set. All uneliminated supernodes contribute the number of nodes coalesced to form the supernode. As stated previously, when a supernode is formed, all members and associated edges are removed from the quotient graph for the remainder of the process. Under the current definition of the reach operator, it only looks at the current quotient graph and does not attempt to expand uneliminated supernodes. These two func-tions are necessary to properly calculate the degrees, so as to be in agreement with the original intent of the MDA. Thus in step 3f (see following algorithm), if A; is a supernode, then the other indistinguishable members it represents and all nodes implicitly represented by supernodes in its reachable set must be included in the degree value calculated. A comment is also necessary to clarrify the formation of successive quotient graphs, Gq. As mentioned in Section 2.6.4, each quotient graph is not formed directly from G but from its predecessor. As well, uneliminated supernodes are carried over from the predecessor and do not have to be reformed after each elim-ination step. Thus step 3d should only be considered as a formalism, used to simplify the algorithm. With the introduction of uneliminated supernodes there are very few changes necessary for the transformation process. Instead of elimi-nating a single node, the mass elimination of a supernode must be considered. Since only the representative is actually present in the quotient graph, the new graph's formation proceeds as previously described. The remaining members of the supernode are virtually eliminated by including them in S and appending them in an arbitrary order to the new MDA ordering being recorded. 48 MDA: Quotient Graph Model With Uneliminated Supernodes D{ = the degree of node i S = the set of eliminated nodes G (= (X, E)) = the graph of the system's original coefficient matrix Gq (= {X9, E9)) = the current quotient graph save = variable for saving a reachable set expand(x) = {x} U {all other members of the supernode represented by x} size(W) = the number of members in set W, with each supernode x G W counting \expand(x)\ 1. S := 0, G9 := G 2. For all nodes k G X Dk \adjG(k)\ 3. While S # X do (a) Choose a node t G X9, for which Dt = m i n t 6 x « - c(5 ) ( A ) (ties arbitrarily broken) (b) save := reach,G<i{t,C(S)) (c) S := S U expand(t) (The nodes of a mass elimination are recorded in an arbitrary order.) (d) G9 := G/Q(S) (e) Identify all sets of indistinguishable nodes amongst the members of save and form a supernode for each set. Update G9 and X9 appro-priately. (f) For each y G save Dy := size(reachGq(y,C(S))) + \expand(y) \ — 1 Finally it should be noted that step 3e of the algorithm precedes step 3f so that effort is not spent updating the degrees of nodes which are going to be 4 9 Figure 2.18: Effects of Uneliminated Supernode Formation coalesced into a supernode in the next step. It should be reemphasized that only one degree update is ever required for a supernode after an elimination step. In the example used to illustrate all previous versions of the algorithm (see Figure 2.13), the introduction of indistinguishable nodes affects the sequence of quotient graphs formed once S = {3,4,6}. At this point nodes 1 and 5 would be identified as indistinguishable, using the scheme discussed in Section 2.7.3, and coalesced to form a supernode as shown in Figure 2.18. For the remainder of the process, if the same elimination order is assumed, no additional indistinguishable nodes would be identified. Eventually the super-node {1,5} would have been eliminated in a single mass elimination step, saving the simulated process one cycle overall. However, since ties between nodes of minimum degree are arbitrarily broken, node 2 could have been chosen, instead of node 7, as the fourth node to be eliminated. In this case the supernode {1,5} could have been coalesced with node 7, after the elimination of node 2, to form a single supernode representing the set {1,5,7}. This supernode formation would have saved two elimination steps. 50 2.7.5 Supernode Efficiency Considerations Once a group of indistinguishable nodes has been identified and coalesced into a single supernode, several aspects of the MDA benefit from their formation. While uneliminated supernodes are present in the quotient graph, in general fewer of the costly degree updates are required. Only the degree of a supernode represen-tative ever needs recalculation, rather than each node of the indistinguishable set, as previously required. In addition, when a supernode is eliminated in a mass elimination step, only one set of degree updates and one quotient graph transformation is required for the entire group of indistinguishable nodes. This mass elimination also reduces the number of searches for nodes of minimum de-gree subsequently required. Finally, the formation of uneliminated supernodes reduces the number of nodes in the quotient graph, helping the determination of reachable sets by reducing the number and complexity of search paths. At this stage it is not entirely clear that the cost of identifying indistinguish-able sets will not outweigh the benefits discussed above. Adding to the difficulty of resolving this question is the fact that the ability of the MDA to identify indis-tinguishable nodes may be very problem dependent. The identification scheme outlined in Section 2.7.3 does not guarantee to find all indistinguishable sets. Some problems may simply not have indistinguishable nodes and in other prob-lems they may not be identified. In either case, however, the costs of searching for the sets would have been committed before applying of the ordering scheme. Fortunately, George and Liu [GL80,GL87] have shown that the overall efficiency of the MDA is significantly increased for a wide variety of large sparse systems, as long as the identification of indistinguishable sets is carefully implemented. 2.8 External Degree of Uneliminated Supernodes In the basic form of the MDA, the degree of a node is taken as the number of adjacent nodes in the current elimination graph. To conform with this original 51 restriction, the degree calculation of an uneliminated supernode, as discussed in Section 2.7.4, took into account the number of indistinguishable nodes associated with the representative. The choice of a minimum degree node by the MDA, can alternatively be thought of as choosing the node whose elimination will result in the formation of the smallest clique. When a supernode is deleted by a single mass elimination step, the size of the resulting clique is independent of the number of indistinguishable nodes actually represented by the supernode. This motivates the use of external degrees, as introduced, by Liu [Liu85], for uneliminated supernodes, rather than the true degree of the original MDA. A slightly different motivational approach, discussed in Chapter 3, is used here to independently arrive at the same external degree enhancement of the MDA. The external degree of an uneliminated supernode in a quotient graph is the number of uneliminated nodes in its reachable set and does not include those nodes which have been identified as indistinguishable to itself. In other words, the external degree is the degree of the supernode in its collapsed form. For example, the external degree of node x in Figure 2.17 is 3 while the true degree of each node it represents is actually 5. The external degree calculation does, however, still take into account the size of other uneliminated supernodes. The external degree enhancement of MDA is the first advancement discussed which actually changes the node selection process of the MDA, rather than just providing a more efficient vehicle for the simulation process. In practical terms its implementation does not significantly affect the execution costs of the MDA. The pseudocode algorithm presented in Section 2.7.4, need only be modified by removing the \expand(k)\ — 1 term from step 3f. But more importantly, Liu has shown [Liu85j that the introduction of external degrees improves the quality of the orderings produced by the MDA. Reductions of 3 to 7% in the number of nonzeros in the factored matrix were observed on k-by-k grid problems using a nine point operator, with k varying from 30 to 70. Liu also reports that similar reductions were observed on less regular problems. 52 2.9 Further Extensions This section briefly oulines two additional enhancements of the MDA. Incom-plete degree updating is a technique used to delay degree updating of outmatched nodes until absolutely necessary. Using the multiple elimination scheme, de-gree updating is delayed until an entire set of minimum degree nodes has been eliminated. Neither enhancement was implemented for the testing presented in Chapter 6, but both are important ideas that should not be overlooked. 2.9.1 Incomplete Degree Updating As discussed in Section 2.7, the coalescing of an indistinguishable set reduces the number of degree updates that must be performed. An incomplete degree updating is said to be performed because only the degree of the representative ever has to be updated, rather than each member of its indistinguishable set. The notion of incomplete degree updating can be extended with the introduction of outmatched nodes, first used in the YSMP. The definition of outmatched nodes is very similar to that of indistinguishable nodes. To simplify this discussion, consider a quotient graph, G9 = (X9,E9), which does not have uneliminated supernodes. If x and y are uneliminated nodes of G9, x is said to be outmatched by y if reachGi (y, S) U {y} C reachGi (x, S) U {x}. (2.18) From this definition it is obvious that the degreeat[x) > degreeGq(y). In addi-tion it can be shown that once this outmatched relationship is formed, it will be preserved by the MDA until one of the two nodes is eliminated [GL87]. These observations permit the expansion of incomplete updating. If x becomes out-matched by y, the degree of x will always be greater than or equal to that of y and the MDA can eliminate y before x. As a result, until y is eliminated it is not necessary to update the degree of any nodes, such as x, outmatched by it. Although the detection and storage of outmatched nodes is a tricky implemen-tation issue, the expanded incomplete degree update technique can significantly 53 reduce the number of unecessary degree updates and improve the performance of the MDA [GL87]. 2.9.2 Multiple Elimination In previously discussed versions of the MDA, a single node, or supernode, was eliminated at each elimination step of the algorithm, by graph transformation and degree updating phases. The multiple elimination technique proposed by Liu [Liu85], delays all degree updating phases until an entire independent set of minimum degree nodes (possibly including supernodes) has been eliminated. A single degree updating phase is then carried out on all affected, uneliminated nodes in the remaining graph. This tends to reduce the level of degree updating required because some of the affected nodes might have been influenced by the elimination of more than one node in the independent set. Liu did not refer to independent sets in the original paper cited above. This term was introduced in a later paper by George and Liu [GL87]. In this context a subset of a graph's set of nodes is said to be independent if no pair of nodes in the set is adjacent in the corresponding elimination graph. Eliminating an independent set of minimum degree nodes from a quotient graph, G9 = (X9, E9), can be thought of as follows. First, a node of minimum degree, x, is eliminated but the degrees of nodes in reach.Qq(x, S) are not updated. Another node, y, with the same degree is subseqently chosen from the unaffected portion of the graph's nodes, X9 — reachcq{x) — {x}, and eliminated without degree updating. This is repeated until no unaffected nodes with the same degree as x remain. At this point all affected nodes have their degrees updated in a single step. It should be noted that this technique will not necessarily produce a minimum degree ordering. However, Liu [Liu85] has shown that in general the quality of orderings produced is maintained when the MDA is modified by multiple elimination, while the required execution times were reduced. This concludes the discussion of the MDA's development to its current status. In the next chapter a tiebreaking extension is considered which provides an 54 additional criterion for selecting elimination nodes. 55 C h a p t e r 3 P r i m a r y M D A T i e b r e a k i n g o f t h e In the development of the minimum degree algorithm one common algorithmic feature is maintained in each successive formulation. If at any stage of the order-ing process more than one minimum degree node exists in the current graph, the elimination node is arbitrarily selected from the group of acceptable candidates. This chapter discusses the development of a new tiebreaking strategy to help re-duce the random nature of this selection and the integration of this scheme into the MDA using a quotient graph model. The node selection criterion forming the basis of this strategy relies upon the comparison of each minimum degree candidate's deficiency. It is hoped that this tiebreaking strategy will improve the quality of minimum degree orderings produced, by consistently reducing the amount of fill experienced for a wide variety of problems. For a tiebreaking strategy to have a significant impact upon the quality of order-ings produced, there must be sufficient opportunity to apply the new criterion. If more than one minimum degree candidate was identified in only a very small fraction of node selections, a tiebreaking enhancement of the MDA might not be profitable. In the majority of practical test problems considered in Chapter 6, 3.1 Tiebreaking Motivation 56 however, a significant fraction of node selections require an arbitrary choice be-tween two or more elimination candidates. As an example, consider the nine point problem of Chapter 6, consisting of 4225 nodes. When the MDA, with indistinguishable nodes, is applied to the graph of this problem, with its original labelling, 2181 node selections were made. (Each supernode selection contributed only one unit towards this total.) From this total 2167 tiebreaking opportuni-ties were identified. Because the choice between minimum degree candidates is arbitrary, this large number of tiebreaking opportunities results in a profusion of possible minimum degree orderings for the problem. To estimate the poten-tial impact of a tiebreaking strategy, the fill characteristics of these different orderings need to be investigated. In an actual MDA implementation the choice between minimum degree can-didates is not random. One particular node from a group of candidates is always selected. For example, perhaps the first minimum degree candidate of the origi-nal labelling is always chosen. If the initial labelling of the graph were modified, different nodes might be chosen from groups of candidates observed for node se-lections with tiebreaking potential. This in turn could produce a different mini-mum degree ordering for the problem. Thus the fill of several different minimum degree orderings of the same problem can be observed by applying the MDA implementation to graphs of the same problem with different initial labellings. Table 3.1 summarizes the fill values observed for six different minimum degree orderings of the 4225 nine point problem. Each fill value records the number of new nonzero entries introduced into the factor L. The first fill value is for the ordering produced when the MDA program, norm, of Chapter 5 is applied to a standard lexicographic ordering of the problem. The five remaining fill values correspond to orderings produced when the graph's initial order was randomly permuted. In this relatively small sample of the problem's collection of minimum degree orderings, the potential impact of an effective tiebreaking strategy is immediately apparent. The fill level of ordering 1 is approximately 14% smaller than the fill value observed for ordering 6, showing the sensitivity of fill levels to the initial ordering of the problem. Redirecting attention to the node selection 57 Ordering MDA Fill 1 99644 2 108222 3 113026 4 111345 5 110660 6 115643 Table 3.1: Fill Fluctuations of Minimum Degree Orderings process, these results demonstrate the sensitivity of fill levels to the particular manner in which ties between minimum degree candidates are resolved. These fill fluctuations were not restricted to this particular system. By far the majority of test problems considered in Chapter 6 exhibited this fill instability characteristic. Even a small reduction in fill can lead to a very significant reduction in the effort required by the remainder of the solution process. To illustrate this effect the numerical solution of the 4225 node nine point problem was calculated using orderings 1 and 6. Timings of the explicit factorization and solve steps of the solution process were recorded in each case. (See Chapter 5 for a outline of a program developed to execute the two steps and Chapter 6 for a description of the testing environment.) Ordering 6 required 816 CPU seconds for the factorization and 50 CPU seconds for the solution of the resulting triangular systems. The fill reduced ordering dramatically reduced the factorization CPU requirements by 24% to 621 CPU seconds, while the number of CPU seconds required for the triangular solutions was reduced to 44. These values demonstrate the tremendous savings attainable if the MDA can be enhanced by a tiebreaking criterion which consistently selects fill reduced orderings. 3.2 Tiebreaking Goals As mentioned above, the primary goal of a tiebreaking enhancement of the MDA is to improve the quality of orderings produced for a wide range of problems. The 58 MDA is used as a fill reducing, ordering scheme. Therefore, it is desirable that the tiebreaking orderings result in a reduced level of fill, without a significant increase in execution time and storage requirements. In Section 3 .1, the sensitivity of minimum degree orderings to the initial permutation of the system's unknowns and equations was discussed. A secondary goal of any tiebreaking strategy should be to increase the stability of the fill levels experienced when ordering different initial permutations of the same system. Suppose that on average the orderings produced by a new tiebreaking scheme are known to suffer from reduced fill in comparison to the orderings of a standard MDA. Reducing the fill fluctuations, resulting from different inital orderings, increases the confidence that an ordering produced using the tiebreaking scheme will actually attain a reduced level of fill for novel problems. It may be found that these goals are only partially attainable on a small class of systems. However, it is hoped that a tiebreaking strategy will be effective on a wide spectrum of sparse matrix problems. 3.3 Other Tiebreaking Research Very limited successful research on effective tiebreaking strategies for the MDA has been performed. Some researchers originally believed that an effective strat-egy could not be found and that efforts in this direction would be wasted. Re-cently attention has been refocussed on this problem with the realization that the manner in which ties between nodes of minimum degree are broken could have a significant impact upon the quality of orderings produced. (See Section 3.1.) In a recently published book on direct methods for the solution of sparse matrices [DER86], Duff, Erisman and Reid recognize the potential gains of a tiebreaking strategy but conclude, it is felt prematurely, that local tiebreaking strategies will be ineffective. Other than the ideas presented in this thesis, the only other known active research into the tiebreaking problem is that of George and Liu [GL87]. Their 59 approach to tiebreaking the MDA is very different from the strategy to be dis-cussed in Section 3.4. They suggest the use of a profile reduction algorithm on the matrix before applying the MDA algorithm. They did not detail the motiva-tion of this approach and it is not entirely clear why it would be beneficial. The reverse Cuthill-McKee ordering algorithm was chosen and the resulting process consists of the following two stages. A -» A = QAQT PAPT In the first stage the profile of A is reduced using the permutation Q, de-termined by the reverse Cuthill-McKee algorithm. The MDA is then applied to the resulting matrix A to find the fill reducing ordering represented by the permutation matrix P. George and Liu hope that the preordering Q will reduce the effects of a random initial ordering and stabilize the fill levels of the final ordering at a low level. This research is still far from complete but their pre-liminary results showed that the method was effective on a 180-by-180 square grid problem using a nine point finite differencing molecule. It is not at all clear whether this success can be extended to more general sparse problems. In some respects Liu's multiple elimination technique described in Sec-tion 2.9.2 is a partial tiebreaking strategy. One independent set of minimum degree nodes is eliminated before other minimum degree nodes are considered. George and Liu [GL87] mention in their concluding remarks that the potential of this scheme is presently being investigated as well. 3.4 Proposed Tiebreaking Strategy In the introductory chapter to this thesis another heuristic algorithm for the reordering of symmetric, sparse matrices was mentioned. In its graph theoretic form, the minimum deficiency algorithm was first proposed by Rose [Ros72]. In a similar fashion to the MDA, this algorithm approximates a global minimization of fill using a local minimization. Instead of eliminating the nodes of minimum degree at each stage, however, the node selected by this alternate algorithm must 60 Elimination Graph Quotient Graph Figure 3.1: Deficiency Example have minimum deficiency. Let Gq = (Xq,Eq) be a quotient graph representing any particular stage of the factorization process and let S be the set of eliminated nodes. In addition, assume for the moment that uneliminated supernodes are not implemented by the modelling process. (Their introduction will be discussed in later sections.) The deficiency of node x G Xq, def(x), can be defined in the following manner. def(x) = { {z,y} | z ^ y and {z,y} C reachG,(x) and z reachGq[y) } (3.1) The significance of the previous definition is most easily recognized by consid-ering the definition in an elimination graph environment. Essentially, using the conditioned equivalence of adjacency and reachable sets, the deficiency of a node x is the set of all pairs of nodes in adj(x) which are not adjacent. Or in other words, the deficiency of x is the set of edges amongst members of its adjacency set, which are required to make adj(x) a clique. As an example, in Figure 3.1 a small quotient graph and its equivalent elimination graph are illustrated. The deficiency of node 8 is {{2,3}, {3,4}} and that of node 2 is {{l, 3}, {1,8}}, while no edges exist in the deficiency of nodes 1, 3 or 4. The number of edges needed to make a node's adjacency set a clique is pre-cisely the amount of fill that will be experienced if that node is eliminated. Thus 61 by choosing the elimination node to have minimal deficiency at each stage, the minimum deficiency algorithm is actually choosing the node whose elimination produces the least fill. Unfortunately the cost of maintaining a record of the the size of each active node's deficiency throughout the factorization process is exorbitant for large graphs, making a complete implementation of the minimum deficiency algorithm impractical. The notion of deficiencies can be used in a more limited fashion, however, to form the basis of a tiebreaking strategy for the minimum degree algorithm. Each cycle of the basic MDA arbitrarily selects the next node for elimination from the group of minimum degree nodes in the current quotient graph. In terms of an elimination graph modelling, a minimum degree node, x, is chosen because its adjacency set will form the smallest clique upon x's elimination. This approximates the minimum fill criteria by minimizing the worst case of possible fill damage resulting from an elimination. The worst case occurs when no edges already exist amongst the members of adj[x) before x's elimination. Thus with respect to the local minimization of fill, the best minimum degree candidate to choose for elimination is the node which has the most edges already amongst members of its adjacency set. The primary tiebreaking strategy to be discussed in the remainder of this thesis is based upon this observation. At any stage of the modelled factorization, the proposed tiebreaking strategy directs that the node whose elimination would result in the fewest fill entries in the factor should be selected from the group of minimum degree candidates. As shown above, this corresponds to choosing the candidate with the smallest deficiency. For example, Figure 3.2 illustrates two portions of an elimination graph. Assuming that the minimum degree of all nodes in the elimination graph is 4, both x and y are candidates for elimination. However, in this case the new tiebreaking strategy would choose node x because it has the smaller deficiency set. The elimination of x would produce only 2 fill edges while 4 fills would be produced if y were selected. The tiebreaking scheme can be viewed in an alternate fashion. As mentioned previously, a rigorous local minimization of fill at each stage of the elimination 62 Deficiency = {{2,4},{2,3}} {{1,4},{2,3},{2,4},{3,4}} Figure 3.2: Deficiency Tiebreaking Example process proves very costly. An application of the MDA's minimum degree cri-terion can be thought of as a step in which the set of all uneliminated nodes is reduced to a more managable subset, within which a minimum fill node is more likely to be found. The tiebreaking strategy then applies the minimum fill criterion to this smaller set, hopefully finding a minimum or near minimum fill candidate. It should be noted that even with the new tiebreaking strategy, the MDA is still not deterministic. In practice many nodes may have the same minimum degree and the same minimal level of deficiency. Chapter 4 discusses approaches considered for developing secondary tiebreaking strategies to assist in the node selection process from amongst this restricted set of candidates. However, un-less otherwise stated, the selection of the elimination node from candidates of minimum degree and minimal deficiency will be arbitrary. Finally, much of the discussion of deficiency in this section has been centered 63 about elimination graphs. As mentioned, this approach was selected to avoid the added complexities introduced by quotient graphs. During the remainder of this chapter, the proposed tiebreaking scheme will be fully integrated into the MDA using the quotient graph model enhanced by indistinguishable nodes. 3.5 Deficiency Calculations As defined in Section 3.4, deficiency of a node is a set of edges. To actually implement deficiencies in this form would be inefficient when all that is really required, by the enhanced MDA, is the number of edges in a particular node's deficiency. When it does not create confusion the term deficiency will also be used to refer the number of edges in the set. In this section a preliminary algorithm for the calculation of deficiencies will be developed for a quotient graph environment without uneliminated supernodes. The calculation of a node's deficiency is fairly straight forward. A count of the number of edges required to make the reachable set of a node, x, into a clique in the corresponding elimination graph is required. From an implementation standpoint, it is easier to count the edges which already exist amongst the nodes of x's elimination graph adjacency set. In quotient graph terms, this corresponds to the number of pairs of nodes in x's reachable set which are connected by a path of strictly eliminated nodes. This value is essentially the complement of deficiency and will be referred to as a node's connectivity. Once its connectivity is calculated, a node's deficiency can be determined by subtracting the actual connectivity observed, from the maximum connectivity value possible. The maximum possible connectivity value for a particular node is easily cal-culated using its degree value. If x is a node of degree, deg(x), in an quotient graph, the maximum level of connectivity which can exist between the nodes of its reachable set is maxconn(x) = (deg(x) * (deg(x) — l))/2. (3.2) Equation 3.2 can be derived by realizing that each node in x's reachable set could 64 be connected, by a path of eliminated nodes, to at most deg(x) — 1 nodes in the same set. (If two nodes are adjacent, they may be considered connected by a path of 0 eliminated nodes.) The formula introduced in equation 3.2 is obtained by considering all deg(x) members of the reachable set and recognizing that this process counts all paths twice. In a quotient graph environment, when two nodes are connected by a path of uneliminated nodes, they must be in each other's reachable sets. Thus to calulate the deficiency of a node x, the reachable set of each node in x's reachable set can be searched for other nodes which are also members of x's reachable set. This observation is formalized in the following pseudocode algorithm calculating the connectivity of node x, conn(x), in a quotient graph Gq = (Xq,Eq). Let S = the set of eliminated nodes 1. conn(x) := 0 2. For each y 6 reachGi[x, S) conn(x) := conn(x) -f- \reachG<, (x, S) D reachGq (y, S) \ 3. conn(x):= conn(x)/2 In step 3 the connectivity value is corrected by a factor of 1/2 because the intersection process of step 2 counts each connecting path twice. The deficiency of x, def(x), can be calculated from def(x) = maxconn(x) — conn(x). (3.3) As can be seen from equation 3.3, the minimization of deficiency corre-sponds to a maximization of connectivity. This permits the reformulation of the tiebreaking strategy in terms of connectivies. Choosing a node with maximal connectivity from the group of minimum degree nodes is equivalent to selecting a node with minimal deficiency. As will be discussed in Chapter 5, the implemen-tations developed for this thesis base tiebreaking on connectivity values because slightly fewer calculations are required for their determination. 65 3.6 Connectivity Lists The MDA tiebreaking strategy outlined in Section 3.4 results in a node of min-imum degree and minimal deficiency being chosen for elimination at each stage of the process. The efficiency of the enhanced algorithm is very sensitive to the selection process used to identify an acceptable candidate. For example, it would be very costly to recalculate the deficiencies of the entire group of minimum de-gree nodes at each stage of the process. Experience shows that the group of minimum degree candidates at any particular step of the elimination process can be quite large. It would also be unacceptable to store the current deficiency of all uneliminated nodes throughout the simulated factorization. If this were a cost effective alternative, a practical implementation of Rose's minimum deficiency algorithm would be possible. The approach taken in this thesis is to maintain the deficiency values of a subset of the uneliminated nodes. This subset will always include the deficiencies of the entire set of minimum degree nodes, and perhaps for some nodes whose degree is close to the minimum. To select a node for elimination using the deficiency tiebreaking enhanced MDA selection process, the deficiency values of all possible minimum degree candidates must be available. Because of their relatively low degrees, many of the nodes not selected for elimination will soon, if not in the next step, be minimum degree nodes once again. To avoid costly recalculation, the deficiencies of many unsuccessful candidates are updated and stored. The deficiencies of higher degree nodes and some unsuccessful candidates will not be calculated and stored until it is closer to their elimination. This special subset of nodes will have their connectivities stored in a connec-tivity list. The list will be sorted by decreasing connectivity and the elimination node of each step will be selected using a sequential search for a minimum degree node beginning at the head of the list. The performance of the enhanced MDA is very sensitive to the implementation and maintenance of this list. Chapter 5 includes a more detailed discussion of the particular characteristics and imple-mentation of this data structure. 66 3 . 7 Deficiency and Uneliminated Supernodes The previous discussion has excluded the possibility that uneliminated supern-odes exist in one of the quotient graphs used to simulate the factorization. The introduction of coalesced sets of indistinguishable nodes produces added com-plications for the tiebreaking scheme and deficiency calculations. This section investigates what effect uneliminated supernodes have upon the deficiency cal-culation of a normal node, what is taken as the deficiency value of a supernode itself and how these decisions affect the proposed tiebreaking scheme. 3.7.1 N o r m a l D e f i c i e n c y C a l c u l a t i o n s W i t h S u p e r n o d e s The first algorithm, presented in Section 3.5, for determining a node's connectiv-ity (and subsequent deficiency), neglected the added complications introduced when an uneliminated supernode is encountered in the node's reachable set. As outlined in Section 2.7, when a supernode is found in a node's reachable set dur-ing a degree determination, all members of the indistinguishable set contribute towards the degree value. This is because the MDA considers a node's degree to represent the size of the clique which would be formed by its elimination from the corresponding elimination graph. While the MDA attempts to minimize the size of clique formed by each elimination, the tiebreaking scheme attempts to choose the node for which the fewest new edges are required to form the minimal clique. Thus when calulating a normal node's connectivity in a quotient graph, all supernodes encountered should be considered in their expanded form so that all paths between nodes explicitly and implicitly represented in the graph are taken into consideration. The following pseudocode algorithm presents a revised form of the connec-tivity calculation which allows for uneliminated supernodes. The description uses the function size, first introduced in Section 2.7.4, which returns the num-ber of nodes in a set after expanding all supernode members. A new function, superlength, is introduced and returns the total number of nodes in the indistin-guishable set represented by a single supernode. The superlength of a normal 67 uneliminated node is 1. It should be reiterated that this algorithm is presently only for the connectivity calculation of a normal node. The connectivity of su-pernodes themselves will be considered in the following subsection. Let Gq = (X9,Eq) = the current quotient graph S = the set of eliminated nodes conn(x) = the connectivity of node x superlength(x) = the number of indistinguishable nodes represented by x size(W) =the number of members in set W, with each supernode x £\V considered in expanded form 1. conn(x) := 0 2. For each y G reachat(x, S) (a) conn(x) := conn(x) + superlength{y) * size(reachaq(x, S) D reachGi{y, S)) (b) If superlength(y) > 1 then conn(x) := conn(x) + superlength(y) * (superlength(y) — 1) 3. conn(x):= conn(x)/2 The basic accumulation step of the connectivity calculation is performed in step 2a. In this step the expanded size of the intersection between the reachable sets of x and y is multiplied by y's superlength so that the connectivity con-tribution of all members in y's indistinguishable set are considered. The set of nodes coalesced to form a supernode are indistinguishable, meaning that in the corresponding elimination graph all members of the supernode are adjacent to the nodes of the intersection, not just the representative. For example, in Fig-ure 3.3 if the connectivity of node 6 is being calculated and y is the supernode {1,2,3}, the expanded form of the supernode shown in the corresponding elimi-nation graph demonstrates that all three indistinguishable nodes are adjacent to 68 Quotient Graph Corresponding Elimination Graph Figure 3.3: Supernode Expansion for a Connectivity Calculation node 5 and thereby contribute to the connectivity of node 6. For similar reasons the size function is also used to count the number of nodes in the intersection, so that all members of each supernode are considered belonging to y's reachable set. For example, in Figure 3.3 when y is assigned node 5 during the calculation of the connectivity of node 6, it is desired that all three nodes {1,2,3} be counted towards the connectivity result. In step 2b if node y is a supernode its internal connectivity is added to the growing result. All pairs of nodes from an indistinguishable set can be consid-ered connected by paths of eliminated nodes. Therefore in the corresponding elimination graph the members of a supernode form a clique. Each member of the clique is adjacent to x in the same elimination graph, resulting in a con-tribution toward x's connectivity equal to the maximum connectivity of a node whose degree equals y's superlength. The factor of one half is not included be-cause step 2a counts each path between members of x's reachable set twice. The double counting is corrected in step 3. As a trivial example, in Figure 3.3 nodes 4 and 6 both have a maximal connectivity of 6, while the connectivity of node 5 is 9. 69 3.7.2 The Deficiency, Degree and Elimination of Supernodes With the introduction of the new tiebreaking strategy, it is no longer clear if supernodes can be considered as a single unit when chosen for elimination. The resolution of this uncertainty will dictate whether the external or true degree of a supernode should be considered by the selection process in the tiebreaking MDA. In addition the formulation of a supernode's deficiency must be considered to decide how the other indistinguishable nodes represented by the supernode should influence its own deficiency calculation. Assume for the moment that the nature of a supernode's degree and de-ficiency have been settled upon. In the normal MDA (arbitrary tiebreaking) when a supernode is selected for elimination, all represented nodes can be elim-inated in sequence, as described in Section 2.7. This was permitted because after the elimination of each member from the indistinguishable set, the unelim-inated members of the set remained indistinguishable and of minimum degree. This allows their immediate elimination in accordance with the MDA selection criteria. The introduction of the tiebreaking scheme may hamper the elimination of a supernode as a unit if strict obedience to the new scheme is maintained. Al-though all remaining nodes of the coalesced set will be indistinguishable and of minimum degree after representative's elimination, they may not be of minimal deficiency as well. Momentarily ignoring the constraints of minimum degree, consider the elimination of the representative of the supernode illustrated in Figure 3.3. (Node 1 is assumed to be the representative.) The resulting elim-ination graph is shown in Figure 3.4. After the elimination of node 1, nodes 2 and 3 certainly have remained indistinguishable. More importantly, however, the adjacency set of each member of the remaining indistinguishable set {2, 3} now forms a clique. In this case the elimination of the entire indistinguishable set, originally represented by the supernode, cannot result in more fill than if a representative were eliminated alone. Although this is a very simple case, it mo-7 0 Figure 3.4: Elimination of a Supernode Representative tivates the following theorem which allows the generalization of this observation to all uneliminated supernodes. Theorem 2 Let Y represent a set of I indistinguishable, uneliminated nodes in the quotient graph G q = ( X q , E q ) , which have not been coalesced to form a supernode. (Y C Xq.) In addition let defGq\x) refer to the size of x's deficiency in graph G] and S the set of eliminated nodes. Suppose that one node y* G y is chosen by the MDA for elimination from G\. If y* ts eliminated from Gq to form G q + 1 > then for all remaining nodes of the indistinguishable set, y,- G Y — {y*}, defG* (y.) = 0. Proof: The elimination graph model is used during the course of this proof. This model was chosen because of its equivalence to quotient graphs and the ease with which adjacency sets and fill edges can be discussed in comparison to reachable sets and new paths between uneliminated nodes. Let Gi = (Xi,Ei) and G i + 1 — ( X i + i , i7t-+i) be elimination graphs correspond-ing to the quotient graphs G q and G q + l respectively. In quotient graphs, two nodes x and y are said to be indistinguishable if reachQi (x, S) U {x} = reachG*(y, S) U {y}. The nodes x and y are also indistinguishable in the corresponding elimination graphs. Using the special equivalence of reachable sets and adjacency sets, the definition of indistinguishable nodes can be rewritten as 71 adjG.(x) U {x} = adjGi(y) u iv}-Consider Y to represent the set of £ indistinguishable nodes {yl5 y2, •. •, Vt} C Xi and let T = o-djcXvi) — Y. The set T represents the set of all nodes adjacent to a member of Y, excluding the members of Y themselves. By the alternate definition of indistinguishable nodes given above, the membership of set T is independent of the particular y,- chosen. Firstly, it should be noted from the alternate definition of indistinguishable nodes that in G,- any pair of nodes yn,ym G Y must be adjacent. Thus the members of Y are all pairwise adjacent and form a clique. No new fill edges are possible between the members of Y themselves. As well, independent of the particular node chosen for elimination, no new fill edges are possible between a pair of edges {i,-,y,}, where U G T and y,- G Y. By the definition of the set T, each member of Y is already adjacent to all members of T. Thus when a member of Y , y*, is chosen for elimination, the only possible source of new edges is amongst pairs of nodes in T. When y* is eliminated, it is removed from the elimination graph along with all associated edges and all nodes in adja^yk) are made pairwise adjacent. No new edges can be added amongst nodes of Y or between nodes in T and Y, but edges may be added to T, depending on rfe/c?,(yt), to make it a clique. In any event, after the elimination all possible edges must exist amongst the members of T, Y — {y^ } and between all pairs with one node from each of these sets. As a result, for all possible t, i 7^ k, the set adjai+l(yi) U {y,} is a clique and for any node y,- G Y — {y*}, defGi+M = 0- • Without the results of Theorem 2 it might have been necessary to eliminate each member of a supernode separately. After the removal of each indistinguish-able node, the connectivity of the smaller supernode would have had to have been reevaluated and reinserted into the connectivity list, in preparation for the se-lection of the next minimum degree node. Having to follow such a scheme would essentially negate most of the advantages observed for uneliminated supernodes. Fortunately, Theorem 2 allows the complete integration of the MDA, using 72 quotient graphs and uneliminated supernodes, with the proposed tiebreaking strategy. When a supernode is selected for elimination it can be treated as a single unit, and eliminated with one step of mass elimination. Consider the quotient graph G\ = (Xf, Ej) in which uneliminated supernodes have been formed and the corresponding quotient graph G\ = [X\, Ej), in which the supernodes are fully expanded. Suppose that a supernode has been selected from G\ for elimination by the tiebreaking MDA. The effects of the supernode's mass elimination from G\ can be followed by considering the consecutive elim-ination of its members from G\. After the elimination of the first member of the indistinguishable set from G\, it has already been shown that the remaining members of the set are indistinguishable and still of minimum degree. In ad-dition, Theorem 2 guarantees that the remaining nodes in the indistinguishable set are still of minimum deficiency. Another member of the indistinguishable set can then be identified as having minimum degree and minimum deficiency and be selected for elimination. Repeated application of Theorem 2, after each elimination, allows the sequential elimination of each indistinguishable member in turn. With the deficiency of each indistinguishable node zero, except for per-haps the representative, only the first node of the set can actually produce fill upon elimination. Having shown that the members of a supernode can be con-secutively eliminated from G\ in accordance with the tiebreaking form of the MDA, it is possible to eliminate the supernode from G\ in a single step of mass elimination. This allows the significant advantages of uneliminated supernodes to be fully realized by a MDA enhanced by the proposed tiebreaking strategy. Theorem 2 allows uneliminated supernodes to be treated as a single unit in the tiebreaking version of the MDA. This suggests the use of external degrees, first introduced in Section 2.8. As previously mentioned, the external degree strategy was independently formulated using a slightly different motivational approach. The normal algorithm, without uneliminated supernodes, attempts to locally limit fill by selecting the node for elimination which has the smallest reachable set. This limits the number of nodes between which new fill paths can be produced. (A new fill path in a quotient graph is equivalent to a new fill edge 73 in the corresponding elimination graph.) As discussed in the proof of Theorem 2, during the elimination of the members of an uneliminated supernode, no new fill paths associated with the uneliminated members of the indistinguishable set are possible. The only set of nodes between which fill paths are possible is the reachable set of the supernode in compressed form. Thus if a supernode is going to be treated as a single unit and deleted using a single step of mass elimination, the degree of a supernode should be considered in its compressed form. In many respects the use of external degrees permits the elimination of a higher degree node at the cost of a lower degree node. As far as the tiebreaking strategy is concerned, all that matters is the number of new fill entries a supernode's elimination will cause. As discussed in the previous paragraph, any new fill paths created must be amongst members of the supernode's reachable set. Thus the deficiency of the representative is the same regardless of whether or not the other members of the indistinguishable set are considered. As a result the deficiency of a supernode can be calculated from its compressed structure, as though the representative was a normal uneliminated node in the quotient graph. In particular the connectivity of a supernode can be calculated using the algorithm presented in Section 3.7.1. In summary, the tiebreaking version of the MDA will manipulate unelimi-nated supernodes in their collapsed form, using the external degree and external connectivity approaches, as discussed above, during the elimination node selec-tion process. It should be reiterated, however, that considering an uneliminated supernode as a single node when calculating the degree and connectivity of other nodes (including other supernodes) would result in deviations from the intended form of the tiebreaking minimum degree algorithm. 3.8 Degree and Connectivity Updating The introduction of the deficiency tiebreaking extension has increased the com-plexity of updating required after each elimination step of the MDA. In the original quotient graph version of the algorithm, it was necessary only to update 74 the degrees of those nodes whose reachable sets were affected by a particular elimination. For example, if node t was eliminated from Gq, the subset of nodes requiring degree updates would have consisted of the set reacha<i(t, S). The de-velopment of a tiebreaking strategy based upon deficiency, however, has broad-ened the group of nodes affected by an elimination beyond that of the eliminated node's reachable set. The subset of uneliminated nodes needing degree updat-ing is unchanged, but now deficiency (or connectivity) updating must also be considered. Whether or not a particular node needs connectivity updating depends in part upon the particular form of connectivity lists implemented. As discussed in Section 3.6, the connectivity and hence deficiency, of each uneliminated node is not necessarily stored at all stages of the process. Obviously, while a node's connectivity is not being stored in the connectivity list, this particular node will not require connectivity updating. The discussion in the remainder of this section essentially ignores this possiblity and it is assumed that all affected nodes need connectivity updating. The resulting updating algorithm is easily modified to reflect inclusion constraints of a particular connectivity list maintenance scheme. (See Chapter 5.) Consider the elimination of node t from the portion of the quotient graph, Gq, shown in Figure 3.5. With respect to connectivity and degree updating, the remaining nodes of the quotient graph can be grouped into three categories. Letting S represent the set of eliminated node in Gq, labelled as "E" nodes, the first category consists of all nodes in reachai(t, S). As previously mentioned, the nodes of this set have their own reachable sets altered by the elimination of t and therefore need their degree reculated. A node's reachable set also plays a central role in the calculation of its connectivity. Consequently the nodes of this first category also require a reevaluation of their connectivity. In Figure 3.5, examples of such nodes are a, b or c. The second category of uneliminated nodes are those which require a recal-culation of their connectivity value but not being members of reachaq(t, S), have unaffected reachable sets. Formally, the members of this category consist of all 75 Eliminate Figure 3.5: Connectivity and Degree Updating nodes in the set (J {reachGq (x, S)) - (reachG<,(t, S) U {t}). (3.4) z£reachaq (t,S) Because of their distance from the elimination node t, members of this class of nodes will often be referred to as satellite nodes. From the formal definition given above, each reachable set of a node in this second group must be adjacent to at least one member of reachGg(t, S). Because the reachable sets of these nodes change when t is eliminated, each category two node has members in its reachable set which may change their contribution towards its connectivity. Nodes x, y and z of Figure 3.5 are examples of these so called satellite nodes. It should be noted that because members of this group are not in reachGq(t, S), it is impossible for fs elimination to decrease their level of connectivity. Their connectivity may remain unchanged or increase when t becomes an eliminated node. Finally, the third category of uneliminated nodes are those which do not have their degree or connectivity affected by the elimination. Each of these nodes is insulated from the elimination node by at least two uneliminated nodes in all connecting paths. In Figure 3.5, examples of this final class of nodes are m and 76 n. A keen observer might recognize that node x of Figure 3.5 is shown in the reachable set of only one member of the category one set and will not suffer a connectivity change. If the connectivity of node x is to have changed, a new edge or eliminated node bridge must be formed between two members of reachai [x, S) upon Vs elimination. Because of the reciprocating nature of reachable sets, this requires that two nodes in reach,Qq(x,S) must have their reachable sets affected by i's elimination, not just one. As a result the membership of nodes in category two could be refined to include only those nodes which have two members of reacha<i(t,S) in their reachable set. However, to identify those nodes which meet this stricter definition is difficult and could require a great deal of overhead storage and computation. In the updating algorithm described below the original definition of the satellite nodes of category two is used to produce a very simple, but relatively efficient updating scheme. For all nodes of category one, it is necessary to completely recalculate their connectivity. This is not necessary, however, for those uneliminated nodes in-cluded in category two. Instead of recalculating these connectivities values, it is possible to modify the current value of a satellite's connectivity to reflect the change in connectivity resulting from only those nodes with altered reach-able sets. In step 2a of the connectivity calculation algorithm presented in Sec-tion 3.7.1, the contribution of each node in x's reachable set towards its total connectivity is accumulated. For example, before t's elimination in the graph of Figure 3.5, node b contributes to y's connectivity in the following manner. conn(y) := conn(y) + 1/2 * superlength(b) *size(reacha<i{y, S) f~l reach,Gi{b, S)) (3.5) The factor of 1/2 is brought into this step to avoid double counting in subsequent reformulations. When t is eliminated from Gq to form the new quotient graph Gql, the reachable set of node b is altered, possibly changing the connectivity of node y. If the reachable set of b in Gq is still available, the connectivity of y can 77 be modified in the following manner to reflect the change in 6's reachable set. conn(y) := conn(y) — 1/2 * superlengthai (6) * size(reachai (y,S) n reach,Qq(b, S)) + 1/2 * superlengthGqi(b) *size(reachGqt(y, S') n reaches(6, S')) (3.6) If such a correction is performed for each node of y's reachable set whose own reachable set has been altered by t's elimination, the connectivity updating of node y can be accomplished without using the calculation algorithm outlined in Section 3.7.1. As previously mentioned, the connectivity of satellite nodes may increase or remain unchanged after the elimination of a selected node t. This observation of the connectivity total can be extended to each connectivity contri-bution by a category one node. Once again consider y to be a category two node and b a category one node in its reachable set. The correction of y's connectivity by node b can only increase the connectivity total or leave it unchanged. The potential difference in connectivity is dependent upon the change in 6's reachable set upon tfs elimination. Obviously 6's new reachable set in Gql will not include node t. A satellite node is not in f's reachable set, and as a result this change cannot affect y's connectivity value. Since only an elimination can completely remove a member from a node's reachable set, all other nodes will remain in 6's reachable set. Because y's reachable set is unchanged, 6's contribution to y's connectivity cannot be reduced by t's elimination. The elimination of t can increase the number of different nodes in 6's reachable set, however, by providing a new eliminated node bridge to additional unelim-inated nodes. It is this potential increase in the reachable set that may cause an increase in y's connectivity. Based on these observations, the update step of equation 3.6 can be simplified by introducing the difference set diff(b) = reachaq'(b, S') — reachaq(b, S) = reachaq(t, S) — reachoq(b, S) and by assuming that the superlengths of nodes have not changed. conn(y) := conn(y) + 1/2 * superlength(b) * 78 *size(dif f(b) f l reachGqi(b, S')) (3.7) The assumption that superlengths have not changed requires that the updating of satellite connectivity values are performed before the search for new indistin-guishable sets. As will be discussed in Chapter 5, this is a realistic implementa-tion assumption. To formalize the discussion presented in this section, the following algorithm is proposed to coordinate all necessary degree and connectivity updating required after each elimination. It is assumed that the selected node, t, has already been eliminated from the quotient graph but that the reachable sets of all category one nodes before i's elimination are available. If a node's connectivity is changed it must, of course, be reinserted into the connectivity list with its new value, although this is not detailed by the algorithm's pseudocode. The recalculation of a node's connectivity in step 2b is assumed to be carried out following the algorithm of Section 3.7.1. Let Dx = the degree of node x conn(x) = the connectivity of node x t — the node eliminated from graph Gq to form the current quotient graph Gql S = the set of eliminated nodes excluding t super length(x) = the number of indistinguishable nodes represented by x size{W) =the number of members in set W, with each supernode x G W considered in expanded form diff{b) = a variable for storing the difference set of b 1. For each b G reachGq(t, S) (a) diff{b) := reachG<,(t,S) — reachGq(b,S) (b) For each y G (reachGq(b, S) - (reachGq(t, S) U {t})) conn(y) := conn(y) + 1/2 * superlength{b) * size(diff(b) n reachGql(y, S U {<})) 79 2. For each 6 G reachGq(t, S) (a) Dt := size(reach,Gi'{b,S U {*})) (b) recalculate the connectivity of 6 The updating of category one and two nodes is performed separately so that in the enhanced MDA, to be discussed in the following section, the search for indistinguishable set can be performed between the two stages of the updating process. The modification of satellite connectivities is carried out before the formation of new supernodes so that the superlengths are unchanged from the previous graph. The recalculation of degrees and connectivities of nodes in cate-gory one are performed after the search so that advantage can be taken of newly formed supernodes. At this point it should also be emphasized that with the introduction of con-nectivities, supernodes have an additional advantage. Since only the deficiency of a supernode's representative is ever required, the formation of supernodes increases the efficiency of the updating steps outlined above by reducing the number of nodes requiring updating. In practice, the modification of the connectivity values for category two nodes, as outlined above, is more efficient than recalculating their values. The cost of a connectivity calculation is essentially based upon the number of intersections that must be performed. In a total reevaluation one intersection is performed for each member of the node's reachable set. The worst case for the modification scheme outlined above is when the node's reachable set consists entirely of category one nodes. Even in this case, however, the total number of intersections required is still comparable to the requirements of a total recalculation. Each modification requires a single intersection. In more normal circumstances fewer intersections are required for the modification scheme, even taking into consideration the formation of difference set. In addition all intersections involving a difference set are more cheaply calculated because in general the set has fewer members than 6's reachable set and never more. 80 3.9 M D A With Deficiency Tiebreaking The tiebreaking version of the MDA to be presented in this section is based upon a modified form of the algorithm discussed in Section 2.7.4, which incorporated indistinguishable nodes into the quotient graph model. The introduction of de-ficiency tiebreaking mainly affects the minimum degree node selection and the updating portions of this algorithm. Following the tiebreaking extension, a node from the minimum degree set with maximal connectivity, in comparison to other minimum degree candidates, is selected for elimination at each stage. The external degree and external con-nectivity is considered for all uneliminated supernodes. The original tiebreaking scheme was presented in terms of deficiencies. However, the use of connectivities is more conducive to an implementation and the maximization of connectivity values corresponds to a minimization of deficiency. The degree updating section of the original algorithm has been expanded to handle connectivity updating as described in Section 3.8. For reasons discussed in this latter section, the search for indistinguishable sets is performed between the two major blocks of updat-ing. The connectivity recalculation called upon in the second block is assumed to follow the algorithm oulined in Section 3.7.1. As previously mentioned, to maintain the connectivity of every uneliminated node throughout the simulated factorization would be unacceptable. Section 3.6 outlined the connectivity list which will underlie the tiebreaking MDA. This list will only store a limited number of node-connectivities pairs. In Chapter 5, an effective implementation of the connectivity list will be discussed, detailing its precise role in the tiebreaking MDA. At this point it should be reemphasized, however, that connectivity updates will actually be performed for only those nodes which are included in the connectivity list. Finally, in the enhanced algorithm, the new quotient graph is shown as be-ing created from the original graph and the new component partitioning of S. Successive quotient graphs are not actually formed with reference to the original graph but to their predecessor. A precise algorithm detailing this transformation 81 is presented in Chapter 5. The following pseudocode description of the tiebreaking MDA uses functions and terminology introduced for previous versions of the algorithm. The use of any new variables is briefly described in the declaration preamble. MDA Enhanced With Deficiency Tiebreaking Let Dx = the degree of node x minD = set of minimum degree nodes conn(x) — the current connectivity value of node x S = the set of eliminated nodes G (= {X, E)) — the graph of the system's original coefficient matrix Gq (= (Xq,Eq)) = the current quotient graph saverch = storage for the reachable sets of several nodes saverch(x) = stored reachable set of node x diff{x) = the difference set for node z expand(x) = {x} U {all other members of the supernode represented by x} size{W) =the number of members in set W, with each supernode x 6 W in expanded form superlength(x) = the number of uneliminated nodes represented by x 1. S := 0, Gq := G 2. Initialize the connectivity list 3. For each node i £ l Dk := \adjG(k)\ 4. While S ^X do (a) minD := {x \ x (E Xq — C(S) and Dx = miniexi-c(s){Di)} (b) Choose a node t £ minD, for which conn(t) = maxi e m,„£)(conn(z)) 82 (c) saverch(t) := reachai{t,C(S)) (d) For each x G reachGi(t,C(S)) saverch(x) := reachGq(x,C(S)) (e) S := S U expared(i) (The nodes of a mass elimination are recorded in an arbitrary order.) (f) G» := G/Q(S) (g) For each x G soverc/i(i) i. diff[b) := sauerc/i(t) — saverc/i(x) ii. For each y G (saverch(x) — (saverch(t) U {£ } ) ) conn(y) := conn(y) + 1/2 * super/engt/j.(x) * size(diff[x) PI reachaq(y, S U {£ } ) ) (h) Identify all sets of indistinguishable nodes amongst the members of saverch(t) and form a supernode for each set. Update Gq and Xq appropriately. (i) For each x G saverch(t) i. Dx := size{reacha<i{x, C(S)) ii. recalculate the connectivity of x Figure 3.6 illustrates the application of this enhanced algorithm to a small example. In each successive quotient graph of the simulating sequence, the min-imum degree nodes are identified by a small asterisk beside each potential candi-date, along with its current connectivity value. The application of the tiebreaking MDA to this graph results in three fill elements in the corresponding factor L. Each new fill is illustrated by a dashed fill path between two uneliminated nodes in the appropriate quotient graph. As previously stressed, in the tiebreaking algorithm the choice between maximal connectivity, minimum degree nodes is arbitrary. For this particular example, the level of fill is not affected by the precise manner in which these secondary ties are resolved. The application of the normal MDA with arbitrary primary tiebreaking to the same problem, how-ever, could produce up to a maximum of five fills, depending on the particular 83 minimum degree ordering chosen. Although Figure 3.6 is clearly a trivial exam-ple, the fill reducing potential and increased stability of the tiebreaking MDA is demonstrated. Figure 3.6 also provides additional illustrations of the manner in which in-distinguishable nodes are handled by the algorithm. For example, nodes c and d are not formed into a supernode after the elimination of b, because the iden-tification scheme introduced in Section 2.7.3 requires a second eliminated node to be adjacent to the indistinguishable set members. After the elimination of node i, however, nodes m and h are identified as indistinguishable and coalesced to form a single uneliminated supernode represented by m. Using external de-grees, the minimum degree node in the subsequent quotient graph is m. The external connectivity of supernodes ignores all other coalesced members and m's connectivity value is correctly shown as 1. 3.10 Compatibility of Tiebreaking With Additional M D A Extensions In Section 2.9 two important extensions to the MDA were discussed. Although outmatched nodes and multiple elimination are not included in the implementa-tions developed for this thesis, their possible incorporation into the tiebreaking MDA is briefly discussed in this section. The details of the compatability and effi-ciency of such enhancements are not finalized, but a general discussion, intended to identify the major issues, is included. 3.10.1 Tiebreaking and Incomplete Updating The introduction of outmatched nodes to the normal MDA allowed the identifi-cation of groups of uneliminated nodes which could be eliminated after the node (or nodes) outmatching them and still produce a minimum degree ordering. This permitted the postponement of degree updates for outmatched nodes until the nodes outmatching them were eliminated. Such an approach is possible because 86 the degree of outmatched nodes is guaranteed to remain greater than or equal to the degree of the outmatching nodes during the elimination process. In the tiebreaking MDA, the dominant feature of the elimination node selec-tion process remains based upon nodal degrees. Thus it appears that a similar scheme using the identification of outmatched nodes could also be introduced to reduce the amount of updating required in the tiebreaking MDA. In addi-tion, perhaps the incomplete updating scheme could be extended to include the connectivities of outmatched nodes. After the elimination of a node, only those remaining uneliminated nodes not identified as being outmatched would be eligi-ble for degree or connectivity updating. Unfortunately, the application of such a scheme to the tiebreaking MDA may not improve the algorithm's efficiency and may also detrimentally affect the quality of orderings produced. A scheme was outlined in previous sections for updating a node whose con-nectivity, but not degree, was affected by an elimination. The scheme avoided a total recalculation of the connectivity of these nodes by modifying the connec-tivity contribution from each node in its reachable set whose degree was affected by the elimination. Such a scheme would not be possible if the connectivity updating of a category two (or satellite) node was delayed several eliminations. Instead a total recalculation would be required. In this fashion the advantages realized by incomplete connectivity updating would be partially reduced. In the tiebreaking form of the MDA the potential reduction in degree updates may also not be realized. Consider a node, x, not identified as outmatched by any other node. Suppose that after an elimination the connectivity of node x must be recalculated. To perform this calculation the reachable set of each node in reack(x,S) is required. There is no guarantee that some of the members of reach(x, S) will not be outmatched. The recalculation of x's connectivity forces the degree of outmatched members to be updated, because the formation of a reachable set is essentially equivalent to a degree determination. To determine whether the two problems outlined above make the efficiency of all incomplete updating schemes for the tiebreaking MDA unacceptable would re-87 quire additional study using actual encodings and realistic test problems. There is one additional obstacle, however, to the introduction of an extended incomplete updating scheme to the tiebreaking MDA. The degree of an outmatched node is guaranteed to be greater than or equal to the degree of those nodes outmatching them. As a result, it is possible that an outmatched node of minimum degree, with a larger connectivity value than the actual node selected for elimination, may be overlooked. If an incomplete updating scheme sufficiently disturbs the tiebreaking selection process in this fashion, the potential fill improvement expe-rienced by an undisturbed tiebreaking ordering may be lost. Although this ap-pears to severely limit the possibility of applying incomplete updating schemes, careful study with actual implementations and different problems would once again be required to specifically characterize the effect on ordering quality. 3.10.2 Tiebreaking and Multiple Elimination As will be discussed in Chapter 4, the deficiency tiebreaking scheme discussed in this chapter does not always isolate a single node for elimination. In practise, for many problems, several nodes of maximal connectivity may be found amongst the minimum degree candidates. Thus a tiebreaking version of Liu's multiple elimination scheme could eliminate an independent set of minimum degree nodes with maximal connectivity values. All connectivity and degree updating could be delayed until the entire set was eliminated. Not only would the number of degree determinations be reduced, but redundant connectivity recalculations could be avoided. Unfortunately, the multiple elimination would undoubtedly complicate the satellite connectivity modification scheme of Section 3.8, perhaps forcing the total recalculation of these nodes' connectivity values. Hopefully this efficiency loss would be more than compensated for, however, by the overall reduction in updating requirements. When the multiple elimination scheme was introduced to the normal MDA, it was noted that strictly minimum degree orderings might not be produced by the modified algorithm. The introduction of a similar scheme to the tiebreaking 88 MDA also has these problems, with the added complication that the minimum degree nodes may not be of maximal connectivity when eliminated. It is unclear to what extent this would actually affect the quality of orderings produced in comparison to the orderings based on a more strict observance of the tiebreaking strategy. However, based upon practical experience, it is believed that the order-ing quality would not be significantly affected, but of course additional research would be required to completely characterize the actual situation. Although this approach will not be pursued any further during the course of this thesis, the application of the modified multiple elimination scheme looks promising and certainly warrants further investigation. This concludes the discussion introducing the deficiency tiebreaking enhance-ment of the quotient graph form of the MDA. As described in this chapter, the tiebreaking scheme helps to remove the arbitrary nature of the node selection process when more than one minimum degree exists in the quotient graph. The next chapter introduces additional secondary tiebreaking strategies considered for application to node selections in which more than one minimum degree, min-imal deficiency candidate is present. 89 C h a p t e r 4 S e c o n d a r y T i e b r e a k i n g o f t h e M D A In its basic form, the MDA choses a node of minimum degree for elimination at each stage of the simulated factorization. In practice many minimum degree nodes exist in the modelling graph of any stage of the process and the choice of a particular node was arbitrary in the algorithm's original form. Chapter 3 explored the incorporation into the MDA of a scheme, based upon the defi-ciency values of nodes, intended to tiebreak the choice between minimum degree candidates and reduce the arbitrary nature of node selections. As previously mentioned, the enhanced algorithm was still not deterministic, as more than one node amongst the minimum degree candidates could have the same minimal deficiency value. Although the deficiency tiebreaking approach of Chapter 3 is the main focus of this thesis, secondary tiebreaking schemes designed to assist in the choice between minimum degree candidates of minimal deficiency are briefly investigated in this chapter. 4.1 Motivation and Goals For secondary tiebreaking schemes to be successful, there must be sufficient op-portunity to apply the additional tiebreaking criteria during the elimination pro-cess. A wide range of different test problems are discussed in Chapter 6. One 90 such problem arises from the discretization, using a nine point molecule, of a partial differential equation on a square grid, with 33 unknowns in each row or column. A standard lexicographic ordering of the grid's vertices is assumed for the corresponding graph of 1089 nodes. When the tiebreaking MDA algorithm of Chapter 3 was applied to this graph, 581 node selections were made during the course of the simulated factorization. (The selection of an uneliminated sup-ernode only counted once.) Of this total, 507 provided an opportunity for the application of a secondary tiebreaking scheme. In other words, in 87% of the node selections a second uneliminated node with identical degree and deficiency values existed in the particular quotient graph. The number of secondary tiebreaking opportunities is often very sensitive to the initial labelling applied to a graph. The node selection process of the tiebreak-ing MDA implementation (see Chapter 5) interacts with the lexicographic or-dering of the nine point problem to produce an unusually high percentage of secondary tiebreaking opportunities. Other initial orderings of the same prob-lem, however, also exhibit a large number of secondary opportunities. When the tiebreaking MDA was applied to five different random orderings of the same problem, in 55-70% of the node selections the deficiency tiebreaking strategy was not able to completely isolate a single node from a group of minimum de-gree candidates for elimination. In these cases an essentially arbitrary selection was made amongst the minimum degree, mimimal deficiency nodes. This general level of secondary tiebreaking opportunities is typical of the majority of problems discussed in Chapter 6. In a typical secondary tiebreaking opportunity identified above, more than two nodes are potential elimination candiates for the tiebreaking MDA. Often a whole block of nodes with the same minimal connectivity value exists within the minimum degree group. In the six versions of the previously discussed nine point problem, there is an average of 11 to 16 nodes in the minimum degree, minimal deficiency block for each node selection with secondary tiebreaking potential. These results are not typical of all test problems by any means. The average block size can vary quite dramatically, depending upon the particular topology 91 of the problem's nonzero structure and its size. However, for all the problems tested, the average block size never went below five nodes and for one 3025 equation problem, went as high as 589 nodes. The previous discussion has certainly established that many opportunities for secondary tiebreaking arise during the application of the tiebreaking MDA to the majority of test problems. In addition, on average large blocks of minimum degree minimal deficiency nodes were observed at each opportunity. Because in the enhanced MDA, node selection from such blocks of candidates is arbitrary, a multitude of different minimum degree, minimal deficiency orderings exist for each problem. A secondary tiebreaking scheme could break the ties using its new criteria and only permit the selection of particular orderings. The presence of secondary tiebreaking opportunities, however, does not necessarily imply that fill values vary significantly between different minimum degree, mimimal deficiency orderings. To gauge the potential gain of a general secondary tiebreaking scheme, once again the fill characteristics of different orderings must be investigated. In an actual implementation of the tiebreaking MDA, the choice between minimum degree, minimal deficiency candidates is not completely random, of course. One particular candidate is always selected, perhaps the first potential candidate in the initial ordering, for example. In a similar fashion to the discus-sion of Section 3.1, to see the possible effects of a secondary tiebreaking scheme, the ordering program can be applied to several random initial orderings of the same problem. Using different initial orderings changes the order in which node selections are made from each block of candidates and several different minimum degree, minimal deficiency orderings can be observed. For demonstrative purposes the nine point problem previously mentioned will be considered once again, although many addition examples can be found in Chapter 6. Table 4.1 summarizes the fill experienced by six minimum degree, minimal deficiency orderings of the nine point problem. The first ordering results from the application of the tiebreaking MDA program, tie of Chapter 5, to a standard lexicographic ordering of the problem, while the five remaining orderings are produced by randomly permuting 92 Ordering MDA Fill 1 14586 2 16510 3 15935 4 15702 5 15533 6 15506 Table 4.1: Fill Fluctuations of Minimum Degree Orderings Produced With De-ficiency Tiebreaking this initial labelling. Although each of the observed fill values is smaller than the best ordering produced by the normal MDA program, considerable variation in the quality of the orderings remains. There would be a very definite advantage to developing a secondary tiebreaking scheme which could consistently produce orderings with the reduced fill level. The central goals of a secondary tiebreaking scheme are essentially the same as those outlined in Section 3.2 for the primary tiebreaking scheme of Chapter3. It is desirable that fill level stability of orderings produced be increased in com-parison to those found using only primary tiebreaking. In addition, since all MDA versions are intended to be fill reducing ordering schemes, it is advanta-geous for the secondary tiebreaking orderings to exhibit a reduced level of fill. Although it is hoped that these goals will be realized for a wide variety of prob-lems, it is especially important for the secondary tiebreaking criteria to improve the quality of orderings for problems upon which the deficiency tiebreaking ex-perienced scheme less success. There are many approaches which could be pursued during the development of a secondary tiebreaking seheme. The strategies outlined in this chapter will attempt to meet the desired goals using strategies which emphasize the impor-tance of increasing the global nature of the complete algorithm. The tiebreaking MDA of Chapter 3 is based completely upon the approximate local minimization of fill at each elimination step. No consideration is made of how the elimination 93 of a particular node affects the uneliminated nodes of the remaining graph. The secondary tiebreaking schemes to be discussed choose at each step the minimum degree, minimal deficiency node at each step, whose elimination most benefits the uneliminated nodes of the remaining graph. The precise interpretation of what is viewed as being the most beneficial results in several different schemes. 4.2 Proposed Secondary Tiebreaking Strategies During the search for an effective secondary tiebreaking stategy, three main schemes were developed for implementation and testing. This section outlines the particular motivation and strategy of each proposal, including a general algorithmic overview of the corresponding enhancement of the tiebreaking MDA. To simplify the discussion, elimination graphs are used to describe the main features of each strategy and the introduction of uneliminated supernodes is considered only when absolutely necessary. 4 .2 .1 S t r a t e g y O n e In the current form of the MDA augmented by deficiency tiebreaking, a node with minimum deficiency amongst the minimum degree candidates is chosen for elimination. By definition, the deficiency set of a node is the set of edges which would be added to the elimination graph upon the node's elimination to make its adjacency set into a clique. A secondary tiebreaking strategy must select between nodes that meet the tiebreaking MDA selection criteria and as a result, whichever node is chosen, the same number of fill edges will be introduced into the subsequent elimination graph. Although a secondary tiebreaking scheme cannot affect the level of fill suffered at a particular elimination step, the introduction of the different sets of fill edges can change the number of additional fills produced during the elimination of the graph's remaining nodes. The first secondary tiebreaking scheme enhances the 94 existing strategy by selecting the minimum degree, minimal deficiency candidate whose set of new fill edges results in the largest increase in overall connectivity of the remaining graph. If the elimination of every candidate lowers the connectivity of the elimination graph, then the node resulting in the smallest decrease is selected. Using this additional criterion in the selection process, it is hoped that the general level of connectivity in each successive elimination graph will be kept higher than if random selections were made. By controlling the connectivity level in this fashion, it may be possible to subsequently select minimum degree nodes with larger connectivities and reduce the amount of fill experienced. The remainder of this subsection concentrates upon how to determine the effect of a candidate's elimination on the graph's overall level of connectivity without explicitly eliminating the node. Connectivity levels are used rather than deficiencies because they are more implementation oriented. As previously shown in Section 3.8, the nodes affected by an elimination can be divided into two distinct categories, as shown in Figure 4.1. The connectivity of both adjacency and satellite nodes may change after the elimination of the indicated node. With regards to this discussion, the important difference between the two groups is that the connectivity value of a satellite node cannot decrease, while the connectivity value of an adjacency node may increase, decrease or or remained unchanged. The contribution of each group to the overall change in connectivity will be considered separately. The change in overall connectivity of the graph, however, will not include the loss of a candidate's contribution, as this value is constant for all candidates of a particular elimination step. The change in total connectivity resulting from the altered connectivity of adjacency nodes is more complex than for satellite nodes and will be considered first. The connectivity level of a particular adjacency node depends upon the number of 3 edge cycles it forms with other uneliminated nodes. When the candidate is eliminated, some existing cycles may be lost and other new cycles may be introduced. An adjacency node's cycles or triangles can be classified according to the category of the other two nodes. They may be satellite nodes, other adjacency nodes or before elimination, the candidate itself. Figure 4.1 95 Kev: Node to be eliminated Adjacency nodes Satellite nodes Figure 4.1: Classification of Elimination Affected Nodes provides an example of each triangle class and can be used for reference during the following discussion. The first class of triangles consists of all three member cycles in which an ad-jacent node is associated with two satellite nodes. All triangles of this type, and hence their contribution to the adjacency node's connectivity, are not affected by the associated candidate's elimination. In addition, no new triangles of this class can be formed by the elimination, because by definition satellites cannot directly participate with any of the new fill edges. As a result, the number of triangles in this class does not affect the overall level of connectivity and further consideration is not warranted. In a second class of triangles, an adjacency node is associated with another ad-jacency node and a satellite node. Once again, the triangles of this class already present in the graph are not affected by the candidate's elimination. However, new triangles of this class can be formed when the elimination step produces a new fill edge between two adjacency nodes, both of which are adjacent to the 96 Figure 4.2: Connectivity and Class Three Triangles same satellite node. For each additional satellite node to which they are both adjacent, another triangle of this type is formed. Since the potential set of new fill edges, NE, can be determined without actually eliminating the candidate, the potential increase in the overall connectivity contributed by adjacency nodes, through this class of triangles, can be determined. Let SAT be the set of satellite nodes for a particular candidate. 2 * ( I n adj{y)) n SAT\) {x,v}eNE The factor of 2 is introduced because each new triangle actually increases the connectivity of both associated adjacency nodes. In the third set of triangles, each adjacency node is associated with the can-didate and one other adjacency node. Obviously, these triangles may only be present before the elimination transformation. As a result, their contribution to the connectivity values of adjacency set members is completely removed upon the candidate's elimination. Fortunately the direct counting of these triangles can be avoided because of their special relationship with the candidate's connec-tivity. Each unit of the candidate's connectivity corresponds to one edge between a pair of adjacency nodes. As demonstrated in Figure 4.2, regardless of how the edges are distributed between the the adjacency nodes, there is a one to one cor-respondence between each edge and a class three triangle. Both candidates have a connectivity of 3 and exhibit a total of 3 class three triangles in their partial 97 graphs. Each of these triangles contributes one unit of connectivity to each of its associated adjacency nodes. As a result, the overall loss of adjacency node con-nectivity experienced, when the candidate's elimination removes all class three triangles, can be simply represented by the following formula. Aconn = —2 * conn(candidate) A fourth and final group of triangles affects the connectivity contributions of adjacency nodes. Within this classification, all three nodes associated with a triangle must be adjacency nodes. When the candidate's degree is less than three, the formation of class four triangles is impossible and need not be con-sidered. However, for all other candidates class four triangles may exist before and certainly after the candidate's elimination. As previously discussed, the elimination of a candidate results in all members of its adjacency set becoming pairwise adjacent in the new elimination graph. This means that each adja-cency node will be adjacent to every possible pair of different adjacency nodes, resulting in the presence of a maximum number of class four triangles. If deg is the candidate's degree, each of the deg adjacency nodes will be associated with l/2(deg — l)(deg — 2) unique class four triangles. Each triangle associated with a particular adjacency node contributes one unit of connectivity towards the node's connectivity value. As a result, the total potential connectivity of adja-cency nodes after the candidate's elimination, attributed to class four triangles, can be simply expressed as l/2(deg)(deg — l)(deg — 2). To determine the anticipated change in connectivity contributed by class four trangles, it is necessary to obtain the number of triangles originally present amongst the candidate's adjacency set members. Unfortunately, this total can-not be obtained simply from the candidate's connectivity value, and a time consuming search of the adjacency set is necessary. (See Chapter 5 for addi-tional discussion of this search.) With each triangle contributing towards the connectivity of three nodes, the total connectivity change attributed to class four triangles can be expressed as follows. Aconn = l/2(deg)deg — l)(deg — 2) 98 —3( original number of class 4 triangles) (4.1) The potential change in the connectivity of satellite nodes, resulting from the elimination of a candidate, is more easily predicted. By definition a satellite node may not be adjacent to the candidate itself. As a result, any connectivity triangles originally associated with a satellite node cannot be affected. Additional contributions to its connectivity are possible, however, if a new edge is formed between a pair of adjacency nodes, both of which are adjacent to the satellite node. These new triangles also affect the connectivity of adjacency nodes, and were identified as the second class of triangles in the previous discussion. To include the potential increase in satellite connectivity, it is simply necessary to change the factor of 2 in the Aconn formula for class two nodes, to a 3. In this fashion, the contribution to the connectivity of each of the triangle's three associated nodes would be recorded, rather than for the two adjacency nodes alone. Combining the results of the previous discussion, the effect of a candidate's elimination on the graph's connectivity can be summarized by a single for-mula. The candidate's degree is signified by deg, C4TRI is meant to repre-sent the current number of class four nodes associated with the candidate and conn(candidate) represents the candidate's current connectivity value. In addi-tion, as before let NE be the set of new fill edges the candidate's elimination would produce and SAT be the set of satellite nodes. Aconntotai(candidate) = l/2(deg)(deg — l)(deg — 2) — 3 * C4TRI —2 * conn(candidate) +3( J2 \{adj{x)nadj{y))nSAT\) (4.2) {x,V}eNE As expected, depending on the balance of the various terms, the total connec-tivity may increase, decrease or remain unchanged. The comparison of Aconntotai for each candiate allows the resolution of the secondary tiebreaking criterion and the selection of an elimination node. In the event of several candidates possessing the same maximal Aconntotai, ties are 99 arbitrarily broken. Throughout the previous discussion uneliminated supernodes have not been considered. They play an important role in the quotient graph model used by the tiebreaking MDA and will certainly affect calculations described above. The secondary tiebreaking scheme described in this section is attempting to reduce the level of fill experienced, by controlling the level of deficiency or connectivity in the graph. The trend in connectivity values can be sufficienctly predicted by considering the supernodes in expanded form, although the resulting Aconntotai will not always predict the actual change in connectivity with absolute accu-racy. In the tiebreaking MDA external connectivities are used for supernodes and these values ignore the internal connectivity of the node. The incentive for using supernodes in their expanded form is that their introduction can be easily handled by the Aconntotai calculations, while treating supernodes in their col-lapsed form would complicate the calculation unnecessarily. If the candidate is a supernode itself, the number of nodes it represents is taken into account when considering the connectivity of the members in its adjacency set. However, it is still considered as a single unit with respect to elimination and the difference in internal connectivity between supernodes is ignored. With respect to the Aconntotai calculations, considering the supernodes in their expanded state simply means that each connectivity triangle must be con-sidered in its expanded form as well. In other words, a triangle between three supernodes actually represents a whole set of connectivity triangles, equal in size to the product of the superlengths of the three supernodes. This change is in-corporated into the Aconntotai formula given in equation 4.2 and reachable sets replace adjacency sets to complete the transformation to a form compatible with the quotient graph model. The symbol ca is used to refer to the candidate and once again Gq is considered to represent the current quotient graph. 100 Hreachai (y, S)) Pi SAT) * superlengtk(x) * superlength(y)) (4-3) No change to the first term of the formula is necessary because the candi-date's degree already takes into account the superlengths of its adjacency set members. When a new class three node is found, C4TRI must be incremented by the product of the triangle's three superlengths. The candidate's connectivity already includes the superlengths of the adjacency nodes but must be multiplied by its own superlength so that the number of class three triangles is correctly identified. Finally, to ensure that the change in the number of class four trian-gles is correctly determined, the size function of the previous chapter and the superlengths of x and y are introduced to the equation's final term. 4.2.2 Strategy Two If a tiebreaking scheme is to have any noticable effect on the quality of order-ings produced, the elimination of each block of minimum degree nodes must be disturbed in some fashion. In the strategy outlined in the previous subsection, among all potential candidates the node chosen for elimination had to have the most positive effect on the connectivity of the remaining graph. As a result, a node could be selected based on the fact that its elimination would increase the connectivity level of a group of higher degree nodes, whose elimination was not imminent. This would not directly affect the nodes in the minimum degree block. The secondary tiebreaking strategy outlined in this subsection attempts to narrow the focus of the previous secondary tiebreaking scheme by directing more attention towards the potential minimum degree candidates themselves. The new secondary tiebreaking scheme is essentially a refinement of the strat-egy discussed in section 4.2.1. Instead of calculating the effect of a candidate's elimination on the connectivity of all remaining uneliminated nodes, only those minimum degree satellites and adjacency set members, whose degree after the elimination will be less than or equal to the candidate's minimum degree, are considered. The minimum degree, miminal deficiency candidate whose elimina-101 tion has the most positive effect on the connectivity of this subset of unelimi-nated nodes, will be selected. Once again, if several candidates could be selected according to this criterion, ties are arbitrarily broken. An alternative form of this secondary tiebreaking scheme was also considered in which the connectivity change associated with each candidate was averaged over the number of mini-mum (or lower) degree nodes contributing to the value. A supernode increments the count by its superlength. It is hoped with the new emphasis of both schemes, that the impact of the new fill edges will be directed at the connectivity of mini-mum degree nodes and that the fill experienced by their subsequent elimination can be reduced. To determine the anticipated change in the connectivity of minimum degree nodes for each candidate, a modified version of the Aconntotai equation of the previous scheme can be used. The calculation may still be based upon the same triangle classification scheme, but the connectivity effects of each triangle will only be considered for those associated nodes whose degree, after the elimination, would be no larger than the candidate's pre-elimination degree. The details of such a modification are straight forward, not requiring any special insight. Each specific change will not be detailed at this point because it would not enhance the current discussion and could cause unnecessary confusion. Finally, a mechanism for determining the degree of each adjacency node after the proposed elimination is required so that minimum degree nodes can be identified. Such an identification scheme forms the basis of the third secondary tiebreaking strategy and will be introduced in the following subsection. 4.2.3 Strategy Three The previously outlined secondary tiebreaking strategies have both concentrated upon how a candidate's elimination would affect the connectivity of the remaining nodes in the graph. It was hoped by controlling the connectivity level, through a secondary tiebreaking strategy, that the fill sufferred upon future eliminations could be reduced. Such a scheme can be thought of as directing its efforts towards 102 the assistance of the primary tiebreaking scheme based upon connectivities. If in general, higher connectivity levels are maintained, perhaps the minimum degree node chosen can have a higher connectivity value. Connectivities, however, only form the basis of the algorithm's tiebreaking scheme. The strategy presented in this subsection attempts to refocus attention on the degree of the graph's remaining nodes, which is of course, the basis of the most dominant feature of the selection process. The tiebreaking MDA is a heuristic for the local minimization of fill, which in turn serves as an approximation to the global minimization of fill. The node se-lection process is restricted to minimum degree candidates at each step, to limit the maximum amount of fill that can be experienced upon the selected node's elimination. The primary tiebreaking strategy performs a local minimization of fill from amongst these potential candidates to complete the selection process. Another possible approach to the secondary tiebreaking problem is to select the minimum degree, minimal deficiency node whose elimination results in the greatest decrease or smallest increase in the degree sum of the graph's remaining nodes. By attempting to control the degrees of uneliminated nodes in this fash-ion, it might be possible to help limit the maximum level of fill experienced in future eliminations. Unfortunately, it can be easily shown that the elimination of nodes with the same degree and connectivity has the same effect upon the over-all degree of the remaining graph, unless their superlengths differ. A secondary tiebreaking scheme based upon the superlengths of candidates was attempted and found to be totally ineffectual. During the application of the MDA to several of the test problems discussed in Chapter 6, the degree values of elimination candidates were traced. In each case the members from blocks of minimum degree nodes are eliminated essentially in sequence, with very few variations in the graph's minimum degree. Occasionally the minimum degree dropped for a few eliminations, but the general trend was to exhaust all the nodes in the current minimum degree block. Once all nodes at a particular minimum degree value have been eliminated, higher degree values nodes, with greater fill potential, must be selected. In general, blocks of higher 103 and higher degree nodes were eliminated until the simulated factorization was complete. Although the effect upon the graph's degree sum can differ very little be-tween minimum degree, minimal deficiency candidates, their elimination can have a substantially different influence upon the distribution of degree within the remaining graph. At each stage of the process, selecting the candidate re-sulting in the largest increase or smallest decrease in the number of minimum degree nodes helps to prolong the elimination of lower degree nodes. In general it is more desirable to eliminate a minimum degree node with its smaller fill producing potential. In addition, by keeping the minimum degree block as large as possible, the primary tiebreaking scheme can be applied more effectively. The elimination of a particular candidate may result in a new minimum degree node with exceptionally high connectivity. By maintaining the minimum degree block in this expanded form, the increased numbers can result in a lower probability that such a node could be overlooked. At each step of the elimination process there will be perhaps more minimum degree nodes in the block, hopefully with a wider range of connectivity values. To summarize, the secondary tiebreaking strategy proposed in this section selects the minimum degree, maximal connectivity candidate whose elimination, amongst all such candidates, results in the most positive change in the size of the minimum degree block of nodes. Minimum degree is intended to refer to the degree shared by all candidates before the elimination. If any member of the selected candidate's adjacency set acquires a smaller degree than this value, they are included in the count. Because the estimated changes in the minimum degree block size for each candidate is made without explicitly eliminating the particular node, it would be impractical to attempt the identification of those adjacency set members which will be grouped into indistinguishable sets. Although existing uneliminated supernodes are considered in their collapsed form by using their external degree, no attempt is made to correct the accounting process to include the effects of supernodes which would be formed after the candidate's elimination. The elimination of a proposed candidate can only affect the degrees of nodes 104 in its adjacency set. In the quotient graph model, this corresponds to the nodes of the candidate's reachable set. By comparing the number of minimum degree nodes in this group before and after the elimination, the change in block size (including any new lower degree nodes) can be determined for each candidate. Let the change in the number of minimum degree nodes AMD be denned in the following manner, with ca representing the particular candidate. The degree of a node x before and after the candidate's proposed elimination is represented by deg(x) and deg'(x) respectively. S represents the set of eliminated nodes excluding the candidate. AMD = \{x | x G reach(ca,S) and (deg'(x) < deg(ca)) }\ — \{x | x G reach(ca,S) and (deg(x) = deg(ca)) }| (4.4) It is obviously not possible to explicitly eliminate each candidate to determine its AMD(ca) value. A mechanism for calculating the new degree of each affected node is needed, which can work with the current quotient graph. The formula presented by equation 4.5 can be used to determine the new degree for each member, x, of the candidate's reachable set. deg'(x) = deg(x) — superlength(ca) + deg(ca) — superlength(x) — ^2 superlength(y) (4.5) y£(reach(z,S)nreach.(ca,S) The basis of x's new degree is its current degree corrected for the loss of the candidate's contribution. In a new quotient graph resulting from the candidate's elimination, x would have all members of reach(ca, S) in its own reachable set, reach(x, Sl){ca}), except itself of course. The effect of this change upon x's new degree value is characterized by the final three terms of the equation. The comparison of AMD values for each proposed candidate allows the res-olution of the secondary tiebreaking scheme and the selection of the elimination node. When only one minimum degree, minimal deficiency candidate exists, its AMD value need not be calculated. However, if several candidates have the same maximal AMD value, ties are once again arbitrarily broken. This concludes the discussion of all tiebreaking strategies developed for the 105 minimum degree algorithm. The following chapter outlines the implementation of the MDA and its enhanced versions, as well as a brief description of the symbolic factorization, numerical factorization, and triangular solution routines developed for testing. 106 Chapter 5 Implementation Considerations A number of programs were developed to test the tiebreaking strategies proposed in the previous chapters of this thesis. This chapter will discuss the implementa-tion of these programs, detailing important design features, data structures and algorithms not previously outlined. Although the implementation of secondary tiebreaking schemes and routines for symbolic factorization, explicit factoriza-tion, and solution of the system are considered, the main focus of this chapter is upon the modules developed for the normal MDA and the MDA with deficiency tiebreaking. All code for these modules was written in Pascal. As a result, all data structures and code fragments included in this chapter are described using Pascal syntax. 5.1 Implementation Overview Essentially four different independent programs were developed for testing the tiebreaking strategies of Chapters 3 and 4. For convenience these modules will simply be referred to as norm, tie, sec and solve in subsequent discussion. The first three modules are implementations of the previously discussed or-dering algorithms. The MDA based upon a quotient graph model, with une-liminated supernodes and an arbitrary selection amongst minimum degree can-didates is encoded in the first module, norm. This program is based upon the algorithm presented in Section 2.7.4, with the external degree modification of 107 Section 2.8. The incomplete degree updating and multiple elimination schemes of Section 2.9 were not incorporated into the implementation: In the second module, tie, the deficiency tiebreaking strategy of Chapter 3 is used to enhance the normal MDA. The implementation of this module is based upon the algo-rithm outlined in Section 3.9. There are, of course, many similarities between the norm and tie modules. To make comparisons of their relative performance more meaningful, whenever possible the two modules were developed in parallel. The final ordering module, sec, actually represents a family of ordering programs, cor-responding to the different secondary tiebreaking strategies investigated. Each program is based upon the tie module with an additional secondary tiebreaking scheme, as outlined in Chapter 4, applied during the node selection process. The final module, solve, is actually composed of two main sub-modules. The first performs the explicit Cholesky factorization of the reordered system, and the second solves the system based upon this triangular decomposition. The two sub-modules are only grouped into a single module for testing convenience and are actually implemented independently. In Section 1.1 the solution method of sparse, positive definite and symmetric matrices was described as having four distinct and independent steps. A complete solution system must include, in some manner, each step in the proper sequence. The previously described modules encompass three of the four steps but no mention was made of symbolic factorization. For the purposes of testing the ordering algorithms, the symbolic factorization was not implemented as a single, independent unit. This step was actually split between the ordering modules and the solve module. If desired, the nonzero structure of the factor L is easily determined as a side effect during the simulated factorization of the ordering module, while L's data structure, necessary for the explicit factorization, is set up by the solve module. The implementation of the solution process could be organized in this fashion because it was known that only MDA orderings were to be considered. The overall goal of the implementation was to get a complete system running, so that the effects of different minimum degree orderings upon the execution times of the explicit factorization and solve steps could be observed. 108 Very little attention was directed towards the symbolic factorization. As a result, although the actual symbolic factorization is reasonably time efficient, additional storage is required beyond that needed for more conventional methods. (See [GL81] for example.) The major theme of this thesis concentrates upon the efficient deficiency tiebreaking of the normal MDA. As a result, most of the chapter's remaining discussion will be of the norm and tie implementations. In Section 5.5, however, the important aspects of the secondary tiebreaking implementations will be out-lined. Finally, Section 5.6 will briefly discuss the algorithms and data structures chosen for the solve module. 5.2 Common Implementation Details of the Norm and Tie Modules As mentioned in the previous section, the norm and tie modules were developed in parallel so that performance comparisons between the two ordering schemes would be more meaningful. As a result, many aspects of the two implementations are very similar. This section discusses the data structures, algorithms and important design decisions common to both modules. 5.2.1 Common Data Structures The close parallelism of the two implementations dictates that the majority of data structures used are common to both modules. The central feature of any MDA version's implementation is the manner in which the quotient graphs of the simulated factorization are stored. Adjacency lists were chosen as the basis of the representation, with the current adjacency set for each active node in the current quotient graph recorded. Each member of an adjacency list is represented by an integer node number, which corresponds to the node's position in the system's original ordering. All node numbers are in the range 1,..., N, where N is the total number of nodes in the original quotient graph of the system. A single 109 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 1 4 1 5 1 6 1 7 index a d j l i s t 3 4 5 7 4 6 1 M 2 1 6 7 2 5 1 5 ist of pointers into adjlist nodenumber Figure 5.1: Adjacency Lists integer array, adjlist , is used as a communal storage area for all adjacency lists. Initially the adjacency sets for the nodes of the original quotient graph are listed one after the other in contiguous blocks of the array. The blocks are arranged sequentially according to the order dictated by the graph's original labelling. During the elimination process, however, no effort is made to maintain any particular order amongst the members of a particular node's adjacency list. As shown in Figure 5.1, a pointer into the adjl ist is stored separately for each node, marking the beginning of its particular adjacency list. The end of each adjacency set can be found using the adjacency pointer which indexes the beginning of the following list. For an elimination graph modelling of the factorization process, no upper bound can be placed on the size of a node's adjacency list and as a result the storage scheme described above would be impractical. Quotient graphs were used as the modelling basis of all MDA implementations, however. Equations 2.15 and 2.14 guarantee that the space originally allocated to each particular node will meet its storage requirements for the entire elimination process. As mentioned in 110 Section 2.6.5, to meet the storage needs of an eliminated supernode's adjacency set, it is often necessary to combine the storage of all coalesced members. Further discussion outlining the specific implementation details of such an amalgamation are included in Section 5.2.3. The list of adjacency pointers into adjlist is not stored in its own integer array. Instead an adjacency pointer is but one field of a record base type used to declare a central data structure storing information on each particular node of the original graph. The nodelist array, indexed by node numbers in the range 1,.. .,N, is a collection of records of the following type. nodeinfotype = record ignore: boolean; degree: integer; superlink: integer; superlength: integer; adjptr: integer; end; Three additional fields, which are included in each record of the tie module's nodelist array, will be discussed in Section 5.4. Each of the common record fields mentioned above, however, is briefly described in the following list. ignore: When true this field indicates that the node is no longer an active member of the current quotient graph. degree: This field records the value of the node's degree in the current quotient graph. superlink: This field is used to link up the node with the next member of an uneliminated supernode by recording its node number. A superlink value of 0 is used to mark the end of a list of supernode members or to indicate that a node is not involved in an uneliminated supernode. I l l superlength: When a node is the representative of an uneliminated supernode, this field records the number of members making up the indistinguishable set. A value of 1 is recorded for an uneliminated node which is not a member of a supernode. adjptr: This field is an integer indicating the start of the node's adjacency list within the adjlist array. To record the new permutation produced by the MDA version, both imple-mentations use an integer array of length N to record the order in which the nodes of the original quotient graph are eliminated. Finally, the set of eliminated nodes, previous referred to as the set S, is internally represented as a boolean array, elimset, indexed by the set of node numbers 1 through N. This rep-resentation was selected because a query of a structure defined using Pascal's builtin set declaration was found to often be very inefficient in comparison to accessing a boolean array. At the start of the elimination process all N values of elimset are initialized to false and as nodes are eliminated the appropriate entry is changed to true. When a supernode is eliminated, all members of the indistinguishable set are marked as eliminated and each member is added to the array recording the new permutation. It should be emphasized that a node's elimination status, as recorded by elimset, is not equivalent to its ignore sta-tus. At any stage after the initial elimination, both eliminated and uneliminated nodes may have their ignore field set to true. 5.2.2 Successive Quotient Graph Formation The descriptions of the quotient graph MDA and the tiebreaking MDA in Chap-ters 2 and 3 respectively, did not present a formal algorithm detailing the creation of sucessive quotient graphs during the elimination process. In each version of the algorithm, the new quotient graphs formed during an elimination step were shown as being created from the quotient graph of the original system and the new component partitioning, C(S). As previously discussed, this method is impractical and each successive quotient graph can be created by a simple trans-112 formation applied to its predecessor. The following algorithm serves as the basis of successive quotient graph formation in all ordering algorithms implemented. The quotient graph G] is formed from its predecessor, G]^, upon the elimination of node t. <S,_i is intended to represent the set of eliminated nodes after i — 1 steps of the process. Finally, the function expand(t) once again returns the set of nodes, including t, which have been previously coalesced to form a supernode represented by t. (The algorithm presented is an adapted form of the algorithm presented in [GL80].) 1. NewSN := adja^t) n C(S,_i) - The set of nodes with which t will be combined to form a single eliminated supernode. 2. R\= reachGq_i{t,C{Si_l)) - The set of uneliminated nodes whose adjacency set may be altered by i's elimination. 3. t := expand(t) U NewSN - Create the new eliminated supernode by coalescing t with all adjacent eliminated nodes. Represent the new supernode, t, by t in the current quotient graph. 4. C(Si) := (C(5,-_i) - NewSN) U {t} - Update the component partitioning. 5. adjGi(7) := R - Now that all members of NewSN have been combined with t to form a single eliminated supernode, all previously reachable nodes are now adjacent to t. 6. For each y G R adjG*.{y) := {t} U adj&.^y) - {NewSN U {*}) 113 - Update the adjacency sets of all affected members. All references to NewSN members are removed because t now represents the newly coalesced supernode. In the actual implementation, no additional data structures are introduced to represent the current component partitioning, C(5,). This set is simply an abstraction used in the algorithm's presentation to represent the set of eliminated nodes and supernodes which remain active in the current quotient graph. All members of C(Si) are recorded as eliminated by elimset and have their ignore flag set to false. As a result, to perform the intersection of step 1, the adjacency set of t is scanned looking for nodes with an elimset entry assigned true and an ignore value of false. After the reachable set of t is saved in step 2 of the algorithm, the following three steps constitute the actual formation of the new eliminated supernode. The previous MDA step, S :— SDexpand(t), has already marked all nodes represented by t as eliminated, by setting their elimset entry to true. The updating of the component partitioning is completed by setting the ignore flag of each NewSN member to true. The ignore flag of t remains false because it will represent the new supernode, as indicated by step 3. Finally, in step 4, the adjacency set of the representative, t, is updated to reflect the absorption of all eliminated nodes or supernodes previously adjacent. The final step of the transformation process describes the updating of adja-cency sets for all nodes, y, which were in i's reachable set before the elimination step. It must be ensured that t is referenced in each y's adjacency set. In ad-dition, all references to other members of the supernode must be removed from each adj(y), so that the nodes of NewSN are not included in any active node's adjacency list. In an actual implementation of this algorithm, the NewSN mem-bers may be either explicitly or implicitly removed. If the nodes are to be physically removed from the affected lists, each node's adjacency set is scanned from left to right for members of NewSN U {t}. The first node number of this set which is identified is replaced by t. During the 114 remainder of the search, when a member of NewSN U {t} is found, all remaining nodes in the list are shifted left by one position in the data structure to effect the node's removal. If after such an updating process, a node's adjacency set no longer requires the complete block of space, a 0 is used to mark the end of the current list. The motivation for using this explicit scheme is to avoid the build up of inactive nodes, which might downgrade the efficiency of subsequent adjacency list traversals. The alternative implementation approach is to implicitly remove the NewSN members by simply setting their ignore flags to true. Throughout the remain-der of the elimination process, all nodes with true ignore fields will be passed over during adjacency list traversals. No further changes are necessary for those nodes, y, which were already adjacent to t prior to the elimination. The adja-cency lists of each remaining node of the affected group, however, is traversed until the first ignored node is located, replacing it with Vs node number. In this fashion much of the costly updating required by the first scheme is avoided, although the performance improvement will be at least partially negated by less efficient adjacency set traversals. Both schemes were implemented for the tie and norm modules and tested on a wide selection of the test problems discussed in Chapter 6. In all cases, the programs using the implicit scheme were slightly more efficient, exhibiting savings of 3 to 6% in execution time. As a result, the implicit adjacency set updating scheme was chosen for all implementations. A small example in Figure 5.2 illustrates the use of the quotient graph trans-formation algorithm. The six steps of the algorithm are illustrated when node 4 is eliminated. (MDA node selection criteria were disregarded for the example.) The effects of the transformation upon the adjacency lists are followed. Node numbers are not shown in the adjacency sets of 1, 2 or 3 after the elimination because the storage for these nodes is no longer required. (Section 5.2.3 will discuss the storage requirement of eliminated supernodes in detail.) 115 Adjl ist 1 |4 |8 |9 2 4 5 3 14 17 [8 4 1 3 5 5 2 4 6 7 6 [ 5 J 7 L L L L 8 [TT3~ 9|T] 1. NewSN:= {1,2,3} 2. R:= {5,7,8,9} 3. {1,2,3,4} will be coalesced to form a single supernode represented by 4. 4. C(S, ):= ({1,2,3} - {1,2,3}) U {4} 5. adj(4):= R 6. adj(8):= {4} adj(9):= {4} adj(5):= {4,6,7} adj(7):= {4,5} 1[ 2 r 3_ 4 5 7 8 9 5 4 |6 [7 } Unused Kev: altered 8 g g | ^ o r e 6 9 m Figure 5.2: Quotient Graph Transformation Example 116 5.2.3 Implementation of Eliminated and Uneliminated Supernodes In the quotient graph form of the MDA with indistinguishable nodes, both elim-inated and uneliminated supernodes play an important role in the elimination process. Although the two supernode classifications share many common char-acteristics, the implementation of the two abstract structures differ in many respects. Table 5.1 compares the significant implementation issues of the two supernode forms. The as yet undiscussed details of supernode implementation are considered in the remainder of this section. Eliminated Supernode Adjacency Lists Figure 5.2 depicts the formation of a supernode for which only the representa-tive's block of adjacency list storage is required to record the new supernode's adjacency set. This characteristic is not common to all eliminated supernodes. Equation2.15 only guarantees that sufficient storage will be available when stor-age of all coalesced members is considered. In practice, it is common for an eliminated supernode to require more storage than the block contributed by the representative alone. In Section 2.6.5 it was shown that there was sufficient stor-age for the adjacency set as well as for a set of pointers which could be used to unify the separate blocks into a single storage unit. In the actual MDA im-plementations, the adjacency set of the newly formed eliminated supernode is copied into the available storage, beginning with the representative's block. If the last location of this storage unit is encountered and more than one node number remains to be stored, the negated node number of another member of the newly formed supernode is placed in this position. The storage of the sup-ernode's adjacency set then continues with this new block of storage. As long as more storage is required, additional units of adjlist, belonging to other in-distinguishable set members, are linked up to the unified structure. If the last member of the adjacency set is placed in a block's final storage location, the pro-cess is complete. Otherwise a 0 is placed in the next storage location to signify 117 Eliminated Supernodes Uneliminated Supernodes Can require extra adjacency storage beyond that of the representative. Identification is straight for-ward and complete. New nonrepresentative mem-bers of the supernode are implicitly removed from the quotient graph but some ad-ditional adjacency set updat-ing is required for affected lists. Traversal of its adjacency list may require accessing more than one block of the adja-cency list storage. Non-representative members play no further part in the elimination process. • Requires only the adjacency storage originally allocated to the representative. • Identification is difficult and may be incomplete • Formation of a supernode from an identified indistin-guishable set only requires the setting of nonrepresenta-tive ignore fields to true. • Traversal of its adjacency list only requires searching one adjacency list block. • Non-representative members continue to play a role in the elimination process until eliminated themselves. Table 5.1: Supernode Comparison 118 the end of the adjacency list. No attempt is made to record the node numbers of coalesced members whose adjacency list storage is not required by the eliminated supernode's adjacency set. The storage of these nodes will lie dormant for the remainder of the elimina-tion process. Once the new quotient graph has been formed, each uneliminated supernode can be treated as a single eliminated node with a possibly expanded adjacency list storage structure. Equation 2.15 can be reapplied to this new quotient graph, as though the supernode's adjacency list were stored in a single contiguous block of adjlist. As a result the unused blocks of storage will never be required again. When uneliminated supernodes are formed, the set of eliminated nodes to be combined into the new supernode may include previously coalesced, unelimi-nated supernodes. Special care must be taken that all storage blocks are made available to the new supernode and not just the first storage block used by each absorbed member. In both the norm and tie modules all storage blocks used by an absorbed supernode are considered before calling upon the storage of the next coalesced member. Figure 5.3 demonstrates the storage of adjacency lists for eliminated supern-odes as described above. The nodes of this simple graph are considered elim-inated following the sequence suggested by their original labelling. Node 1 is already considered eliminated in the first graph illustrated. All changes to the adjacency list entries are shown in hollow type and ignored nodes are indicated by crossing out the appropriate storage location. Formation of Uneliminated Supernodes Unlike their eliminated cousins, the identification of a group of nodes which can be coalesced to form an uneliminated supernode is very difficult. Section 2.7.3 outlined a strategy used by both the norm and tie modules after each elimination step to identify indistinguishable sets amongst the nodes adjacent to the newly formed eliminated supernode. (The term "eliminated supernode" may also refer 119 1 |2 |3 |4 |6 2 Emu 4 rr[2iT 5 [4(6 7 IT 6 7 0 3 4-1 5 E E 6 [Ffr 7 IT Key: altered ignored 4 6 7 I K E 7 3 Unused Figure 5.3: Uneliminated Supernode Adjacency List Storage Example 120 to a single eliminated node.) Let t and Q represent the newest supernode and any other eliminated supernode of the current quotient graph, Gq, respectively. It was shown that a subset, with more than one member, of adjaq{t) nadJGi(Q) was indistinguishable if for each of its members, y, adjaq(y) C adja<i(t) U adJGi[Q) U {t,Q}. All sets of indistinguishable nodes amongst the adjacency set of t can be found by applying the rule to each possible Q for which the intersection is nonempty. The following algorithm outlines the organization of such a search and the formation of uneliminated supernodes from identified reachable sets. It is assumed that the set of eliminated nodes (or supernodes), snlist, adjacent to at least one member of adjai(t) has been previously identified (t S' snlist). In the tie module this is carried out during the connectivity updating of satellite nodes, while a special search is required in the norm module. The set merge is assumed to represent the set of candidates for an indistinguishable set. 1. For each Q G snlist (a) merge := adjGi{t) n adjG*(Q) - Identify all potential members of the indistinguishable set. (b) For each y G merge if adjGi{y) <f. adjGi(t) U adjGi{Q) U {Q,t} merge := merge — {y} - Remove all nodes which are not indistinguishable. (c) If \merge\ > 1 y := merge - Select a representative from the indistinguishable set and form the new uneliminated supernode. (d) deg(y) \= size(reach,G<i(y)) - Update the external degree of the new supernode. Central to the efficient implementation of this algorithm is the manner in which the intersections of step la are performed. The adjacency lists of t and Q 121 may not be sorted, eliminating the possibility of using a simultaneous scan of both sets to compare membership. Instead, before the search for indistinguishable sets begins, all active members of adjai(t) are marked by negating a selected field of their nodelist record. For the sake of this discussion, consider it to be their degree field. (N is subtracted from each value so that the degree values may be recovered.) The intersection merge can be produced for each Q by scanning the set adjGq(Q), looking for active nodes with negative degree values. The set merge is stored in a local integer array. In the second step of the algorithm each member of the intersection must have its adjacency set compared to adjGi{t) U adjaq(Q) U {Q,t}. To facilitate this comparison, during the formation of merge in the previous step, each active member of adjaq(Q) not included in the intersection also has its degree value negated. Once the intersection has been established, this results in the complete identification of the set adjai{t) U adja<i{Q). A single traversal of y's adjacency set will decide whether the subset relationship is satisfied. A node can be re-moved from merge by negating the recorded value of its node number recorded in the local integer array. The original flagging of adjaq(t) can be recovered by first adding N to all degree values of adjai(t) members. If the representative and all members of the merge array not included in the indistinguishable set (identified by negative node numbers) are then reflagged, the preparation for the next intersection is complete. Once all Q G snlist have been considered, the set adjGq{t) is traversed and the actual degree values of all active members recovered. In the third step of the algorithm, if the indistinguishable set consists of more than one member, the group is coalesced to form a single uneliminated supernode. The first positive node number found in merge is selected as the new supernode's representative. The adjacency sets of the representative and all adjacency sets of nodes adjacent to at least one member of the indistinguishable set must be updated to effect the quotient graph's transformation. In a similar fashion to the eliminated supernode formation, this is accomplished by the implicit removal of all absorbed indistinguishable set members. Each non-representative member 122 of the new supernode has its ignore field set to true. However, unlike the eliminated supernode case, no additional explicit updating of adjacency sets is required. At first glance this limited updating may seem inadequate. In many cases, an eliminated or uneliminated node might be adjacent to a member of the indis-tinguishable set, but not to the representative itself. After the ignore field up-dating, this adjacency association would be lost. As pointed out in Section 2.7.2, however, only the reachable sets of uneliminated nodes must be accurate to allow for the proper execution of the MDA. Since the neighborhood sets of all indis-tinguishable members are equivalent, the representative's reachable set correctly represents the new supernode's association with other uneliminated nodes. In addition, all uneliminated nodes with an indistinguishable member in its reach-able set prior to the supernode formation must also, by the definition of indis-tinguishable nodes, be able to reach the representative. After the updating is performed, the representative will remain a member of the node's reachable set and represent all coalesced members. To clarify this important issue an additional example is presented in Fig-ure 5.4. Nodes {x,y,z} form an indistinguishable set. If x is chosen as the representative and y and z are marked as ignored, the modified quotient graph can be illustrated as shown by G9', with nodes y, z and all associated edges removed. With x considered as representing the entire indistinguishable set, the reachable sets of all uneliminated nodes are unchanged by the modification. However, to ensure that the membership of the indistinguishable set is not lost, the absorbed nodes x and y must be recorded in some fashion. When an eliminated supernode is formed the absorbed nodes play no fur-ther part in the elimination process. For uneliminated supernodes, however, the membership of the group is necessary for the correct calculation of the degrees of nodes in the supernode's reachable set and so that all members can be recorded in the new permutation, when the supernode is selected for elimination. The size of a uneliminated supernode and its membership are recorded using the su-123 124 perlength and superlink fields of the representative. The superlength of a normal node is one, while superlength of a supernode representative is in-cremented by the superlength of each absorbed node. In this fashion the total number of nodes in the coalesced set includes the additional members represented by an absorbed supernode. The superlink field of each member of the new supernode is used to link up the nodes into a linked list. The representative is the head of the list, and each member stores the node number of the next member of the list in its superlink field. The superlink value of the last member is set to 0, to signify the end of the list. If a supernode is absorbed into the new supernode, its entire linked list of members is added to the linked list of the new supernode. At this point it should be reemphasized that, unlike eliminated supernodes, only the representative's adjacency set is required by an uneliminated supernode during the remainder of the elimination process. The adjacency list storage blocks of all absorbed nodes lie dormant, and are not even required when the supernode is eliminated. Once the supernode has been formed, its degree is updated before starting the search for another indistinguishable set using a vew member of snlist. The function size introduced in earlier chapters totals the superlengths of all members of the reachable set. The external degree is used for all supernodes. Adjacency List Traversal The traversal of both eliminated and uneliminated nodes is central to a suc-cessful implementation of the normal or tiebreaking MDA. In both norm and tie no distinction is made between single nodes and supernodes with respect to traversal. A single uneliminated node is considered a supernode consisting of only one member, while a single eliminated node is considered an eliminated supernode using only one adjacency storage block. However, the more complex storage requirements of an uneliminated supernode results in a different traver-sal algorithm being necessary for each of the two supernode classifications. The 125 following two program fragments compare the basic skeletons of two traversal algorithms. Uneliminated Supernode Traversal i:= nodelist[unelimnode].adjptr while (i < nodelist[unelimnode + 1].adjptr) do if not (nodelist[unelimnode].ignore) then {*adjlist[i] is an active node adjacent to unelimnode*} i:= i + 1; end; {*while*} Eliminated Supernode Traversal i:= nodelist[elimnode].adjptr marker:= nodelist[elimnode + 1].adjptr while (adjlist[i] <> 0) and (i < marker) do if adjlist[i] < then begin {Switch scan to next block of storage. *} marker:= abs(nodelist[abs(adjlist[i] ) + 1].adjptr); i:= nodelist[abs(adjlist[i])].adjptr; end; {*if**} else begin i f not (nodelist[elimnode].ignore) then {*adjlist[i] is an active node adjacent to elimnode*} i:= i + 1; end; {*else*} end; {*while*} 126 5.2.4 Implementation of Reachable Sets The formation of reachable sets is an important aspect of any MDA version. The following pseudocode outline describes the basic algorithm forming the basis of reachable set routines in both the norm and tie modules. The reachable set of node x is constructed using the current quotient graph Gq. 1. reachGq(x) := 0 2. for each y € adjGq{x) if (y is eliminated) and (y ^ x) then reachGq(x) := reachGq(x) U adjGq(y) else {* an uneliminated node*} reachGq(x) := reachGq(x) U {y} The basis of the reachable set formation is a traversal of x's current adjacency list. Each active node (or supernode) is considered separately. If an eliminated node is encountered, its adjacency set is traversed and each active node added to the reachable set, before considering the next node in adjGq(x). Quotient graphs ensure that all active members of an uneliminated node's adjacency set must be eliminated. If the member of adjGq(x) happens to be active and uneliminated, then it is simply added to the growing reachable set. During the traversal of the various adjacency sets, it is very possible that the same node could be added to the reachable set more than once. If node duplication was permitted in reachable sets, the efficiency of many areas of the algorithm could be significantly impaired. Most importantly, calculating a node's degree would be very difficult. Several different implementation modifications of the basic algorithm described above were considered to avoid such redundancies. The most efficient alternative was identified to be setting the ignore flag of each new node added to the reachable set to true. In this fashion, if the same node was encountered in a second traversal, it would no longer be eligible for addition to the reachable set. The degree of node x can then be found by summing the superlengths of the nodes added to the reachable set without worry about 127 duplication. Once the reachable set is complete, a single traversal of its members is used to reset the altered ignore field values. No attempt is made in either implementation to sort the reachable set entries by node number. In an early implementation it was found that the costs of producing the sorted reachable sets far outweighed the advantages which sorted lists can bring to calculating intersections. The final reachable set implementation issue to be considered is the manner in which the list of node numbers representing a reachable set should be stored. In early versions of both the norm and tie modules, Pascal pointer types and linked lists were used to allow the dynamic allocation of storage for reachable sets. Each reachable set was represented by a single linked list, with each independently al-located unit of the linked list storing one node number. Unfortunately, it was observed that a very substantial proportion of the ordering times for both pro-grams was associated with the bookkeeping necessary for the dynamic memory management. The effort required to dereference pointers was reasonable, but the CPU requirements for allocating and freeing linked list nodes, using the Pas-cal functions new and dispose, on the SUN 3/501 used for testing, were totally unacceptable. Both modules were modified to remove the use of pointers in reachable set storage. The norm module currently uses a local integer array of length N to temporarily store a reachable set while it is required by the elimination process. In the tie module a more complex data structure, composed of two larger in-teger arrays, is used to meet the longer term and more flexible reachable set storage required by the enhanced algorithm. This scheme will be fully detailed in Section 5.4.4, but essentially linked lists are mimicked using integer pointers, allowing the dynamic storage of a large number of reachable sets in the single structure. Although the reachable set data structures were totally restructured for both modules, relatively few changes were actually required in each pro-gram's code. Substantial savings, however, were observed in the execution times for both ordering modules on a wide selection of the test problems discussed in 1 SUN 3/50 is a trademark of SUN Microsystems, Inc. 128 Chapter 6. Improvements ranging between 50 and 60% were observed for the norm module, while for the tie module the savings varied between 30 and 40%. After this testing was conducted, several modifications were made to these ear-lier versions of the tie and norm modules. Although these changes may slightly alter the range of improvements observed, substantial savings are most definitely realized by the removal of true pointers and linked lists. It should be noted that this data structure comparison is also very dependent upon the particular system and compiler used. 5.2.5 Lower Adjacency Structure Formation As mentioned in Section 5.1, the symbolic factorization step was not implemented as a separate unit. The ordering modules norm, tie and sec are responsible for providing the factor's nonzero structure to the solve program, which sets up the necessary data structures for the explicit factorization. As outlined during the discussion of the elimination graph model in Sec-tion 2.4.1, the adjacency set of a node just prior to its elimination represents the nonzero structure of the factor's corresponding column. In quotient graph terms this corresponds to a node's reachable set. If these reachable sets are recorded as each node of the system is eliminated, the lower adjacency structure of the filled matrix, F, is obtained. This structure differs from a normal adjacency structure in that each node's adjacency list includes only nodes which follow it in the new permutation. The nonzero structure of the permuted system's factor, L, can be obtained directly from this condensed structure. When a supernode is eliminated or is present in a node's reachable set, spe-cial care must be taken to ensure that all nodes are recorded. Each supernode in the selected node's reachable set must be expanded so that all members of the indistinguishable set, which would be adjacent to the node in the correspond-ing elimination graph, are included. If the selected node is itself a supernode representative, the remainder of the supernode's members must also be included in the expanded reachable set. In addition, when a supernode representative is 129 eliminated, the other members of the coalesced set are also eliminated according to the order in which they appear in the representative's superline list. Each of their reachable sets is identical to the representative's reachable set except that those indistinguishable nodes ordered previously are excluded. The lower adjacency structure is not stored by the ordering programs. As each node is eliminated, its reachable set is expanded and sorted using a quicksort routine. (The sets must be sorted for the solve program.) The node's node number and sorted reachable set are then output, with the intention they be passed to the solve program using a Unix pipe. The order in which the adjacency information for the system's set of nodes is printed is also used by the solve program to recover the new permutation. 5.3 Distinct Features of the Norm Module The main differences between the norm and tie modules originate from the man-ner in which each algorithm chooses a minimum degree node for elimination. This section introduces the remaining data structures of the norm module, detailing their use in the unique node selection process of this ordering program. The im-plementation detail of the selection process is essential for a clear understanding of the precise manner in which the MDA's arbitrary choice between minimum degree nodes is resolved. The selection scheme closely follows the method used by George and Liu [GL81]. Central to the selection process of the norm module are the two arrays perm and nodelocation. Each is an integer array indexed by integers in the sub-range 1 to N. During the elimination process the first data structure stores the set of node numbers {1,... ,iV}, recording both the order in which nodes have been eliminated and the set of nodes yet to be selected. The second array, node-location, records the position of each of the system's N nodes within the perm array. Both arrays are initialized with the sequence of integers 1,2,..., TY, so that the original ordering of the system's matrix is recorded when the simulated factorization begins. 130 New permutation Uneliminated nodes perm: nodelocation: -4 1 2 3 4 5 • 6 7 8 9 10 6 1 5 7 2 3 4 8 9 10 1 2 3 4 5 6 7 8 9 10 2 5 6 7 3 1 4 8 9 10 index node numbers node number position in perm Figure 5.5: Node Selection in the norm Module Three integer variables lowest, nextlowest and stscan are used in con-junction with the perm and nodelocation arrays to control the node selection process. The smallest degree of any uneliminated node in the current quotient graph is recorded by lowest. During the identification of each node's degree at the start of the program, lowest is initialized with the smallest degree value encountered. As nodes are eliminated during the course of the simulated factor-ization, the new degree values calculated are used to update lowest. As the elimination process proceeds and nodes of lowest degree are selected for elimination, the array perm becomes divided into two regions. The first region records the eliminated nodes in order of their selection. The second region records the set of nodes which remain uneliminated. As a small example, Figure 5.5 illustrates the entries of the perm and nodelocation arrays after 5 nodes have been eliminated from a quotient graph originally of 10 nodes. When a node of lowest degree is to be selected for elimination, a sequential search of perm begins at the uneliminated node referenced by stscan. When no nodes have been eliminated, stscan is initialized to position 1. However, to avoid the inefficiences of researching portions of perm during subsequent node selections, when a node of lowest degree is selected, stscan records its position in perm. This value may then be used as a starting point for the next search for a minimum degree node. The first node of lowest degree encountered is selected for elimination. This node trades positions with the first entry of the uneliminated set, increasing the length of the eliminated region by one. Appropriate updating of the nodeloca-131 tion array accompanies the exchange of values. Except for one special case, to be subsequently discussed, the third integer variable nextlowest is used to record the next smallest degree value encountered during node selection searches of the perm array. At the beginning of the program nextlowest is assigned the value N — 1. During subsequent node selection searches, nextlowest is updated by the degree of all unsuccessful candidates so that it contains the lowest degree of all uneliminated nodes in the portion of perm already searched. If a node selection search encounters the end of the perm array without locating a lowest degree node, the search is reinitialized by setting stscan to the first position of perm's second region and lowest to the value stored by nextlowest. Nextlowest is assigned the value N — 1 and the search continues with the new threshold value contained by lowest. When a node is selected by this process, its elimination affects the degrees of all nodes in its reachable set. The lowest and nextlowest variables must be updated with these new degree values. In certain cases stscan is modified as well. The updating process for a node x with a new degree deg is described in the following code fragment. i f (deg < nextlowest) then nextlowest:= deg; i f (deg <= lowest) then begin nextlowest:= lowest lowest:3 deg; stscan:- nodelocation[x] end; If one of the affected node's degrees is less than or equal to lowest, stscan is updated with the node's location in perm so that the next node selection search can immediately identify a lowest degree node. When the new degree is equal to lowest, nextlowest will be assigned the same value as lowest because the stscan reassignment may have skipped over some lowest degree nodes. The 132 next reinitialized scan of perm must check for these nodes using the same value for lowest. The nodelocation array is also required when a supernode is selected for elimination. All members of the supernode must be recorded in the sequence of eliminated nodes, not just the representative. The nodelocation array provides the positions of these coalesced members and avoids additional searches of perm's second region. The supernode list of the representative is traversed and a node number switch conducted for each indistinguishable member. 5.4 Distinct Features of the Tie Module The introduction of connectivity values to tiebreak between minimum degree elimination candidates results in a substantial increase in the complexity of the enhanced MDA's implementation. This section describes the unique features of the tie module and then outlines a modification of the tiebreaking MDA presented in Section 3.9 which dramatically increases the program's efficiency. 5.4.1 Connectivity Calculation and Updating The original formulation of the primary tiebreaking scheme used deficiency val-ues to assist in the node selection process. However, all elimination candidates have the same degree, allowing the use of maximal connectivity values in place of minimal deficiency, which reduces the number of arithmetic operations required. In addition, the algorithm for connectivity calculation, presented in Section 3.7.1, avoids double counting of connectivity by halving the connectivity total in its final step. In the actual tie implementation this step is ignored, so that unnec-essary divisions operations were avoided. All connectivity values considered are twice their defined value. New connectivity values are calculated using a routine based upon the algo-rithm of Section 3.7.1 which was modified as outlined above. The single most important implementation feature of the calculation process is the intersection 133 of reachable sets. The approach adopted in this implementation is to flag all active members of the reachable set of the node x, whose connectivity is to be calculated. In this case the flagging is achieved by subtracting N from each node's superlink field to negate the stored node numbers. Each of the inter-section values can then be calculated by scanning the reachable set of each active member in reach(x, S) and summing the superlengths of all nodes with negative superlink values. Upon the completion of all traversals, the superlink values of reach(x, S) members are restored to their original values. The updating of the connectivity values for satellite nodes proceeds as out-lined in Section 3.8. This algorithm also requires the calculation of the size of an intersection between two reachable sets. In this case, however, it is the difference set which is flagged with negated superlink fields, and the connectivity of each satellite is corrected using a single traversal of its reachable set. The difference set itself is constructed using a similar scheme. If the difference set rcha — rchb is desired, all active nodes of rchb are flagged and the difference set constructed of those active nodes of rcha without a negative superlink value. 5.4.2 Connectivity Lists As first discussed in Section 3.6, it would be impractical to maintain an updated connectivity value for each uneliminated, active node throughout the elimination process. The alternative proposed was to keep the connectivity values up to date for a smaller subset of uneliminated nodes. This subset is guaranteed to always include the complete set of minimum degree candidates. To assist in the node selection process, this subset of nodes is grouped together into a central connec-tivity list, sorted according to decreasing connectivity values. It is important to reemphasize that after a node is eliminated, connectivity updating, as discussed in Section 3.8, is only conducted for nodes which will remain members of the connectivity list. The connectivity list was not implemented as a completely separate data structure and is actually contained within the confines of an enhanced nodelist. 134 The basic data structure defined in Section 5.2.1 is augmented by three additional fields in each node's record. connectivity: integer; conlink, prevcon: integer; The connectivity field of each record is used to store the connectivity value of the particular node when it is a member of the connectivity list. A value of zero is recorded for all those nodes not included in the list and for all nonrepresentative members of an uneliminated supernode. (Recall that at most a one connectivity value is maintained for each supernode.) The second field, conlink, is used to link up the members of the current connectivity list. Each list member stores the node number of the following entry, with the last node storing a 0 to flag the end of the list. All other nodes, including nonrepresentative supernode members, are flagged by a conlink value of -1. The third field, prevcon, stores the node number of the previous node in the connectivity list. The node at the head of the connectivity list stores the value 0 in this field. If the connectivity list is not implemented as a doubly linked list in this fashion, the deletion of nodes from the connectivity list becomes very inefficient, when the ordering program is applied to very large graphs. For example, if tie with singly linked connectivity lists is applied to the 10,000 node five point problem of Chapter 6, an increase in execution time of approximately 30% is observed. As the elimination process proceeds, degree and connectivity values change, and as a result some nodes are added to the connectivity list while others may need to be removed. To further increase the efficiency of the deletion and inser-tion operations required by these changes, an index for the connectivity list was introduced. The index consists of a linked list of index nodes, with one node recording the position and length of each block of the connectivity list with a common connectivity value. Only one node with a particular connectivity value may exist in the connectivity list, but it is more common for several nodes to make up each block as discussed in Chapter 4. Each index node is represented by a record of the following type. 135 indexnode = record listlength: integer; convalue: integer; subhead, subtail: integer; next: indexlinker; end; The indexlinker type is a previously declared pointer to an indexnode. The fields of this record type can be briefly described as follows. listlength: This field records the number of nodes in the constant connectivity block of the connectivity list. convalue: The connectivity value of the particular group of nodes referred to by the node. subhead,subtail: These two fields record the node number of the first and last nodes in the portion of the connectivity list with convalue. next: This field is a pointer used to link up the nodes of the index. The index nodes are sorted within the linked list of the index by decreasing convalue. In this fashion, the index lies above the connectivity list providing relatively quick access to any connectivity region of the connectivity list. If the last node with a particular connectivity is removed from the connectivity list, the corresponding node of the index is also removed. In a similar fashion, when a node with a new connectivity value is added, a new index node is also introduced. Otherwise insertions into and deletions from the connectivity list are accompanied by appropriate updating of an existing node of the index. When a node is inserted into the connectivity list and a new index member is not required, it is added to the connectivity sublist after the current subhead node, to avoid excessive updating of the subhead field of index nodes. A variable of type indexlinker stores the head of the index's linked list. 136 The performance of the tiebreaking MDA implementation is very sensitive to the precise manner in which membership of the connectivity list is regulated. An integer variable, referred to as maxcondeg in the actual tie module code, was introduced to manage the restricted subset of nodes included in the connectivity list. The connectivity list is guaranteed to include all active uneliminated nodes, whose degree is less than or equal to maxcondeg. When the program begins, maxcondeg is set equal to the lowest degree value of any node found in the quotient graph of the original system. The initial connectivity list is constructed from all nodes, and their corresponding connec-tivities, which possess this minimum degree. As nodes are eliminated from the sequence of quotient graphs, they are also removed from the connectivity list. In addition, when an uneliminated supernode is formed, the connectivity list membership is updated to ensure that all nonrepresentative nodes are no longer included. If the connectivity list becomes empty during the elimination process, maxcondeg is increased to the new minimum degree value, and the connectivity list is augmented with all active, uneliminated nodes of this degree. As the elimination process proceeds, the degrees of active uneliminated nodes can, of course, vary quite substantially from their initial values. If the degree of a node becomes less than or equal to the current value of maxcondeg, its connectivity is calculated and the node inserted into the connectivity list. This is necessary to ensure that the connectivity list is complete for all nodes with a degree value less than or equal to maxcondeg. However, if the degree of a list member rises above the current maxcondeg value, the node is removed from the connectivity list. Referring to Section 3.8, if the node were to remain in the connectivity list, its connectivity value would have to be totally recalculated. In addition, the node would have to be deleted from the connectivity list and then reinserted with its new value. Essentially the same amount of work is required if the node were to be removed from the connectivity list and reinserted when it is of minimum degree once again later in the eliminaton process. However, by removing the connectivity list entry, all additional connectivity updating, which would have been required before the node once again became a minimum degree 137 candidate, is also avoided. It is important not to overlook this aspect of the connectivity list mainte-nance scheme. Ignoring this aspect severely reduces the overall efficiency of the tiebreaking MDA implementation. Consider, for example, the nine point prob-lem of Chapter 6 with 1089 nodes. During its simulated factorization, if all nodes introduced into the connectivity list remain members until eliminated or absorbed into a supernode as a nonrepresentative member, 72% of the connectiv-ity updates and 96% of the connectivity recalculations are performed on nodes with degrees larger than maxcondeg. This translates into an increase in the over-all execution time by a factor of approximately 2. If the graph's minimum degree falls below maxcondeg, however, it does not pay to lower the maxcondeg value to match this change. Figure 5.6 illustrates the minimum degree observed at each step of the elimi-nation process for the nine point problem mentioned in the previous paragraph. The degree of each node selected for elimination is plotted against the total num-ber of node selections made until that point. The total number of selections is used, rather than the total number of nodes eliminated, so that the graph is not distorted by the mass elimination of supernodes. Whether a node selected repre-sents one or several different nodes has no direct bearing upon the maintenance of the connectivity list structure and its use in the node selection process. The graph in Figure 5.6 typifies the general trend of degree values for nodes selected for elimination in all test problems considered by Chapter 6. As previ-ously mentioned, the trend during the elimination process is towards increasing degree, with blocks of nodes possessing a common degree eliminated without interuption. The trend is not always exhibited as crisply by other test prob-lems, many of which have nodes eliminated with a degree value less than the current maxcondeg value. Table 5.2 records the number of such lower degree node selections for a small subset of the Chapter 6 test problems. The num-ber of nodes eliminated with a degree less than the current maxcondeg value varies between problems, but in each case this class of node selections was in the minority. In addition, it was found that when the minimum degree dropped 138 35 30 <D 25 (D i . CD <P Q 20 C o c 15-UJ 10 i i i i i i — i — i — i — i — i — i 0 60 100 150 200 250 300 350 400 450 600 550 600 Selection Number Figure 5.6: Selection Degree below the maxcondeg level, in general, relatively few nodes were eliminated at this reduced value before the minimum degree of the quotient graph once again equalled maxcondeg. From these observations it is clear that the maxcondeg value should not be lowered when the minimum degree of the graph is temporarily reduced. The connectivity list is kept intact, and all necessary updating is performed in antic-Problem Name Total Node Selec-tions Selections With de-gree < maxcondeg nine point - 16641 8453 2 five point - 1600 1218 2 685 BUS 627 57 3hole 215 10 ERIS 1176 908 212 Table 5.2: Node Selections With Degree < Maxcondeg 139 ipation of the minimum degree's return to the maxcondeg level. The alternative of lowering maxcondeg to match the new minimum degree is obviously less ef-ficient considering the small number of nodes typically eliminated (usually 1 or 2) with the lower degree value. A total connectivity recalculation for all nodes whose degree remained equal to the original maxcondeg value would be required when the value of maxcondeg was increased once again. As a result, the only time maxcondeg is changed is in response to an empty connectivity list. In this case the degree of all remaining uneliminated nodes is greater than maxcondeg. The value of maxcondeg is increased and the connectivity list reformed. In this fashion, the value of maxcondeg shadows the degree increases of the general trend observed in Figure 5.6. It is not difficult to create an artificial example in which the connectivity list consists of all uneliminated nodes of the current quotient graph throughout the elimination process, or a problem in which the overlying index becomes com-parable in length to the connectivity list itself. The cost of maintaining such a lengthy connectivity list would be very high, and such a lengthy index would not improve the efficiency of connectivity list access. Fortunately, during the appli-cation of the tiebreaking MDA to the more practical examples of Chapter 6, the connectivity lists and indices did not exhibit these undesirable characteristics. In general the connectivity list consisted of only a small fraction of the uneliminated nodes and the number of different connectivity values was usually very small. Figures 5.7 and 5.8 exhibit plots of connectivity list length and index length against the number of nodes selected for elimination when the tie module was applied to the nine point problem with 1089 nodes. The connectivity list never grows to include more than 68% of the original graph's nodes, and this peak level is reduced to zero within 200 eliminations. On average, the connectivity list consists of approximately 160 of the graph's original set of nodes. Although these levels of connectivity list length are considered acceptable, the majority of the remaining test problems considered do not display such a relatively high peak in their connectivity list length. The plot of the index length is, however, typical of most problems. A relatively small number of different connectivity values are 140 70Q-> 600-Figure 5.7: Connectivity List Length - nine point 1089 Figure 5.8: Connectivity Index Length - nine point 1089 141 observed amongst the minimum degree nodes at any point of the elimination process. 5.4.3 The Elimination Node Selection Process As in the norm module, the minimum degree of all active, uneliminated nodes in the current quotient graph is recorded in an integer variable referred to as lowest. The value recorded by lowest is updated after each elimination. The search for a minimum degree, maximal connectivity node begins at the head of the current connectivity list. The first node encountered with a degree value equal to lowest is selected for elimination. As a result, if there is a group of equally qualified candidates, the tie is broken by selecting the node which happens to be the first candidate recorded in the connectivity list. The node selected is guaranteed to have a maximal connectivity value, because the connectivity list is sorted by decreasing connectivity. As shown above, lowest usually equals maxcondeg but occasionally may briefly fall below this level. As a result, it is possible for a nonempty connectivity list to have no nodes of degree lowest. To help lowest make the transition to the proper minimum degree value, during each scan of the connectivity list the smallest degree value encountered is recorded, so that at maximum two scans are required to find a minimum degree node. In a similar manner, when maxcondeg is incremented and elimset is scanned to construct a new connectivity list, the smallest degree value of active, unelim-inated nodes considered is recorded. If after the first scan the connectivity list remains empty, maxcondeg can be set to the correct value and a final scan of elimset conducted to produce the connectivity list. 5.4.4 Reachable Set Storage The tie module's treatment of reachable set storage differs quite substantially from the temporary storage used by the norm module. As a direct result of the introduction of connectivity values, the tiebreaking MDA implementation 142 exhibits a large increase in the number of reachable set requests. To calculate the connectivity of node x, its reachable set and the reachable set of each active, uneliminated member of x's reachable set are required. To avoid the redundant calculation of reachable sets, long term storage of their membership was intro-duced. Once a node's reachable set has been requested, it is stored until the node is eliminated or absorbed into a supernode as a nonrepresentative member. Throughout the elimination process, all stored sets are maintained so that they accurately represent each node's reachable set in the current quotient graph. When a member of a stored set is eliminated, the affected reachable set is explic-itly updated without totally redetermining its membership, as will be described in Section 5.4.5. However, if a reachable set member is absorbed into an unelim-inated supernode, the node is implicitly removed from the set when its ignore flag is set to true. The data structure selected for reachable set storage needs to be very flexible to successfully accommodate the expansion and contraction of reachable sets. As described in Section 5.2.4, the allocation and freeing costs of storage referenced by pointers makes this option less desirable. Instead linked lists were mimicked using a single large array of the following record type, for the communal storage of all recorded reachable sets. (If a translation of the module into Fortran is desired, using these integer arrays also make a translation of the code more straightforward.) The storage array is referred to as rchstor in the actual implementation. l istrec = record node: 1..maxnodes; rchptr: 0..maxrchstor end; The two integer constants maxnodes and maxrchstor are used by the pro-gram to represent the maximum number of nodes and maximum allotment of of reachable set storage. The array is maxrchstor in length and indexed by integer in the range 1. .maxrchstor. Each record of the array can act as a 143 single node of a linked list. The field node stores the node number of the partic-ular reachable set member, while the field rchptr stores an index into rchstor corresponding to the next record used by the reachable set's linked list. A flag EORch is assigned to the latter field if the record is the last member of the list. This structure is used to simultaneously store all previously requested reach-able sets of nodes which have not been eliminated or absorbed into an unelimi-nated supernode. To provide quick access to the different sets, a new field was introduced to the records of the nodelist array. reachptr: 0 ..maxrchstor; When a node's reachable set is stored in the communal structure, its reachptr field stores the index value corresponding to the first record of the linked list representing its reachable set. Otherwise reachptr is assigned EORch to indicate that a reachable set is not currently stored for this node. At any point of the elimination process all storage not used by reachable sets is organized into a linked list of free space. Initially the free space linked list encompasses all records of the rchstor structure. As reachable sets are requested and expansion of existing sets is required later in the process, storage nodes are allocated from the head of the free space list. If a node is eliminated or absorbed into an uneliminated supernode, its reachable set is no longer required and any space allocated to its reachable set is returned to the free space list. The introduction of rchstor for the storage of reachable sets substantially in-creases the storage requirements of the tiebreaking MDA from the requirements of normal algorithm. The maximum level of reachable set storage required for each of the test problems considered is recorded in Table 6.4 of Chapter 6. Al-though a large amount of storage is required in most problems, in each case the requirements are less than the storage needed to represent the lower adjacency structure of the factor L. As a result the storage requirements of the tiebreaking MDA are comparable to that of the symbolic factorization step, which requires at least the simultaneous storage of A's adjacency structure and the storage structure used for the factor L during the factorization. 144 Unfortunately, without prior experience the size of the rchstor array re-quired for a particular problem cannot be accurately predicted before the appli-cation of the tie module. A general alloction rule is discussed in Chapter 6 to assist in the allocation of rchstor memory for practical problems. However, it is always possible to create an artificial problem which serves as a counter-example to any rule developed. The elimination graph model experienced similar problems when allocating storage for adjacency sets. In general it could not be guaranteed that sufficient storage was allocated without attempting to order the particular problem. These difficulties resulted in the rejection of the elimination graph model as a basis of MDA implementations. The storage of reachable sets, however, does not present such a formidable problem. If a shortage of storage is encountered, all saved reachable sets do not have to be maintained. Unlike the elimination graph's adjacency set, the storage allocated to a reachable set can be returned to the free space list. The necessary information to build a reachable set is always stored implicitly within the current quotient graph. The reachable sets were only stored in the tie module to help reduce execution times of the resulting implementation. Although they will not be discussed in detail in this thesis, a number of possi-ble schemes could be introduced to alleviate the storage difficulties of the existing implementation. The most obvious scheme is to introduce a more sophisticated routine to manage the storage of rchstor. If the free space list was empty and more space required, reachable sets not currently in use could be selected for destruction. Various criteria for this selection process could be developed. The only restriction is that before each elimination step the reachable sets of the se-lected node and all nodes, whose connectivity may change after the elimination, must be stored. Using such a scheme it would be possible to reduce the storage allocation when memory was a problem, or increase the allocation if a faster execution time was desired. In addition, such an enhancement would essentially alleviate the problem of whether a particular storage allocation was sufficient to allow the ordering program to run to completion on a given problem. 145 A second option is to return to a linked list format for reachable sets, using true pointers, but to do garbage collection without actually returning unused storage to the heap. A free space list of unused linked list nodes could be kept and used for a new reachable set. As discussed in Section 5.2.4, it was not the dereferencing, but the allocation and freeing costs, which were the major obstacle of true pointer use. 5.4.5 Degree and Reachable Set Updating When a node, t, is eliminated from the current quotient graph the sets, and hence degrees, of all active members, x, of reacha<i(t, S) are affected. Because the connectivity of t must have been previously calculated for it to have been selected for elimination, the reachable sets of each node x must be recorded in rchstor. To update the affected degrees and reachable sets of each x, the reachable sets could be completely recalculated using the new quotient graph. An alternative approach is possible, however, in which each reacha<i(x, S) in rchstor is simply augmented by any new members resulting from t's elimination. In step 4g of the enhanced MDA algorithm presented in Section 3.9, a dif-ference set, diff[x) := saverch(t) — saverch(x), is calculated for each member of fs old reachable set during the updating of satellite connectivities. The set saverch(t) is the reachable set of t just prior to its elimination, while saverch(x) is the reachable set of x stored in rchstor and awaiting updating. As described in Section 3.8, diff[x) represents the new members of x's reachable set, which are now reachable through the eliminated node t. Instead of totally recalculat-ing each reachable set, the difference sets can be used to augment the stored reachable set of each node x. The difference set updating approach was selected for the implementation of the tiebreaking MDA. The degree updating shown in step 4(i)i of the Section 3.8 algorithm is expanded to include reachable set updating and moved to step 4g, which was originally composed entirely of connectivity updating. The modified step is illustrated in the following revised fragment of the original algorithm. 146 (g) For each x 6 saverch(t) i. diff[b) := saverch(i) — saverch(x) ii. reachGq(x, C(S)) := (saverch(x) U diff[x)) — {t} iii. Dx := size(reachGq(x,C(S)) iv. For each y £ (saverch(x) — (saverch(t) U {t})) conn(y) := conn(y) + 1/2* superlength(x) * size(diff(x) D reachGq(y, S U {t})) In the actual encoding of the tie module, the reachable set and degree updat-ing steps shown above are processed simultaneously. The difference set is added one node at a time to x's existing reachable set stored in rchstor. As each node is added to the list, the degree of x is incremented by each additional member's superlength. Once the entire difference set has been added in this fashion, the augmented reachable set is scanned to remove all ignored nodes and the newly eliminated node t. When t is removed, its superlength is subtracted from x's degree value to complete all necessary updating. The ignored entries in a reach-able set are produced when a supernode is formed and all absorbed members are implicitly removed from affected reachable sets, by setting their ignore flags to true. If such nodes are allowed to build up in stored reachable sets, eventually an unacceptably high proportion of the rchstor storage allocation is wasted. In addition, reachable set traversals can become very inefficient towards the end of the elimination process. The only other degree and reachable set updating that is carried out during an application of the tiebreaking MDA occurs after the formation of a new une-liminated supernode. In the norm module this updating is performed by totally recalculating the new supernode's reachable set. In the tie module, the super-node formation algorithm described in Section 5.2.3 has been modified so that the representative's reachable set is implicitly updated using the ignore field. The representative's degree is updated by subtracting from its current degree value the superlength of each absorbed node. If the explicit recalculation is used in place of the updating schemes outlined 147 above, the execution times for tie, quoted in Chapter 6, can be expected to increase 14 to 18% for most problems. 5.5 Implementation of Secondary Tiebreaking Each member of the sec family of programs implements the deficiency tiebreak-ing MDA enhanced by one of the three secondary tiebreaking schemes detailed in Chapter 4. All three of the programs are based upon the tie module, with a completely independent set of routines introduced to perform the secondary tiebreaking. The only change made to the actual encoding of the the tie module is in the elimination node selection routine. When a node is selected for elim-ination, the potential candidate is passed to the central secondary tiebreaking routine. This routine coordinates the application of the particular secondary criterion, considering all nodes in the remainder of the connectivity list with the same degree and connectivity values as the original candidate. The particular minimum degree, minimal deficiency node selected by the secondary routines is then passed back to the node selection routine of the tie module, becoming the next elimination node. The detailed discussion of Chapter 4 outlined the important aspects of each secondary tiebreaking scheme. Because implementation of these strategies closely follows the algorithms previously presented, an extended discussion of additional implementation details is not warranted. Instead, the remainder of this section sketches a very brief outline of the implementation of each strategy's central feature. In both strategies one and two, the calculation of the potential connectivity change for each candidate depends upon the identification of triangles amongst the members of reach(candidate, S) and satellite nodes. Independent of whether or not edges already exist between members of reach(candidate, S), the intersec-tion of the reachable sets for each pair of reach(candidate, S) members must be performed. If an edge already exist between the two members, class three trian-gles need to be counted. In this case, only other members of reach(candidate, S) 148 count towards the intersection. Otherwise, when an edge does not already exist between the two members, only satellite nodes contribute towards the intersec-tion. To perform these pairwise intersections, the members of reach(candidate, S) are first marked with negated degree fields. In turn the reachable set of each reach(candidate, S) member is marked using negated superlink fields. Before the superlink fields of the node's reachable set members are reset, the neces-sary intersections are performed by scanning the reachable set of each remaining member of reach(candidate, S), looking for nodes whose degree and superlink fields are flagged appropriately. Two different marking schemes were used so that satellite nodes could be easily differentiated from reach(candidate, S) members. The cases in which an edge already exists between the two members is identified by the superlink marking. Once all intersections have been performed in this fashion, affected degree fields are reset to their original values. To calculate the potential change in the number of minimum degree nodes for each candidate, the implementation of strategy three relies upon two scans of the candidate's reachable set. During the first scan, active members of the reachable set are marked using superlink negation and a count is made of the number of minimum degree nodes encountered. In the second scan, the new degree of each active node is calculated using equation 4.5 of Chapter 4, and the potential number of minimum degree nodes after the elimination is recorded. The flagging of reach(candidate, S) members is required, so that the superlength summation of the equation's fifth term can be performed easily by scanning the member's reachable set. 5.6 Implementation of the Solve Module The solve program actually consists of three distinct sub-modules. Each is rela-tively independent and they were only implemented as a single unit to simplify the testing. The first sub-module completes the symbolic factorization, initial-izing storage in preparation for the explicit factorization. The second and third 149 sub-module perform the factorization and solve steps of the solution process. Three lower adjacency structures of the original system are used by the sym-bolic factorization portion of the program. Two of these are different lower adjacency structures of the original adjacency structure. The first is constructed according to the original node labelling and is used to identify the original po-sition, within the system's matrix, of each off-diagonal entry input. The second is constructed using the new node ordering produced by the ordering module and helps to reorganize the off-diagonal entries in preparation for placement into the factor's data structure. Finally, the lower adjacency structure of the factor, passed to the module from the ordering program, is used to set up the factor's storage and place the off-diagonal nonzero entries in the appropriate positions. The adjacency list data structures of Section 5.2.1 were used to store each of the three lower adjacency lists. In the more conventional approach to symbolic factorization, the factor's lower adjacency structure is not explicitly formed in its entirety. In their book, George and Liu presented [GL81] a number of different data structures and algorithms to perform the explicit Cholesky factorization and subsequent solution of large, symmetric, sparse positive definite systems. No new research was conducted in this area during the course of this thesis. The factor and solve sub-modules are based upon a selection of the approaches discussed by George and Liu. Each important data structure and algorithm selected is mentioned below, but for a detailed discussion the reader is referred to [GL81]. The system's original matrix before factorization and the factor L after the Cholesky decomposition are both stored in the same data structure. The un-compressed scheme of George and Liu was selected. The nonzero columns of the factor's off-diagonal entries are stored one after the other in a single real array. A second integer array stores a set of indexes marking the beginning of each column, while a third array contains the row indexes for each off-diagonal entry. Finally, the diagonal entries are stored in a fourth array of type real. The same data structure can be used to store the original matrix and the 150 factor because the inner product form of Cholesky factorization was used to create L. In this form of the factorization process, L is created column by column. The creation of each column depends only upon previously calculated columns and nonzero entries of the original matrix's corresponding column. Finally, in the solve sub-module the solution of the two lower triangular systems, LPy = Pb and LTx = y, is calculated using the outer product and inner product forms of triangular solution respectively. These triangular solution methods were selected because the column oriented data structure made the rows of LT and columns of L more accessible. This concludes discussion of the implementation issues of concern to the programs developed for this thesis. In the following chapter the characteristics of these programs are investigated by applying them to a wide range of different problems. 151 Chapter 6 Results and Analysis This chapter summarizes the testing conducted with the three ordering modules norm, tie and sec. Each was evaluated by applying it to a wide variety of test problems. In addition, the effect of an ordering's quality upon the entire solution process was investigated using the solve module on a subset of the test problems. In the first section of this chapter the testing environment is detailed, followed by a section introducing the sparse matrix problems and nonzero structures used for testing the routines of Chapter 5. In the following section the fill and ex-ecution timings observed for the normal MDA and the tiebreaking MDA are contrasted. In addition, the potential effect of a fill reduced ordering upon the execution time of the factor and solve steps of the solution process is demon-strated. Finally, the limited success of the secondary tiebreaking schemes of Chapter 4 is outlined. 6.1 Testing Environment All testing was carried on a SUN 3/50, running BSD4.2 UNIX 1. An Ethernet2 is used to connect the machine to a local network, providing access to common file servers. All code for the norm, tie, sec and solve modules was written in SUN Pascal, which is an extended version of the Berkley Pascal distributed l T JNIX is a trademark of AT&T Laboratories 2Ethernet is a trademark of the Xerox Corporation 152 with BSD4.2 UNIX. Each of the four modules is compiled as several separate submodules and linked together using external calls. The only compiler option used in each case was a request for a partial optimization, which results in partial evaluation of boolean expressions whenever possible. All timings of the modules were carried out using the built in SUN Pascal function clock. This function returns the number of CPU milliseconds which have been devoted to the current process. Finally, throughout all testing it was ensured that the SUN 3/50 was devoted to a single user. 6.2 Test Problems A wide variety of symmetric, sparse matrices was selected for the extensive test-ing of the three ordering algorithms conducted. The first two groups of problems arise from the solution of partial differential equations on an nxra grid. Each problem results in a graph of N — (n + l ) 2 nodes. The first set of problems arises from the solution of the Poisson equation on the unit square, using Dirich-let boundary conditions and a five point finite differencing molecule. The four values n = 19, 29, 39 and 99, were selected to produce graphs of 400, 900, 1600 and 10000 nodes respectively. The second set of problems arise from the application of finite elements to a partial differential equation. The resulting system has a banded matrix, which is equivalent in structure to a matrix arising from the application of finite dif-ferences to a differential equation using a nine point molecule. Only grids with n = 2* were considered. The four values t = 4, 5, 6, and 7 were selected to produce graphs consisting of 289, 1089, 4225 and 16641 nodes respectively. For both the five and nine point problems, the initial labelling of their corre-sponding graph was based upon a standard lexicographic ordering of the grid's unknowns. As well, in each case the nonzero values of the appropriate positive definite system were available, permitting investigation of the complete solution process for these two groups of problems. 153 Figure 6.1: The Initial Mesh of fullspider For the next graph, however, only the nonzero structure of the matrix was available. The nonzero structure of the graph was created by randomly selecting a size, t, for each node's initial adjacency set and by choosing t nodes of the graph at random to be members of its adjacency set. The specific range of values from which t may be selected is decided upon prior to the graph's creation so that the density of the resulting structure can be controlled. The 400 node graph created in this fashion is simply referred to as the random graph in subsequent discussion and its particular characteristics are summarized in Table 6.2. The next set of problems consists of the nonzero structures of three systems used by George and Liu in testing their implementations [GL78a,GL81]. The graphs are typical of those that may be found in the study of heat conduction or structural analysis. The triangular mesh structures illustrated in Figures 6.1, 6.2 and 6.3 can be used, in the following manner, to derive the fullspider, 6hole and Shole problems respectively. Each triangle of the basic structure is subdivided by a factor t, resulting in t2 smaller triangles. To demonstrate this technique, one of the larger triangles of the fullspider's basic mesh has been subdivided with t = 5. The vertices of each triangle correspond to the nodes in the graph of the 154 wwv Figure 6.2: The Initial Mesh of 6hole Figure 6.3: The Initial Mesh of Shole 155 system's matrix, while the sides of each triangle correspond to the graph's edges. To create the so called fullspider problem, each of the triangles in the basic mesh of Figure 6.1 was subdivided by the factor values recorded in each triangle of the illustration's top left square. This subdivision pattern was duplicated in the other two squares of the mesh. Those triangles without a value were not subdivided any further. Originally George and Liu referred to the graphs arising from this basic mesh as Graded L, or simply L-shaped, problems. In their testing, however, all triangles were subdivided using the same factor t. As shown in Figure 6.1, different values of t were selected for this thesis' 324 node version of the problem. The graph was renamed fullspider to avoid confusion between the slightly different problems. In the last group of problems to be studied, a true Graded L problem of 3025 nodes is included. The Shole and dhole problems were created by subdividing the triangles of their basic meshes by a constant factor of t = 3. This results in two graphs, both of which are composed of 316 nodes. An arbitrary initial ordering was selected for the fullspider, Shole and 6hole graphs. The final group of 14 test problems was selected from the set of sparse matri-ces collected by the Harwell and Boeing Computer Services. Table 6.1 contains a list of the chosen problems and a short description briefly outlining the origin of each system. The name listed for each problem is the original key assigned to the system in the Harwell-Boeing collection and will serve to identify each prob-lem for the remainder of this chapter. In addition, the file number under which the problem is stored within the collection is also included. Approximately half these test problems had only their nonzero structure specified, without a record of the matrix's actual values. All matrices selected are symmetric. Table 6.2 summarizes the basic characteristics of the initial graph for each of the 26 test problems. To avoid confusion, each column of entries, excluding name, is briefly described in the following list. N : The number of nodes in the initial graph of the problem's matrix. 156 Name File # Description ERIS1176 3 Symmetric Pattern of Erisman, Summer 1973 BCSPWR09 4 Symmetric Structure Representation of Western US Power Network BCSSTK09 5 Symmetric Stiffness Matrix, Square Plate Clamped BCSSTK10 5 Symmetric Stiffness Matrix, Buckling of Hot Washer CAN1072 6 Symmetric Pattern from Cannes, Lucien Marro, June 1981 DWT2680 7 Symmetric Connection Table from DTNSRDC, Washington LSHP3025 11 Symmetric Matrix form A. George's L-shaped Prob-lems 1978 NOS3 13 Symmetric Matrix, FE Approximation to Bihar-monic Operator on Plate PLAT1919 20 Splatzman Symmetric Finite Difference, Three Ocean Model BLCKHOLE 23 Connecitivity Structure of a Geodesic Dome on a Coarse Base BCSSTK19 24 Symmetric Stiffness Matrix, Part of a Suspension Bridge 685BUS 28 Admittance Matrix, 685 Bus Power System, D.F.Tylavsky, July 1985 1138BUS 28 Admittance Matrix, 1138 Bus Power System, D.F.Tylavsky, July 1985 BCSSTK26 30 Symmetric Stiffness Matrix, Reactor Containment Floor-Section Table 6.1: Harwell-Boeing Test Problems Selected 157 Nonzeros: The number of off-diagonal nonzeros in the lower triangular portion of the system's original matrix. This value equals the number of edges in the initial graph. Density: The nonzero entry density of the original matrix as calculated by (2 * (Nonzeros) + N)/N2. This value gives the fraction of the matrix's entries which are nonzero. Average Degree: The average degree of nodes in the original graph is calcu-lated by 2 * (Nonzeros) /N. Initial Conn: The sum of connectivity values for all nodes of the initial graph. Maximum Conn: The theoretical maximum initial connectivity which can be observed for a graph with the same number of nodes and the same degree distribution. Conn %: Initial connectivity as a percentage of the maximum connectivity. It should be emphasized that the values in the nonzeros column do not repre-sent the size of the adjacency list required to store the graph. The required array is actually twice as large as the nonzeros value recorded. The nonzeros value, as defined above, was selected for presentation to allow for easier comparison with the fill values reported for the factor, L, of each problem, using a particular ordering. The number of nonzero entries in L which are not new fill entries is recorded by nonzeros. Pascal does not allow the dynamic allocation of array data structures. The size of the nodelist, adj l i s t and rchstor arrays, for example, must be fixed before compilation of the appropriate modules. Ideally sizes of array data areas would be individually tailored to each of the test problems, to avoid wasted mem-ory allocation. Unfortunately this requires a time consuming recompilation of all test modules for each problem. To streamline the testing process, the majority of problems were divided into three groups, with the data structure requirements of the largest problem in the group used for all members. Table 6.3 records the 158 Name N Nonzeros Density Average Initial Max Conn Degree Conn Conn % 5Pt 400 400 760 1.20E-2 3.80 0 4328 0 5Pt 900 900 1740 5.41E-3 3.87 0 10088 0 5Pt 1600 1600 3120 3.06E-3 3.90 0 18248 0 5Pt 10000 10000 19800 1.62E-4 3.96 0 117608 0 9Pt 289 289 1056 2.87E-2 7.31 6144 13824 44 9Pt 1089 1089 4160 7.93E-3 7.64 24567 56320 44 9Pt 4225 4225 16512 2.09E-3 7.82 98304 227328 43 9Pt 16641 16641 65792 5.35E-3 7.91 393216 913408 43 random graph 400 497 8.71E-3 2.49 0 2068 0 fullspider 324 859 1.95E-2 5.30 3222 8050 40 3hole 316 822 1.96E-2 5.22 3024 7276 42 6hole 316 825 1.97E-2 5.22 3024 7276 42 ERIS1176 1176 8688 1.34E-2 14.8 844344 898250 94 BCSPWR09 1723 2394 2.19E-3 2.78 1044 13714 7.6 BCSSTK09 1083 8677 1.57E-2 16.02 94692 278828 34 BCSSTK10 1086 10492 1.87E-2 19.32 244176 418588 58 CAN 1072 1072 5686 1.08E-2 10.6 61062 142454 43 DWT2680 2680 11173 3.48E-3 8.34 70500 173202 41 LSHP3025 3025 8904 2.28E-3 5.89 35280 87682 40 NOS3 960 7442 1.72E-2 15.5 115332 221658 52 PLAT1919 1919 15240 8.80E-3 15.9 244692 469494 52 BLCKHOLE 2132 6370 3.27E-3 5.98 25752 65326 39 BCSSTK19 817 3018 1.03E-2 7.39 21060 42904 49 685BUS 685 1282 6.92E-3 3.74 1980 9232 21 1138BUS 1138 1458 3.13E-3 2.56 768 8252 9.3 BCSSTK26 1922 14207 8.21E-3 14.8 239592 453956 53 Table 6.2: Test Problem Characteristics 159 sizes of the data structures within the ordering modules, used during the testing of each different problem. For those problems which were used to demonstrate the entire solution process, the sizes of the solve module data structures were also included. A single compilation of all modules with data areas large enough for all test problems was not used because the length of the storage arrays has a significant effect upon the timings of the ordering modules observed. For example, if the maximum number of nodes and maximum adjacency list size are increased from 1000 and 3500 to 3000 and 150000 respectively, the ordering time of the normal MDA on the 289 node nine point problem is increased by 20%. Although by grouping the test problems into smaller sets the impact of this problem has been reduced, rigorous comparison of ordering times should only be made within the confines of a particular test problem. Otherwise, execution time comparisons should be very general. 6.3 Primary Tiebreaking Test Results This section presents the testing results of the tiebreaking MDA. A comparison of the normal and tiebreaking MDA orderings is made in the first subsection. In the following subsection a discussion of the effect of different orderings on timings for the remainder of the solution process is presented for a limited subset of the test problems. 6.3.1 Ordering Test Results So that comparison could be made between the tiebreaking MDA and the origi-nal quotient graph MDA, both the norm and tie modules were applied to all test problems introduced in the previous section. The goals of the new tiebreaking strategy were to reduce the fill observed for the MDA ordering and to increase the fill stability of orderings for problems affected by different initial labellings. To study the effectiveness of the tiebreaking MDA with respect to either goal, 160 Order Solve Name Max Nodes Max Adjacency Max Reach Max Nodes Max Nonzeros 5Pt 400 1000 3500 5000 1000 10500 5Pt 900 1000 3500 5000 1000 10500 5Pt 1600 2000 18000 20000 1650 31500 5Pt 10000 10001 45000 55000 10001 205000 9Pt 289 1000 3500 5000 1000 10500 9Pt1089 2000 18000 20000 1650 31500 9Pt 4225 4226 40000 55000 4226 135000 9Pt 16641 16642 150000 185000 - -random graph 1000 3500 5000 - -fullspider 1000 3500 5000 - -3hole 1000 3500 5000 - -6hole 1000 3500 5000 - -ERIS1176 2000 18000 20000 - -BCSPWR09 2000 18000 20000 - -BCSSTK09 3200 30500 40000 1100 63000 BCSSTK10 1100 24000 24000 - -CAN1072 2000 18000 20000 - -DWT2680 3200 30500 40000 - -LSHP3025 3200 30500 40000 - -NOS3 2000 18000 20000 1650 31500 PLAT1919 3200 30500 40000 - -BLCKHOLE 3200 30500 40000 - -BCSSTK19 2000 18000 20000 - -685BUS 1000 3500 5000 - -1138BUS 1200 3000 5000 - -BCSSTK26 3200 30500 40000 - -Table 6.3: Array Constants Used for Testing 161 it is necessary to consider more than one initial labelling for each problem. As a result, the testing of each problem was expanded to include five random rela-bellings of the graph as well as the original initial ordering. Table 6.4 records the test results observed when the norm and tie modules were applied to all initial labellings of each test problem. The table is divided into two regions, one for the norm ordering program and one for tie, with the majority of data recorded for the tiebreaking MDA. The following list provides a brief description of each field. Fill: The number of new nonzero entries introduced to the factor L. Time: The number of CPU seconds required to produce the ordering. Node Selections: The number of node selections made by the tie module. Low Degree Selections: The number of node selections made for which de-gree < maxcondeg. Tiebreaking Opportunities: The number of node selections for which there is more than one node in the graph with the same minimum degree value. Actual Tiebreakings: The number of node selections for which there is an-other node in the graph with the same degree as the selected node but which has a smaller connectivity value. Maximum Reach Storage: The maximum number of reach storage entries used during the execution of tie. Elimination Number: The number of eliminations after which the maximum reach storage level of the previous column was attained. The timings of the norm and tie programs, based upon the built-in SUN Pascal function clock, include the initialization of the nodelist array, the ini-tialization of the reachable set storage in tie, and the CPU time required to produce the new ordering. The timings do not include the input of the system's 162 adjacency lists, the initialization of the adjlist array, report generation or the piping of the factor's lower adjacency structure to the solve module. Finally, the values recorded in the node selections, tiebreaking opportunities and actual tiebreaking columns were calculated by considering all uneliminated supernodes as single nodes. As a result, it is normal for the node selection count to be sig-nificantly less than the number of nodes in the original graph. The number of tiebreaking opportunities and actual tiebreakings should be compared with this value rather than to N. Table 6.5 provides a summary of the effect of the tiebreaking enhancement of the MDA upon the levels of fill observed for all 26 problems. The mean fill value and standard deviation, s, for both the norm and tie modules was calculated using the fill values of all six inital orderings for each problem. The final column of the table shows the percentage by which the mean fill value of the normal algorithm is reduced by the tiebreaking enhancement. Of all test problems, the MDA enhanced with deficiency tiebreaking was the most successful when applied to the five and nine point problem sets. The mean fill value for the five point problems was improved by 7.1 to 22%, while the mean fill was improved by 6.2 to 22.2% for the nine point point problems. The fill improvement is the most pronounced for the largest problem in each class, with the tie fill levels for all initial orderings well below their norm counterparts. 163 Norm Tie Fill Time Fill Time N.S. L.D.S. T.O. A.T. M.R.S. E.N. Name: 5Pt 400 2406 3.52 2497 4.65 313 2 297 245 1661 268 2655 3.43 2393 4.67 314 2 297 254 1626 271 2687 3.42 2411 4.72 315 2 304 262 1594 273 2614 3.48 2401 4.77 315 2 298 255 1581 268 2792 3.55 2496 4.72 315 2 295 250 1641 263 2645 3.47 2486 4.60 313 2 295 250 1603 261 N ame: 5Pt 900 7124 8.52 7117 11.3 691 3 669 571 4096 573 8334 8.47 7488 11.8 693 2 663 574 4299 557 7950 8.68 7007 11.9 694 2 669 602 4173 580 7877 8.73 7246 11.7 689 2 658 589 4193 578 7982 8.48 7369 11.8 694 2 670 592 4169 568 8120 8.68 7181 11.7 692 2 669 591 4155 571 Name: 5 Pt 1600 16062 18.6 15456 21.5 1218 2 1192 1056 7819 1017 17400 18.7 14927 22.3 1226 2 1192 1094 7778 1015 17246 18.5 14691 22.3 1221 3 1186 1089 7851 1019 16888 18.8 15539 22.3 1225 2 1189 1077 7792 1000 17653 18.9 15087 22.1 1219 2 1186 1094 7762 1011 17301 18.6 15108 22.2 1218 2 1183 1087 7749 1021 Name: 5Pt 10000 176761 260.9 143013 173.0 7533 2 7478 7094 52729 6267 180571 259.6 142947 178.5 7549 4 7483 7208 52787 6302 184646 261.0 140470 176.6 7539 4 7473 7190 52847 6288 184605 261.3 143181 177.7 7552 3 7484 7201 52840 6295 183681 261.1 141309 176.4 7550 3 7479 7222 52500 6214 182746 260.0 141796 176.6 7549 2 7190 7477 52623 6278 N ame: 9Pt 289 2499 2.45 2278 3.85 165 1 157 111 1922 108 2603 2.58 2452 3.88 169 3 153 118 1948 117 2492 2.48 2368 3.87 164 2 150 110 1987 110 2572 2.47 2554 3.97 171 2 157 122 1883 115 2676 2.55 2375 4.00 170 2 154 119 1952 117 2612 2.58 2476 4.00 168 2 152 116 1981 115 Table 6.4: Tie and Norm Ordering Testing 164 Norm Tie Fill Time Fill Time N.S. L.D.S. T.O. A.T. M.R.S. E.N. Name: 9Pt 1089 17249 12.8 14586 18.5 581 1 570 463 9576 332 18145 12.3 16510 19.1 590 2 557 471 9695 343 18675 13.2 15935 18.8 589 2 560 470 9431 334 18332 13.4 15702 18.4 592 3 561 477 9539 332 19223 13.5 15533 18.6 589 2 557 468 9577 346 18749 13.3 15506 18.9 590 2 560 472 9595 348 Name: 91 Pt 4225 99644 64.5 83742 90.5 2181 1 2167 1935 41640 1164 108222 68.0 91681 93.1 2203 2 2153 1964 41862 1195 113026 68.7 91072 92.9 2208 2 2155 1965 41744 1166 111345 68.8 91597 93.1 2200 2 2141 1949 41861 1189 110660 68.7 91818 93.1 2203 3 2147 1960 41999 1181 115643 68.7 91431 93.1 2201 4 2151 1962 41906 1191 Name: 9Pt 16641 537287 491.5 445136 399.9 8453 2 8431 7947 173352 4364 645592 525.7 498378 417.7 8490 5 8388 8002 173545 4409 643565 526.3 506837 420.3 8506 6 8415 8025 173368 4372 652882 524.1 497788 416.9 8514 3 8418 8028 173358 4348 650326 524.3 508769 419.5 8502 4 8407 8024 173753 4398 648232 524.1 483757 416.9 8487 7 8382 8015 173820 4409 Name: ranc om graph 986 3.07 978 3.58 385 15 364 43 994 364 973 3.02 957 3.58 387 16 370 44 932 365 978 3.10 970 3.70 387 18 363 45 952 366 952 3.13 973 3.48 384 15 365 46 960 363 960 2.90 957 3.55 386 16 363 42 900 362 1010 3.27 961 3.65 387 17 370 51 930 366 Name: fullspider 2163 2.42 2149 3.48 219 19 189 98 1739 133 2210 2.58 2133 3.47 221 18 189 102 1746 131 2188 2.47 2140 3.62 218 18 187 94 1679 130 2255 2.52 2210 3.70 221 18 190 93 1717 133 2200 2.60 2188 3.55 217 18 187 93 1720 132 2279 2.65 2174 3.48 217 17 188 102 1730 129 Table 6.4: Continued 165 Norm Tie Fill Time Fill Time N.S. L.D.S. T.O. A.T. M.R.S. E.N. Name: 3hole 1597 2.05 1485 2.78 215 10 197 108 1500 87 1581 2.10 1527 2.87 220 12 220 104 1494 86 1595 2.08 1527 2.83 217 8 201 121 1492 84 1608 2.03 1525 2.97 222 10 205 127 1496 85 1564 2.10 1526 2.80 217 7 201 132 1496 84 1589 2.10 1555 2.83 216 12 196 115 1498 82 Name: 6hole 1706 2.20 1654 2.67 215 4 203 123 1491 96 1757 2.17 1697 2.95 224 6 209 131 1482 95 1729 2.12 1703 2.83 220 4 207 134 1486 95 1783 2.23 1686 2.78 219 5 205 128 1497 95 1848 2.12 1670 2.88 222 3 207 130 1480 96 1682 2.13 1692 2.82 220 4 207 135 1481 94 N ame: ERIS1176 619 25.7 567 24.9 908 212 724 259 6558 987 632 24.7 566 24.9 912 215 727 262 6558 987 626 25.9 538 24.8 910 214 726 261 6558 987 716 28.8 565 24.9 911 215 725 263 6558 987 663 25.0 541 24.8 909 210 730 261 6558 987 636 24.5 541 24.8 911 215 726 261 6558 987 Name: BCSPWR09 2167 14.5 2103 10.2 1690 107 1593 549 3201 476 2141 14.6 2093 10.2 1694 108 1596 552 3201 476 2157 14.8 2085 9.95 1692 101 1601 559 3201 476 2139 14.6 2061 9.97 1690 103 1597 552 3201 476 2124 14.7 2088 9.85 1686 101 1592 553 3201 476 2164 14.8 2097 10.1 1694 103 1597 555 3201 476 Name: BCSSTK09 54062 48.8 45597 81.3 485 19 421 386 36335 410 48495 40.1 38837 62.2 466 14 397 391 31091 370 48047 39.8 45098 74.3 475 13 407 388 33897 369 46016 40.8 46994 77.6 465 8 400 388 34811 349 49151 42.7 37501 67.7 473 11 422 410 32254 358 43150 37.9 39437 64.4 462 9 404 390 31584 355 Table 6.4: Continued 166 Norm Tie Fill Time Fill Time N.S. L.D.S. T.O. A.T. M.R.S. E.N. Name: BCSSTK10 11990 23.9 11792 37.6 539 284 271 169 13814 209 11714 23.4 11057 38.4 549 282 271 194 13488 212 11511 22.7 11381 33.1 508 268 253 154 13816 204 11805 23.7 10630 40.6 576 329 274 183 14001 191 11766 23.0 11457 34.0 522 276 256 162 13627 212 11378 23.5 11060 35.5 530 288 256 167 13810 199 N [ame: CAN1072 13738 22.3 13741 29.4 640 27 583 552 9862 187 14313 22.1 13809 27.8 640 24 588 554 9869 187 13870 21.9 13688 27.1 640 28 579 553 9910 189 13688 21.9 13532 28.5 640 25 578 525 9875 187 13916 22.0 13963 28.0 640 26 584 555 9871 189 14686 22.6 13990 28.2 640 27 578 529 9859 187 Name: DWT2680 39864 62.9 39570 81.5 1575 46 1497 1312 26706 1023 39654 59.2 38774 83.2 1582 43 1501 1329 26218 863 40641 59.6 38823 81.9 1599 52 1513 1355 26796 1014 41158 62.5 39058 82.9 1589 44 1505 1337 26620 1031 41420 64.3 39790 81.6 1544 41 1499 1338 26264 914 40884 63.1 38134 80.7 1592 47 1511 1338 26244 790 Name: LSHP3025 59714 30.8 58256 45.3 1645 5 1594 886 20532 934 65400 39.5 62727 54.3 1787 5 1726 1015 23762 1255 64634 39.1 62307 54.2 1777 6 1718 1006 23553 1253 65091 40.0 61805 53.9 1784 6 1725 1004 23403 1254 63447 39.2 61460 53.6 1779 5 1719 999 23363 1246 64867 39.6 63103 53.4 1775 3 1715 1008 23564 1209 Name: NOS3 22789 20.2 20509 30.7 455 142 290 270 15232 195 23085 20.7 20118 30.3 457 143 290 268 14772 203 23570 21.6 20999 30.9 452 138 293 272 14495 215 22296 20.6 20909 30.7 453 140 289 266 14816 201 23310 20.8 21191 31.5 469 152 290 273 14926 212 21771 20.6 20698 30.8 447 137 285 267 15067 205 Table 6.4: Continued 167 Norm Tie Fill Time Fill Time N.S. L.D.S. T.O. A.T. M.R.S. E.N. Name: PLAT1919 50442 41.0 50127 68.9 813 61 700 633 37506 510 49627 42.6 52588 70.7 808 63 695 643 37567 509 49018 42.6 52515 69.8 813 54 703 632 37368 505 49352 43.7 50775 68.1 811 60 694 633 37208 505 49930 43.2 49754 69.5 794 58 682 627 37540 515 50138 44.4 50226 69.0 808 59 698 639 37516 509 Name: BLCKHOLE 48140 28.7 43256 35.1 1180 4 1128 549 15394 580 47860 29.0 46610 40.9 1266 6 1212 604 17256 888 47210 29.3 46658 41.3 1280 6 1218 606 17539 880 46787 29.1 46626 41.6 1285 4 1234 622 17501 882 47205 29.1 46521 41.1 1275 5 1215 601 17416 882 46118 28.8 44470 40.1 1261 5 1207 599 17060 892 Name: BCSSTK19 3796 8.88 3719 10.2 539 153 387 300 5832 237 3845 8.62 4417 10.2 521 122 392 294 5810 225 4103 8.70 4146 9.90 523 137 385 282 5856 233 4117 8.98 3986 10.4 536 153 385 297 5806 226 4224 8.97 3991 9.83 531 143 385 276 5790 229 4220 8.75 4326 10.1 521 118 395 294 5816 223 • Vame: 685BUS 1661 5.13 1643 5.83 627 57 568 352 1777 192 1698 5.28 1595 5.68 628 60 568 353 1771 191 1668 5.35 1588 5.58 625 58 567 356 1777 192 1791 5.32 1651 5.67 627 57 568 353 1771 191 1689 5.18 1619 5.72 626 56 568 353 1771 191 1710 5.22 1586 5.68 626 62 562 348 1777 192 ame: ' L138BUS 667 6.38 643 4.82 1133 70 1066 276 1884 467 681 6.30 650 4.78 1131 66 1062 275 1884 467 652 6.38 644 4.78 1131 65 1063 274 1884 467 669 6.38 641 4.80 1132 67 1064 277 1884 467 669 6.33 641 4.80 1132 67 1064 277 1884 467 675 6.33 647 4.73 1131 66 1062 276 1884 467 Table 6.4: Continued 168 Norm Tie Fill Time Fill Time N.S. L.D.S. T.O. A.T. M.R.S. E.N. Name: BCSSTK26 26333 79.7 26401 86.8 1078 391 664 596 20597 626 26753 77.4 26928 89.6 1080 396 666 602 20527 628 27443 81.5 27438 91.0 1082 400 666 599 20392 627 27522 79.1 27227 91.7 1083 401 666 598 20367 630 26089 75.5 27846 89.3 1080 391 667 599 20410 628 26607 78.4 26019 89.6 1081 339 666 602 20542 628 Table 6.4: Continued These values represent significant reductions in fill levels. In each class of problems the fill improvements observed increased as larger and larger graphs were considered. This trend was paralleled by an increasing percentage of tiebreaking opportunities in which the deficiency tiebreaking strategy actually contributed towards a node selection. For both the five point problem of 10000 nodes and the nine point problem of 16641 nodes, in approximately 95% of all tiebreaking opportunities, there were nodes of minimum degree with differing connectivity values. Although six different orderings is a relatively small sample of the initial la-bellings which are possible for each graph, the standard deviation values recorded in Table 6.5 for the five point problems show that for each system the tiebreaking MDA has also increased the fill stability of the orderings. The standard devia-tion values have been reduced by 43 to 63%. The fill values for the tiebreaking ordering of the nine point problems also showed an increased stability except for the 289 node problem. For both the five and nine point problem sets, the CPU time required by the tie module was comparable to the requirements of the norm module. One very encouraging trend observed in these timings is the general reduction in the CPU requirements of the tie module with respect to the norm module as the size of the problems increase. In fact for the largest problem in each set, the tiebreak-169 Norm Tie Name Mean Fill s Mean Fill s I^mprovement 5Pt 400 2633 127 2447 50.5 7.1 5Pt 900 7898 412 7235 174 8.4 5Pt 1600 17092 562 15135 319 11.4 5Pt 10000 182168 3049 142119 1104 22.0 9Pt 289 2576 70.7 2417 96.9 6.2 9Pt 1089 18396 674 15629 630 15.0 9Pt 4225 109757 5537 90224 3186 17.8 9Pt 16641 629647 45368 490111 23742 22.2 random graph 977 20.5 966 8.9 1.1 fullspider 2216 43.3 2166 30.1 2.3 3hole 1589 15.2 1524 22.4 4.1 6hole 1751 59.6 1684 18.4 3.8 ERIS1176 648 36.3 553 14.3 14.7 BCSPWR09 2149 16.7 2088 14.6 2.8 BCSSTK09 48154 3623 42244 4097 12.3 BCSSTK10 11811 260 11486 383 2.8 CAN1072 14032 391 13787 173 1.8 DWT2680 40604 707 39025 597 3.9 LSHP3025 63859 2138 61626 1709 3.5 NOS3 22804 670 20737 385 9.1 PLAT1919 49751 524 50996 1248 -2.5 BLCKHOLE 47220 729 45690 1467 3.2 BCSSTK19 4068 167 4098 254 -0.7 685BUS 1703 46.9 1614 28.5 5.2 1138BUS 689 9.7 644 3.6 6.5 BGSSTK26 26791 583 26977 676 -0.7 Table 6.5: Fill Summary 170 ing algorithm required approximately 20 to 30% less CPU time. Even though fill improvements are increasing as larger problems are considered, the relative execution time of the tie module is decreasing. It possible for the tiebreaking module to require less time because of the reachable set updating scheme intro-duced to the enhanced algorithm, which allows reachable set recalculations to be avoided. The success of the tie module on the five and nine point problems is starkly contrasted by the results for the random graph. Essentially no improvement in the quality of the orderings produced by the deficiency tiebreaking algorithm for this problem was observed. The ineffectiveness of the tiebreaking scheme on this problem is a direct result of the small number of tiebreaking opportunities in which the scheme was able to affect the node selection process. In only 12% of the tiebreaking opportunities was there differing connectivity values amongst the minimum degree candidates. Although the standard deviation is reduced for the tie module, there is not a large improvement in fill stability. The single fill value of 1010 produced by the norm module accounts for most of the difference. When this is not considered, the increase in stability according to the standard deviation values is reduced, with s = 13 for the norm module. These results are typical of random graphs. In general, graphs produced in this fashion lack the local nature of their adjacency structure, with very few short node cycles being present. They typically have very low initial connectivity values or no connectivity at all, as was the case for the random graph considered. As the elimination process proceeds, the connectivity levels of the graph's uneliminated nodes usually remains very small. The connectivity of nodes in several randomly produced graphs were traced through the factorization process. In each case the connectivity of essentially all nodes being selected was 0 until 80 to 90% of the nodes had been eliminated. At this point, the tiebreaking scheme applied to the remaining node selections has little effect on the overall fill level observed. Fortunately, very few realistic problems exhibit such a total lack of local nature to their graphs. The tie module was moderately successful on the 6hole, Shole and fullspider 171 grid problems. For the 6hole problem the mean fill level was reduced by 3.8% using the deficiency tiebreaking, and was accompanied by a large increase in fill level stability. The tie module reduced the fill values of the Shole problem by a comparable amount, but an increase in fill stability was not observed. Finally, the fullspider problem experienced a very slight improvement in fill values and or-dering stability. Although large fill reductions were not observed, the tie module produced an ordering with less fill than for the norm for each initial labelling. The moderate success of the tiebreaking MDA with these three problems is reflected by the number of tiebreaking opportunities in which the deficiency tiebreaking scheme could contribute to the node selection process. The fraction of tiebreaking opportunities in which deficiency tiebreaking was actually applied ranged from 52% for the fullspider, to 63% for the 6hole problem. These values correlate well with the observed performance of the tie module on these three problems. Once again the CPU time requirements of the tie module are compa-rable to those of norm. The tie module required 1.3 to 1.4 times as many CPU seconds as the norm module, which is similar to the increase observed for five and nine point problems of approximately the same size. The effect of the tiebreaking strategy on the problems selected from the Harwell-Boeing sparse matrix collection were mixed. For approximately one half of the problems, the tie module produced orderings with substantially re-duced levels of fill, in comparison to the norm module orderings. The reduction in the mean fill values ranged from 3.5% for the LSHPS025 problem to 14.7% for the ERIS1176 problem. The majority of this subset of problems also showed a substantial reduction in the standard deviation of fill values when going from the norm to tie ordering. The exception to this trend was the BCSSTK09 problem, which showed a 12.3% reduction in the mean fill value but experienced a slight increase in ordering stability. Once again the sample of initial orderings is small and the standard deviation values may not predict ordering stability trends with a great deal of accuracy. Of the remaining seven Harwell-Boeing test problems, four showed a very slight improvement in fill values. Their ordering stability increased or decreased 172 slightly, except for CAN1072, which showed a substantial reduction in the stan-dard deviation of fill values for the tie module. The tiebreaking strategy essen-tially had no effect upon the mean fill levels for the BCSSTK19 and BCSSTKS6 problems and both showed a slight reduction in ordering stability. Finally, for PLAT1919 an increase in the mean fill value of 2.5% for the tie module order-ing was accompanied by a large decrease in ordering stability as shown by the standard deviation values. Once again the CPU seconds required by the tie ordering module were com-parable to the time requirements of norm. For two problems, BCSPWR09 and 11S8BUS, the timings of the tie module were on average 28% and 25% smaller than the CPU seconds required for the norm module. The timings of tie and norm were essentially equal for ERIS1176, BCSSTK19 and 685BUS, while for the remaining nine problems the tie module required additional CPU time. The tim-ings increased by factors ranging from 1.2 to 1.7, with the mean at approximately 1.3. Within each of the Harwell-Boeing test problems, timings are essentially in-dependent of the initial ordering. This timing stability is also observed in all other test problems. The tiebreaking strategy has been clearly shown to be very effective on the five and nine point problems, and it is suspected that the scheme would be equally effective on problems arising from other discretization molecules or banded ma-trices. However, it is difficult to predict to what extent the tiebreaking MDA will reduce fill for more general problems, such as those selected from the Harwell-Boeing sparse matrix collection. It is clear that the initial graph of the system must have a certain local nature, missing from most random graphs, for the tiebreaking algorithm to have sufficient opportunities to influence the selection process and effect fill levels. The inital connectivity of a graph is often a good in-dication of this property but many exceptions do exist. The five point problems, for example, have a very high degree of local nature to their adjacency structure but have an initial connectivity of zero in each case. Connectivity only considers the number of three member cycles and totally ignores the four member cycles of these graphs, which quickly result in increased connectivity as the first elimina-173 tions take place. Other than this local structure, no other common characteristic of the inital graphs was identified upon which tie was most successful. There are too many variable and graph characteristics which together contribute towards the overall fill level of a given ordering. As outlined in Chapter 5, in the current implementation of the tie module, once a node's reachable set is requested, it is stored until the node is eliminated or absorbed into an uneliminated supernode as a nonrepresentative member. It is not difficult to think of an example for which the storage of reachable sets would quickly expand to unacceptable levels. However, for all 26 test problems, the reachable set storage requirements were well bounded and the length of the rchstor array required was comparable to the length of the communal adjacency list storage in each case. It must be remembered that each node number stored in the rchstor structure requires two integer storage locations (a pointer and node number), while a member of an adjacency list requires slightly more than one integer location. (The exact storage requirement depends upon the matrix's density.) For the five and nine point problems, the length of rchstor required for reachable set storage ranged from 0.94 to 1.3 times the length of the graph's initial adjacency lists. For the fullspider, Shole and 6hole problems, the length of the reachable set storage required was approximately the same or slightly smaller than the length of the appropriate adjacency lists. Finally, the ratio of the maximum number of rchstor records used to the length of the adjacency lists for the Harwell-Boeing test problems ranged from 0.38 for ERIS1176 to 2.0 for BCSSTK09. The latter problem was an exception, with the next highest reachable set storage requirement being for the BLCKHOLE problem with a ratio of 1.4. For the majority of Harwell-Boeing problems, however, the required length of the reachable set storage array was less than the size of the problem's adjacency lists. As a conservative rule of thumb, selecting the length of the reachable set storage to be 1.4 times the length of the adjacency list structure seems the most reasonable when very little is known about a new problem. As mentioned in Chapter 5, the reachable set storage scheme selected for 174 implementation is very simplistic. It represents the most storage intensive ex-treme of a whole spectrum of possible storage schemes. The favorable timing results observed for the tie module will allow for considerable experimentation. For many problems, such as the large five and nine point problems, the size of the reachable set storage would have to be substantially restricted, forcing the recalculation of reachable sets, before the tie module would require more CPU time than the norm module. In summary, the tiebreaking enhancement of the quotient graph form of the MDA has met, at least in part, the goals set out in Section 3.2 for more than two thirds of the test problems. The deficiency tiebreaking was found to be the most effective on the five and nine point problems, with reductions of up to 22% in the level of fill observed for two test cases. The CPU time requirements of the tie module are comparable to the norm module timings and the storage requirements for reachable sets is well bounded. 6.4 Factor and Solve Timings For a select group of problems, the effect of a fill reduced ordering upon the remainder of the solution process was investigated by comparing the factorization and solution step timings for different orderings. Only those problems were considered for which the nonzero values of the matrix were available and for which the tiebreaking strategy significantly reduced the observed levels of fill. For each of the nine problems considered, the system was solved using the norm ordering exhibiting the highest level of fill and the tie ordering with the lowest level of fill. Table 6.6 summarizes the solution times observed for the remainder of the solution process. Two lines of data are recorded for each problem. The norm ordering is used to produce the data of the first line, while the data illustrated in the second line was observed for the tie ordering. The fill levels in the third column of the table are used to identify the particular orderings used in the testing. The next two columns record the CPU seconds required to factor and 175 Name Fill Factor Time Solve Time Total Time 5Pt 400 norm 2792 5.93 1.42 14.7 tie 2393 4.68 1.28 12.7 5Pt 900 norm 8334 22.4 3.97 45.0 tie 7007 17.0 3.4 37.5 5Pt 1600 norm 17653 57.5 8.10 100.9 tie 14691 42.1 6.97 82.1 5Pt 10000 norm 184646 1287 79.6 1662 tie 140470 749.9 63.1 1068 9Pt 289 norm 2676 7.28 1.45 16.1 tie 2278 5.63 1.32 14.9 9Pt 1089 norm 19223 81.3 8.97 128.1 tie 14586 48.7 7.23 92.3 9Pt 4225 norm 115643 816.4 50.4 1055 tie 83742 416.6 38.3 625.7 BCSSTK09 norm 54062 529.9 24.0 642.4 tie 37501 269.0 17.7 357.7 NOS3 norm 23570 137.3 11.7 201.3 tie 20118 103.8 10.3 163.4 Table 6.6: Solution Time Summary 176 solve the two triangular systems. The final column of the table records the total number of CPU seconds required for the entire solution process, excluding the ordering time. This total includes the symbolic factorization, explicit factoriza-tion and solve steps. It should be emphasized that the symbolic factorization was not conducted in the conventional manner. In addition, the lower adjacency structure was transferred by a UNIX pipe between the norm or tie and solve modules. Both the output and input costs of this transfer are included in the total. The total time was only included to give a very general idea of the overall time requirements of the solution process. In each case, the CPU seconds saved in the factor and solve steps was sub-stantial when fill reduced orderings were used. As expected, due to the higher complexity of the factorization process, the reduction in the factor time began to outweigh the reduction in solve time as the size of problems increased. In all cases the savings in factorization time alone more than recovered any additional time required by the tie module over that needed by norm. In some larger problems the savings in factor time were larger than the tie ordering time itself. In addi-tion, for a given reduction in fill, the factorization time in most cases, is reduced by a larger factor. For example, a 24% reduction in fill for the 10000 node five point problem translates into a 42% reduction in the factorization time, while for BCSSTK09 a 30% reduction in fill decreases the corresponding factorization time by 50%. The precise manner in which an ordering of a given fill reduction affects the remainder of the solution process depends upon many factors, includ-ing the relative size of the fill value to the original number of nonzeros in the system's matrix. 6.5 Secondary Tiebreaking Test Results As described in Chapter 4, essentially three different secondary tiebreaking strategies were attempted. All three strategies were implemented as outlined in previous chapters and applied to the test problems described in Section 6.2. The first secondary tiebreaking strategy totally failed to meet any of the goals 177 set out in Section 4.1. There was no observed improvement in fill levels for the orderings of any of the test problems using the secondary tiebreaking algorithm in comparison to the MDA enhanced by deficiency tiebreaking alone. Nor was there any signifigant improvement in ordering stability. For larger problems the CPU time required by sec was 5 or 6 times the corresponding value for the tie module and the timing were very sensitive to the initial ordering of a problem. In addition, the times were increasing more rapidly than the tie or norm timings as larger problems were considered. When the execution of a small problem was traced, it was observed that the secondary tiebreaking strategy affected the node selection process in less than 10% of the cases in which more than one minimum degree, maximal connectivity node existed in the graph. Opportunities to apply the first secondary tiebreaking scheme were too infre-quent for the scheme to have a noticable effect on fill levels. In addition, when the scheme was able to participate in the node selection process, the effects of this first criterion were too broad. The second strategy attempted to rectify this situation by restricting attention to the change in connectivity which would be experienced by minimum degree nodes upon the candidate's elimination. This second strategy, however, had no more success than the first scheme and also dramatically increased the ordering times. When the second strategy was mod-ified by averaging the connectivity change for each proposed candidate over the number of minimum degree nodes, there was still no improvement in the quality of the orderings in comparison to those produced by the deficiency tiebreaking MDA. The third secondary tiebreaking scheme switched attention from connectiv-ity effects to the manner in which degrees of uneliminated nodes are affected by eliminations. When the deficiency tiebreaking MDA, tie, was augmented by this scheme to form the sec module, the resulting program was very successful on a limited subset of the test problems. Table 6.7 summarizes the test data accu-mulated for these problems. The six orderings of each problem used for testing the sec module are the same as those reported in Section 6.3.1, allowing direct comparison of fill values and timings. The secondary tiebreaking opportunities 178 recorded for each test case represents the number of node selections for which more than one node of minimum degree, maximal connectivity is present. The average scan records the average number of nodes tested during each application of the secondary tiebreaking criteria. In other words, it represents the aver-age number of minimum degree, maximal connectivity candidates present when there is an opportunity to apply the secondary tiebreaking criterion. For the remainder of the test problems, the sec ordering progam did not improve the quality of orderings produced in comparison to those of the tie module. The test results observed for the application of sec to these problems is recorded in Appendix A. The sec module experiences the most success upon the fill levels of the nine point problems. For each of these four problems, the fill levels are completely stabilized at a very low fill level. For the three smaller problems, the fill level differed by only one fill entry from the lowest fill level observed for the tie module, and changed by at most one fill between different initial orderings. The 16641 problem also exhibited a similar stabilization and the fill value was within 0.3% of the lowest fill value observed when tie was applied to the same group of initial orderings. The timings for the sec module, using the third secondary tiebreaking criterion, are not as high as for the other two strategies. However, CPU time requirements did increase, by a factor ranging from 1.4 for the 289 node problem to 3.0 for the 16641 node version, from those of tie on the same graphs. This increase in the relative time requirements is matched by an increase in the average number of nodes considered by each secondary tiebreaking. In addition, the timings were found to be relatively independent of the initial ordering, unlike the previous two strategies. When the tie ordering program was applied to the problem BCSSTK19, there was no apparent improvement in the quality of orderings from those of the norm module. However, in comparison to the norm orderings, a 9% reduction in the mean fill level for orderings of this problem was achieved by the sec module. In addition, the standard deviation is reduced from 167 for norm to 33 for the sec module. The ordering time required by the sec module is 3.0 and 2.6 times that 179 Name Fill Time STO Average Scan 9Pt 289 2279 5.32 127 8.7 2279 5.37 100 7.3 2280 5.37 121 8.1 2279 5.35 86 8.2 2279 5.37 101 7.0 2279 5.35 100 7.6 9Pt 1089 14587 29.4 507 14.3 14588 30.9 344 13.8 14588 30.1 325 15.2 14587 30.7 318 15.7 14587 30.0 415 10.7 14588 30.0 409 13.0 9Pt 4225 83743 186 2039 23.2 83744 197 1411 26.0 83744 206 1154 31.8 83744 199 1395 26.7 83744 205 1353 25.0 83744 201 1317 26.3 9Pt 16641 446307 1122 8175 39.7 446308 1305 5656 59.8 446307 1295 4577 90.0 446308 1220 4356 75.8 446308 1161 5497 64.2 446307 1350 6120 65.0 BCSSTK19 3689 26.8 337 37.5 3681 25.8 326 30.5 3644 26.1 322 28.8 3715 26.4 333 29.2 3654 26.6 331 29.6 3729 25.9 338 28.6 Table 6.7: Testing of Secondary Tiebreaking 180 required by norm and tie respectively on this problem. Despite the dramatic success of the third strategy upon a very restricted set of problems, in general the secondary tiebreaking strategies have failed to meet the goals set out in Section 4.1. The strategies were intended to improve the qualities of orderings by increasing the global nature of the algorithm. Unfortunately, the local nature of the minimum degree and maximal connectivity criteria proved to play too dominant a role in the node selection mechanism. The secondary schemes had little chance to significantly change the nature of the algorithm playing such a tertiary role. 181 Chapter 7 Concluding Remarks 7.1 Summary This thesis has discussed the ordering of sparse, symmetric, positive definite systems of linear equations using the minimum degree algorithm. The develop-ment of the MDA was traced from its origins in Markowitz's algorithm, through the elimination graph and reachable set models, to the popular quotient graph model. The introduction of indistinguishable sets in the form of uneliminated supernodes was considered, along with the external degree modification of the node selection process. Two additional extensions of the MDA, incomplete de-gree updating and multiple eliminatons, were briefly outlined, although neither was considered by the ordering algorithm implementations developed. In the original form of the MDA, the selection of an elimination node from a group of minimum degree candidates was arbitrary. However, it was shown that the precise manner in which these ties were broken could significantly affect the quality of orderings produced. A tiebreaking strategy based upon the deficiency of elimination candidates was introduced to remove some of the random nature of the selection process, and to improve the quality of orderings produced. The enhancement of the MDA by this additional selection criterion was dicussed in detail. It was found that the tiebreaking strategy could be very successfully integrated into the MDA using a quotient graph model with uneliminated su-pernodes. The discussion showed that the introduction of deficiency tiebreaking 182 did not negate the definite advantages of indistinguishable sets and that the mass elimination of selected supernodes was still permissible. An algorithm for the calculation of a node's deficiency was developed, based upon the connectivity of a node's reachable set. It was found that both the external degree and ex-ternal connectivity values should be used for an uneliminated supernode in the enhanced node selection process. Finally, a brief discussion of the compatability of deficiency tiebreaking with the multiple elimination and incomplete degree updating extensions was included. Even with the deficiency tiebreaking enhancement, it was observed that for many problems more than one potential elimination candidate still existed for a significant proportion of the node selections. The manner in which these ties were resolved once again proved to affect ordering quality. Three additional tiebreaking strategies were introduced to play a tertiary role to the minimum degree and minimum deficiency criteria of the node selection process. All three schemes attempted to increase the global nature of the enhanced MDA. The first two strategies focussed upon the connectivity values of nodes remaining in the quotient graph, while the third attempted to prolong the elimination of lower degree nodes. The implementation of the normal MDA and the MDA with deficiency tiebreaking was discussed in detail, stressing the performance sensitivity of the re-sulting code to important design decisions. For the tiebreaking MDA implemen-tation, the most sensitive areas were identified as connectivity list maintenance and connectivity and degree updating. A novel approach to degree updating was introduced to the tiebreaking MDA which avoided total degree recalculation for nodes affected by an elimination. A brief discussion of the secondary tiebreak-ing implementation was also included, highlighting the central implementation feature of each of the three strategies. For testing purposes the remainder of the solution process was implemented, although no new research was conducted in this area. The developed ordering progams were tested extensively on a wide range of symmetric, sparse systems. The deficiency tiebreaking extension of the MDA was 183 very successful, with the goals of reduced fill levels and increased fill stability met, at least in part, for more than two thirds of the test problems. For the remainder of the test problems the quality of orderings observed was affected very little, with no problems exhibiting orderings of substantially reduced quality. The deficiency tiebreaking extension was found to be most effective on the five and nine point problems, with mean fill levels reduced up to 22% for the largest problems in each class. The enhanced algorithm was unsuccessfully applied to a random graph which initially contained very few cycles consisting of a relatively small number of nodes. This problem emphasized the importance of a local nature, to a graph's adjacency structure, if an application of deficiency tiebreaking is to be successful. The CPU time requirements of the deficiency tiebreaking MDA implementation were comparable to the timings of the normal MDA implementation, and in several cases were substantially lower. The reachable set storage required by the deficiency tiebreaking scheme was comparable to the storage requirements of the adjacency list storage in each problem. For several problems, the substantial impact of fill reduced orderings on the remainder of the solution process was demonstrated. The third secondary tiebreaking strategy presented in Chapter 4 dramatically stabilized the fill levels of orderings for the nine point problems and reduced the fill levels of one symmetric stiffness matrix. Otherwise the secondary tiebreaking schemes failed to meet the goals of reduced fill levels and increased stability. Although these schemes appear to have little practical value in their current form, their development provided useful insight into the precise manner in which a node's elimination affects the remaining uneliminated nodes of a graph. 7.2 Future Work In the current form of the MDA enhanced by deficiency tiebreaking, no attempt was made to incorporate the multiple elimination and incomplete degree up-dating extensions. Although Section 3.10 identified the major issues concerning the compatability of the deficiency tiebreaking scheme with each extention, the 184 discussion is far from complete. Additional research is required to character-ize the effect of each extension on the efficiency of the deficiency tiebreaking strategy, and to decide whether the improved ordering quality exhibited can be maintained. Reachable set requests significantly increase upon the introduction of de-ficiency tiebreaking. In the current implementation, in order to improve the execution time efficiency of the enhanced ordering program, a storage inten-sive scheme was introduced to reduce the number of reachable set reformations required. As mentioned in Section 5.4.4, this approach represents only one ex-treme of a spectrum of possible schemes for reachable set treatment. The favor-able timings of the deficiency tiebreaking MDA program reported in Chapter 6 will permit less storage intensive schemes, in which more reachable sets need to be completely redetermined when requested. A more sophisticated and flexible storage scheme, which allows the level of reachable set storage to be adjusted to particular computing environments and problems, certainly warrants further investigation. For the group of test problems considered in Chapter 6, very few common characteristics could be identfied for the problems upon which the deficiency tiebreaking scheme was most successful. A reliable mechanism for classifying problems, prior to their ordering by a particular version of the MDA, would be very useful. Finally, even with the deficiency tiebreaking enhancement of the MDA, the algorithm remains non-deterministic. As a result, there is still much room for additional research into the tiebreaking alternatives for the minimum degree ordering algorithm. 185 Bibliography [DE73] B. Dembart and A. M. Erisman. Hybrid sparse-matrix methods. IEEE Transactions on Circuit Theory, 20(6):641-649, November 1973. [DER86] I. S. Duff, A. M. Erisman, and J. K. Reid. Direct Methods for Sparse Matrices. Claredon Press, Oxford, 1986. [DR83] I. S. Duff and J. K. Reid. The multifrontal solution of indefinite sparse symmetric linear equations. ACM Transactions on Mathematical Soft-ware, 9(3):302-325, September 1983. [Duf84] I. S. Duff. Research directions in sparse matrix computations. In G. H. Golub, editor, Studies in Numerical Analysis, pages 83-139, The Mathematical Association of America, 1984. [Duf86] I. S. Duff. Comments on the Solution of Sparse Linear Equations. Technical Report, Computer Science and Systems Division, Harwell Laboratory, Oxfordshire, 1986. [ESS75] S. C. Eisenstat, M. H. Schultz, and A. H. Sherman. Considerations in the design of software for sparse gaussian elimination. In J. R. Bunch and D. J. Rose, editors, Sparse Matrix Computations, pages 263-273, Academic Press, New York, 1975. [ESS81] S. C. Eisenstat, M. H. Schultz, and A. H. Sherman. Algorithms and data structures for sparse symmetric gaussian elimination. Siam Jour-nal on Scientific and Statistical Computing, 2(2):225-237, 1981. 186 [Geo81] A. George. Direct solution of sparse positive definite systems. In I. S. Duff, editor, Sparse Matrices and Their Uses, pages 283-306, Academic Press, New York, 1981. [GG75] W. M. Gentleman and A. George. Sparse matrix software. In J. R. Bunch and D. J. Rose, editors, Sparse Matrix Computations, pages 243-261, Academic Press, New York, 1975. [GL78a] A. George and J. W. H. Liu. An automatic nested dissection algorithm for irregular finite element problems. Siam Journal of Numerical Anal-ysis, 15(5):1053-1069, October 1978. [GL78b] A. George and J. W. H. Liu. A quotient graph model for symmetric factorization. In I. S. Duff and G. W. Stewart, editors, Sparse Matrix Proceedings, pages 154-175, SIAM, 1978. [GL78c] A. George and J. W. H. Liu. User Guide for SPARSPAK: Waterloo Sparse Linear Equations Package. Research Report CS-78-30, Depart-ment of Computer Science, University of Waterloo, Waterloo, Ontario, 1978. [GL80] A. George and J. W. H. Liu. A fast implementation of the minimum degree algorithm using quotient graphs. ACM Transactions on Math-ematical Software, 6(3):337-358, September 1980. [GL81] A. George and J. W. H. Liu. Computer Solution of Large Sparse Pos-itive Definite Systems. Prentice-Hall, Inc., New Jersey, 1981. [GL87] A. George and J. W. H. Liu. The evolution of the minimum degree ordering algorithm. 1987. (to appear). [GR81] A. George and H. Rashwan. On symbolic factorization of partitioned sparse symmetric matrices. In A. Bjorck et al., editors, Large Scale Matrix Problems, pages 145-157, North Holland, 1981. [GvL83] G. H. Golub and C. F. van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, 1983. 187 [HG72] H. Y. Hsieh and M. S. Ghausi. A probabilistic approach to optimal pivoting and prediction of fill-in for random sparse matrices. IEEE Transactions on Circuit Theory, 19(4):329-336, July 1972. [Liu85] J. W. H. Liu. Modification of the minimum-degree algorithm by multiple elimination. ACM Transactions on Mathematical Software, 11(2):141-153, June 1985. [Mar57] A. M. Markowitz. The elimination form of the inverse and its applica-tion to linear programming. Management Science, 3:255-269, 1957. [Ros72] D. J. Rose. A graph theory study of numerical of the numerical solution of sparse positive definite systems of linear equations. In R. C. Read, editor, Graph Theory and Computing, pages 183-217, Academic Press, New York, 1972. [TW67] W. F. Tinney and J. W. Walker. Direct solutions of sparse network equations by optimally ordered triangular factorization. Proceedings of the IEEE, 55(11).-1801-1809, November 1967. [Var62] R. S. Varga. Matrix Iterative Analysis. Prentice-Hall, Inc., New Jersey, 1962. [Yan8l] M. Yannakakis. Computing the minimum fill-in is NP-complete. Siam Journal on Algebraic and Descrete Methods, 2(l):77-79, March 1981. [You71] D. M. Young. Iterative Solution of Large Linear Systems. Academic Press, New York, 1971. 188 Appendix A Additional Secondary Tiebreaking Testing Results This appendix records the secondary tiebreaking results of the third criterion which were not included in Chapter 6. 189 Name Fill Time STO Average Scan 5Pt 400 2450 6.58 240 11.8 2444 6.37 227 8.8 2450 6.38 232 9.3 2492 6.23 229 9.2 2457 6.37 226 9.5 2516 6.20 225 9.6 5Pt 900 7156 19.7 563 19.7 7022 17.4 409 20.4 7544 18.3 442 14.7 7381 18.3 452 15.4 7405 18.5 427 16.3 7658 18.0 481 14.5 5Pt 1600 15130 42.3 1051 26.8 14846 36.7 891 14.8 14924 35.8 788 17.4 15471 36.6 699 23.4 15328 35.2 699 21.2 15543 37.5 714 20.2 5Pt 10000 140740 680 6739 74.4 146485 434 4144 32.3 144102 495 4604 32.2 142858 417 4384 32.4 144145 521 4318 32.3 140912 400 4100 33.5 random graph 964 9.48 342 53.4 999 9.47 347 52.9 999 9.58 343 54.4 999 9.45 346 52.9 964 9.50 344 54.2 964 9.53 344 53.7 fullspider 2103 7.10 143 22.3 2217 6.80 140 18.9 2160 6.88 141 18.6 2207 7.15 145 19.3 2125 7.03 145 19.9 2277 6.88 142 18.3 Table A . l : Testing of Secondary Tiebreaking Criterion #3 190 Name Fill Time STO Average Scan 3hole 1525 6.35 169 22.9 1574 6.58 159 23.2 1544 6.55 164 22.1 1423 6.57 159 22.7 1538 6.58 162 22.8 1541 6.43 164 22.9 6hole 1717 5.90 174 24.5 1641 5.85 168 23.2 1725 5.83 170 22.8 1687 5.83 170 23.4 1716 5.83 169 24.0 1729 5.92 170 21.9 ERI.S1176 567 52.7 645 65.5 566 52.9 648 66.9 538 52.5 648 65.9 565 52.6 646 65.9 541 52.7 647 66.0 538 52.4 646 66.1 BCSPWR09 2083 76.8 1446 159 2081 76.6 1445 161 2086 77.0 1439 160 2091 76.4 1448 159 2111 76.4 1430 162 2123 76.3 1448 160 BCSSTK09 46243 97.4 245 5.0 45144 95.2 130 4.5 44705 95.6 235 4.7 43495 95.8 219 4.6 45144 95.4 185 3.8 44463 95.1 195 3.8 BCSSTK10 11732 61.0 259 19.8 11462 59.4 262 16.1 11821 58.2 237 18.9 11414 61.8 267 16.5 11862 57.2 239 17.4 11727 57.5 247 17.1 Table A . l : Continued 191 Name Fill Time STO Average Scan CAN1072 13836 123 490 93.9 13624 121 486 77.3 13664 132 475 79.7 13285 120 488 77.8 14432 123 485 82.7 13836 123 490 93.9 DWT2680 39484 144 1073 29.8 39022 143 995 31.1 39844 141 1014 29.5 39361 143 973 32.3 39730 143 979 33.7 39020 143 997 32.2 LSHP3025 63362 1189 1318 592 66225 1230 1383 485 67165 1231 1377 490 65906 1225 1379 490 65298 1226 1373 487 70346 1252 1393 486 N0S3 21536 42.7 215 9.6 21203 39.1 206 7.7 20208 38.9 207 7.4 21208 39.1 209 7.4 21533 40.2 212 8.3 20892 41.5 204 7.9 PLAT1919 51069 85.8 390 7.5 51645 87.7 385 7.9 50057 87.1 388 7.6 50617 83.7 381 8.1 50909 83.1 370 6.5 50766 86.5 386 8.5 BLCKHOLE 42698 520 898 477 48447 589 947 349 48873 589 959 344 48149 590 940 352 49204 619 937 350 48992 589 932 358 Table A. l : Continued 192 Name Fill Time STO Average Scan 685BUS 1630 11.9 7 462 31.4 1592 11.7 459 31.6 1586 11.5 453 31.2 1599 11.6 463 31.4 1645 11.8 469 31.3 1590 11.6 455 31.7 1138BUS 642 41.8 966 145 641 41.7 959 138 643 41.9 962 142 641 41.8 961 142 644 41.8 958 140 648 41.8 958 141 BCSSTK26 26363 123 487 20.1 27415 131 479 20.3 26415 124 484 19.6 27183 132 487 19.9 27805 126 483 19.8 27184 130 487 19.4 Table A . l : Continued
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Tiebreaking the minimum degree algorithm for ordering...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Tiebreaking the minimum degree algorithm for ordering sparse symmetric positive definite matrices Cavers, Ian Alfred 1987
pdf
Page Metadata
Item Metadata
Title | Tiebreaking the minimum degree algorithm for ordering sparse symmetric positive definite matrices |
Creator |
Cavers, Ian Alfred |
Publisher | University of British Columbia |
Date Issued | 1987 |
Description | The minimum degree algorithm is known as an effective scheme for identifying a fill reduced ordering for symmetric, positive definite, sparse linear systems, to be solved using a Cholesky factorization. Although the original algorithm has been enhanced to improve the efficiency of its implementation, ties between minimum degree elimination candidates are still arbitrarily broken. For many systems, the fill levels of orderings produced by the minimum degree algorithm are very sensitive to the precise manner in which these ties are resolved. This thesis introduces several tiebreaking enhancements of the minimum degree algorithm. Emphasis is placed upon a tiebreaking strategy based upon the deficiency of minium degree elimination candidates, and which can consistently identify low fill orderings for a wide spectrum of test problems. All tiebreaking strategies are fully integrated into implementations of the minimum degree algorithm based upon a quotient graph model, including indistinguishable sets represented by uneliminated supernodes. The resulting programs are tested on a wide variety of sparse systems in order to investigate the performance of the algorithm enhanced by the tiebreaking strategies and the quality of the orderings they produce. |
Subject |
Algorithms Linear systems -- Data processing Sparse matrices |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-08-28 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
IsShownAt | 10.14288/1.0051477 |
URI | http://hdl.handle.net/2429/27855 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-UBC_1988_A6_7 C38.pdf [ 10.91MB ]
- Metadata
- JSON: 831-1.0051477.json
- JSON-LD: 831-1.0051477-ld.json
- RDF/XML (Pretty): 831-1.0051477-rdf.xml
- RDF/JSON: 831-1.0051477-rdf.json
- Turtle: 831-1.0051477-turtle.txt
- N-Triples: 831-1.0051477-rdf-ntriples.txt
- Original Record: 831-1.0051477-source.json
- Full Text
- 831-1.0051477-fulltext.txt
- Citation
- 831-1.0051477.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0051477/manifest