ON THE EFFICIENCY OF CLIQUE DETECTION IN GRAPHS by ANTHONY HUNTER DIXON B.Sc. , University of British Columbia, 1968 M.Sc, University of British Columbia, 1970 fl THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in the Department ' of-COMPUTER SCIENCE We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA May,1973 In present ing t h i s thes is in p a r t i a l f u l f i l m e n t o f the requirements fo r an advanced degree at the U n i v e r s i t y of B r i t i s h Columbia, I agree that the L i b r a r y sha l l make i t f r e e l y a v a i l a b l e for reference and study. I fu r ther agree that permission for extensive copying of th is t h e s i s fo r s c h o l a r l y purposes may be granted by the Head of my Department or by h is representa t ives . It i s understood that copying or p u b l i c a t i o n o f th is t h e s i s f o r f i n a n c i a l gain s h a l l not be allowed without my wr i t ten permiss ion . Department of Computer Science The U n i v e r s i t y of B r i t i s h Columbia Vancouver 8, Canada Date June 18, 1973 II ABSTRACT This thesis examines the devices employed by various algorithms to search for maximal complete subgraphs in graphs. Also known as cliques,in Chapter 1 these subgraphs are seen to play an important role in graph theory, information retrieval, Sociometry, logic design, and computational complexity. The enumeration of cliques using the Harary-Rcss, Bonner, Peay, and Bron-Kerbosch algorithms is discussed in Chapter 2. The Reduced Redundancy algorithm is introduced, and the performance of the five procedures is assessed using an alternative approach to empirical testing. Each of the algorithms is shown to generate a "derivation tree" for a given graph whose size can be used as a measure of its efficiency. In Chapter 3, the possibility of exploiting vertex similarity is examined with a view to reducing the size of the derivation tree. As a consequence, algorithms are proposed for finding non-similar cliques. The concept of "complete subgraph equivalence" of vertices is introduced to develop a means for expressing the cliques of a graph as the Cartesian product cf vertex subsets. An algorithm for detecting the existence cf a clique cf order k in a certain class of k-partite graphs in polynomial time is proposed in Chapter 4. This class consists of a l l graphs reducible by the algorithm to k-partite graphs having at most two vertices per block of degree greater than 0. This ii) algorithm is shown to provide an efficient heuristic which can be used in a procedure for determining whether a well-formed formula is a tautology. The thesis is concluded with an empirical analysis of the techniques employed by the enumeration algorithms of Chapter 2. In addition, the efficiency of the Clique Detection algorithm is compared with that of the Reduced Redundancy algorithm. IV TABLE OF CONTENTS CHAPTER 1 : INTRODUCTION 1.1 DEFINITIONS AND NOTATION 1 1.2 INTRODUCTORY REMARKS 5 1.3 HISTORICAL SURVEY 6 CHAPTER 2 : ENUMERATION OF CLIQUES 2.1 INTRODUCTION 16 2.2 MATHEMATICAL ANALYSIS OF ENUMERATION ALGORITHMS 18 2.3 ANALYSIS OF THE HARARY-ROSS ALGORITHM 22 2.3.1 NOTATION 23 2.3.2 THE ALGORITHM 24 2.3.3 NUMBER OF VERTEX SETS EXAMINED 35 2.3.4 STORAGE REQUIREMENTS 38 2.4 BONNER * S ALGORITHM 41 2.4.1 NOTATION 42 2.4.2 THE ALGORITHM 42 2.4.3 NUMBER OF VERTEX SETS GENERATED 46 2.4.4 STORAGE REQUIREMENTS 50 2.5 ANALYSIS OF PEAY'S ALGORITHM 51 2.5.1 NOTATION 52 2.5.2 THE ALGORITHM 53 2.5.3 NUMBER OF VERTEX SETS GENERATED 57 2.5.4 STORAGE REQUIREMENTS 64 2.6 A NEW ENUMERATION ALGORITHM 66 2.6.1 DESCRIPTION OF THE ALGORITHM 70 2.6.2 NOTATION 70 2.6.3 THE REDUCED REDUNDANCY ALGORITHM 71 V 2.6.4 NUMBER OF VERTEX SETS GENERATED 72 2.6.5 ANALYSIS OF AM ITERATION 75 2.6.6 STORAGE REQUIREMENTS 76 2.7 THE BRON-KERBOSCH ALGORITHM 78 2.7.1 NOTATION 79 2.?.2 THE ALGORITHM 79 2.7.3 NUMBER OF VERTEX SETS GENERATED 84 2.7.4 ANALYSIS OF ONE ITERATION 85 2.7.5 STORAGE REQUIREMENTS 88 CHAPTER 3 : CLIQUE DETECTION USING VERTEX SIMILARITY 3.1 INTRODUCTION 89 3.2 POINT AND LINE SYMMETRIC GRAPHS 91 3.3 DETERMINATION OF NON-SIMILAR CLIQUES 98 3.4 ALGORITHM FOR FINDING NON-SIMILAR CLIQUES 102 3.5 DISCUSSION OF THE ALGORITHM 105 3.6 ANOTHER APPLICATION OF ORBI/TAL PARTITIONING 108 3.7 ANOTHER APPROACH TO DESCRIBING THE CLIQUES 111 CHAPTER 4 : EFFICIENT DETECTION OF CLIQUES 4.1 INTRODUCTION 126 4.2 CLIQUE DETECTION AND SATISFIABILITY 127 4.3 CLIQUE DETECTION ALGORITHM 130 4.4 TIMING CONSIDERATIONS 138 4.5 STORAGE CONSIDERATIONS 139 4.6 INPLICATIONS OF THE PROCEDURE 140 CHAPTER 5 : EMPIRICAL OBSERVATIONS AND SUMMARY 5.1 INTRODUCTION 141 5.2 THE TEST DATA 142 5.3 DISCUSSION OF THE RESULTS 145 vi 5.1 SUMMARY 149 BIBLIOGRAPHY 151 APPENDIX A 156 APPENDIX B 157 APPENDIX C 169 LIST OF TABLES TABLE 4.1 BESULTS OF THE ALGORITHM FOR FIG. 4.1 TABLE 5.1 CHARACTERISTICS OF THE TEST GRAPHS TABLE 5.2 COMPARISON OF ALGORITHMS TABLE 5.3 COMPARISON OF DETECTION AND ENUMERATION V'IU LIST OF FIGURES FIG. 2. 1 K (3,3,3) 33 FIG. 2. 2 HARAR Y-ROSS ALGORITHM: DERIVATION TREE 34 FIG. 2. 3 SUBTREE OF DERIVATION TRIE 37 FIG. 2. 4 A PATH OF THE DERIVATION TREE 40 FIG. 2. 5 BONNER'S ALGORITHM: DERIVATION TREE 49 FIG. 2. 6 SUBTREE OF THE DERIVATION TREE 59 FIG. 2. 7 PEAY'S ALGORITHM: DERIVATION TREE 62 FIG. 2. 8 DERIVATION TREE FOR K(3,3) 63 FIG. 2. 9 MINIMAL COVER COUNTEREXAMPLE 69 FIG. 2. 10 MAXIMAL COVER COUNTEREXAMPLE 69 FIG. 2. 11 REDUCED REDUNDANCY ALGORITHM: DERIVATION TRIE 74 FIG. 3. 1 POINT SYMMETRIC GRAPH 93 FIG. 3. 2 INDUCED SUBGRAPH OF FIG. 3.1 93 FIG. 3. 3 POINT-SYMMETRIC GRAPH NOT LINE-SYMMETRIC 94 FIG. 3. 4 GRAPH WITH ALL CLIQUES NON-SIMILAR 107 FIG. 3. 5 A PATH OF DERIVATION OF NON-SIMILAR CLIQUES 120 FIG. 3. 6 DERIVATION TREE FOR K (3,3,3,3) 124 FIG. 3. 7 DERIVATION TREE FOR FIG. 3.6 125 FIG. 4. 1 GRAPH CONTAINING SUBGRAPH K(15) • 136 ACKNOWLEDGEMENT I w i s h t o thank my s u p e r v i s o r , P r o f e s s o r A . Mowshowitz, f o r h i s a s s i s t a n c e both a c a d e m i c a l l y and 0 f i n a n c i a l l y i n the development and p r e s e n t a t i o n of t h i s t h e s i s , and a l s o P r o f e s s o r E. Lawle r f o r b r i n g i n g to my a t t e n t i o n the importance of the pr o b l e m . F i n a l l y , I w i s h t o thank my w i f e , P a t t y , f o r her a s s i s t a n c e i n the p r e p a r a t i o n of t h i s m a n u s c r i p t . 1 CHAPTER U INTRODUCTION 1. 1 DEFINITIONS AND NOTATION The purpose of this thesis will be to examine procedures which search for maximal complete subgraphs cf a graph. In particular the manner in which algorithms enumerate these subgraphs will be explored in order to discover why such algorithms have an exponential computation time. The problem of detecting the existence cf a complete subgraph cf a particular order will also be explored and an algorithm will be proposed which uses a different method from that of enumeration. This technigue is instrumental in improving the efficiency of the procedure of clique detection, and fcr a particular class cf graphs the algorithm can be shown to have a polynomial computation time. First we shall provide a fairly complete l i s t of definitions pertinent to the problem at hand. Unfortunately, there is no universally accepted terminology in graph theory and for this reason the author has chosen his definitions to be compatible where possible with the widely known text of Harary [43]. The definitions will also serve to introduce the notation to be used in the body cf this text for the concepts defined. DEFINITION 1.1: A jjrach G consists of a finite set V (G) of vertices together with a prescribed subset E(G) of unordered pairs of elements from V (G) called the edges of G. If <u,v) is 2 an edge of G then the vertices u and v are said to be adjacent. DEFINITION JL2J. The degree , djyj. , of a vertex v in a graph G is the number of vertices in G adjacent to v. DEFINITION 1.3^ A labelling, of a graph G with | V (G) | = n vertices is an assignment of n distinguishing labels, one to each of the vertices in V(G). DEFINITION lz.Hl A k-partite graph GJm £n>2l... tE^l i s a graph whose vertices can be partitioned into k blocks V , V ,...,V such that for any two vertices u and v in the same block (u,v) is not an edge of G (m ,m ,...,m ). Given such a partition m. 1 2 k i denotes the number of vertices in block 7 . i DEFINITION 1.5: A complete k-partite g.rap_h K (m , m , . . ., m ) is ~" x <. k a k-partite graph such that for any two distinct blocks V^, and any vertex u in V , and any vertex v in V , (u,v) is an 1 *~ edge of K(m ,m ,...,m ). DEFINITION V.62 The chromatic number^ %1GJ_X of a graph , G is the minimum number of blocks V , 7 ,...,V . possible in any 1 2 A ( . G J partition of V (G) such that for any two vertices u,v in the same block (u,v) is not an edge of G. DEFINITION l-.ll The p_oint independence number ,/3^(G), is the largest number of mutually non-adjacent vertices in a graph. DEFINITION 1.8; The complement , G, of a graph G is a graph such that V ( G) = V(G) and (u,v) is an edge of G if and only if (u,v) is not an edge of G and u*v. DEFINITION 1.9: Let the vertices of G be labelled 1 through n where |V(G)| = n. The adjacency matrix , A (G) , of the graph G is a (0,1) matrix such that a =1 i f and only i f (i,j) is an edge i j 3 of the labelled graph. The adjacency set ,A(v), cf a vertex v is the set of vertices adjacent to v. J^ EFINITIQN _1«IQl A unicligual vertex v of a graph G is one which belongs to exactly one cligue of the graph (see definition 1.14). DEFINITION -1^ 11: A subgraph H of a graph G is a graph with V(B) V(G) and E(H) E (G) such that i f (u,v) is an edge of H, then (u,v) is an edge of G. DEFINITION 1.12 For any set of vertices W £ V ( G ) , the induced Sil^ 2I§£li G^ is t n e maximal subgraph of G with vertex set V(G^ )=W. That i s , for any u,v in V (Gw) (u,v) i s an edge of G^ if and only i f (u,v) is an edge of G. DEFINITION 1^.12}. A complete subgraph of order k of a graph G is a subgraph defined on k vertices of V(G) for which any two vertices in the subgraph are adjacent. A trianale cf G is a complete subgraph of order 3. DEFINITION J i l i i A cligue C of order k of a graph G is a complete subgraph of G for which there exists no vertex in V(G)-V{C) adjacent to a l l vertices in V(C). Cliques are therefore maximal complete subgraphs. DEFINITION 1.15: An automorphism of a graph G is a permutation of the vertex set V (G) which preserves adjacencies. DEFINITION 1.16: The collection of a l l automorphisms cf a graph G forms a group called the automorphism grogjE ,P(G), of the graph G. DEFINITION 1.17: Two vertices u,v of a graph G are similar if there exists an cc in T(G) such otu=v. DEFINITION 1.18: A graph G is points-symmetric i f for any two 4 vertices u,v in V(G) there exists an «c in T(G) such that<*u=v. DEFINITION 1.19: A graph G is line-symmetric i f for any two edges (u ,v ) , (u ,v ) in E(G) there exists an oc in r (G) such that either m ul=u2 and ^1=v2 or ocu^v^ andocv1=u2. DEFINITION ls.2.0.1 fin algorithm which operates on a graph is considered efficient i f the computation time and storage requirements can be expressed as functions of n bounded above by a polynomial in n, where n is the number of vertices in the graph. DEFINITION 1. 2,1: The Hadamard product , AXB, of two matrices A and B is the matrix C where c = a . . • b . . . 5 1.2 INTRODUCTORY REMARKS The focal point of this thesis is the detection of cliques in graphs. The importance of this topic from both a graph theoretic and an applications standpoint will manifest itself in the historical survey of the next section. There are several variants or special cases of this subject which can be considered. We will confine ourselves to a study of the following four problems: 1. ) The enumeration of cliques of a graph; 2. ) Detection of the largest clique of a graph; 3. ) Determination of the non-similar cliques cf a graph; 4. ) Determination of cliques of a specific order. The object in each case will be to study the special characteristics of the problem, and then to exploit these features to develop useful procedures for achieving the gcal. In addition, an effort will be made to determine bounds on the efficiency of several such procedures and to seek insight into intrinsic properties of the methods used which might aide in estimating the best that one can expect from procedures designed to solve these problems. 6 1.3 HISTORICAL SURVEY The study of methods for detecting maximal complete subgraphs originated principally with the search for an efficient and objective representation of the structure of social groups. Such groups can be modeled with a sociogram, a graph which characterizes the responses of individuals in a sociometric test which requires that each participant specify some subset of the group to which he wishes to belong (Moreno [62]). Pioneers in a mathematical treatment of this problem were Forsyth and Katz [33] who proposed representation of sociograms as matrices to which the elementary operations of row and column permutations could be applied to achieve some more desirable form in which the groupings of individuals could be more easily observed., Such a representation was subseguently refined by Luce and Perry [54], Festinger [31], and Luce [55]. A (0,1) matrix, effectively the adjacency matrix of the sociogram, was utilized, and the distance properties of a graph derived from the square, cube, and higher powers of the adjacency matrix were used as indicators in characterizing the structure of the group. Festinger observed that a unicliqual member i belonged to a clique of k persons i f and only i f the ith diagonal element in the cube of the adjacency matrix was equal to (k-1) (k-2) . It was therefore a simple matter to find the order of the clique to which an individual belonged, provided he was a member cf only one clique. Weiss and Jacobsen [77] applied the techniques of Luce, Perry, and Festinger to analyze the relationships of individuals and their tasks in business organizations. 7 As a result cf its sociological beginnings the word "cligue" became synonymous with the graph theoretic notion cf "maximal complete subgraph". The apparent usefulness of the graphical method of representation of socicmetric data was suggested by the early results. However, investigators became aware of the general need for more powerful methods of manipulationg the sccicgram and its associated matrix; in particular, of determining in an efficient but general manner the cliques suggested by the model. The importance of techniques for studying social groups thus motivated exploitation of a graph theoretic approach to the detection of cliques. Because of the finiteness of the problem, there exists a naive, "brute force" method for finding the cliques of a graph vertices for k=1,2,...,n. This clearly involves looking at moderately large graphs. Because sociograms could be very large, analysis by cligue membership would be practicable cnly with the advent cf much better detection procedures. Early techniques depended primarily on a "bag of tricks" which incorporated knowledge of the structure cf the graph induced by the particular application, and resorted to exhaustive search or estimation when simplification was no longer possible. In 1956 Harary and Ross [41] proposed a method wh.ere.by a graph was systemically reduced to components each of which contained at least one unicligual vertex and for with n vertices—merely examine each sets, an obviously intractable number for even 8 which the observation of Festinger, previously mentioned, could be used to find i t . The reduction procedure employed the following algorithm: 1. ) Initially, let the graph G itself be the component under consideraticn. 2. ) If v is a unicligual vertex in a component under consideration, then the subgraph induced on v and those vertices adjacent to v is a clique. Define a new component for consideration by deleting from the current component v and a l l unicligual vertices adjacent to v. 3. ) If the component under consideraticn contains no unicligual vertices, let v be a vertex belonging to a minimal number of trianqles of that component. Define two new components for consideration: a. ) the subgraph induced on v together with those vertices adjacent to v, and b. ) the union of subgraphs induced on the set of vertices not adjacent to v together with their respective sets of adjacent vertices. Since the union of these two components can be shown to be equal to the current component under consideration, delete it from the l i s t of components to be considered. 4. ) Choose a component from the l i s t and go to 2 until the l i s t is exhausted. A procedure for improving the efficiency of the algorithm was also suggested by Harary and Ross by which a component 9 containing not more than three cligues could be ccmpletely processed at step 2. The Harary and Eoss algorithm thus provided a more practical method than the naive algorithm fcr identifying cligues in sociometric data representable as a (0,1) matrix. Subsequent researchers have pursued the development cf techniques for -the analysis of variations and generalizations of the model in an effort to include and obtain more information about the structure in a representation cf a social group. By associating weights with each pair of vertices in a sociogram, the degree or strength of the relationship could be characterized (see for example Hubell [45]). By utilizing an adjustable threshold value, a hierarchy of graphs could be established, each graph being defined cn the same set of vertices but consisting of only those edges whose weights exceeded the threshold value. Dorien [20], Johnson [47], and Boyle [7] have investigated the hierarchichal structure of groups by this method. In particular Peay [6B] recently developed an algorithm for determining the hierarchichal cligue structure cf such a representation. A family of sets of vertices which we shall call "potential cliques" is generated from a graph G of order n and associated with a threshold value t. 1. ) Initially V(G) is the only potential clique in the family. Denote by w (v ,v ) the weight of edge (v ,v ) in E(G). i 3 1 j 2. ) If there exists an edge (v ,v ) in E(G) such that i j w (v ,v ) < t, and i f there exists a potential cligue C induced i j 10 on a subset of k vertices containing v and v then two new potential cliques C , C are induced on C-fv } and C-{v }. 1 2 i j 3. ) C is deleted from the family and C and C are added 1 1 2 to i t provided neither is contained in some current member of the family. 4. ) When it is the case that w(v , v )>t for any pair i J (v^v.) in any potential clique, then the family constitutes the set of cliques associated with the threshold value t. It is clear that the effect of the algorithm is to successively refine sets of vertices which contain non-adjacent pairs until such refinement is no longer possible; hence such sets of vertices can. be considered tc induce "potential cliques". To use the algorithm to find maximal complete subgraphs, a threshold value of 1 is assumed. Yet another important sociological property of groups that stimulated the study of cliques in graphs was the tendency for individuals to "cluster" into groups in such a way that members of a cluster retained a high degree of similarity, while different clusters characterized dissimilar properties (Davis [15]). Cluster theory also had applications outside of Sociometry. Abraham [1] has used clustering techniques to solve the problem of minimizing the number of interconnections of electrical assemblies, a problem also explored by Lawler [51], while Bonner [6] applied such methods to medical taxonomy problems. Bonner's efforts resulted in the design of an apparently efficient algorithm, so considered 11 because his method eliminated the need for comparing newly generated vertex subsets with previously generated sets for containment, a necessary part of the procedures employed by Harary and Ross or Peay. As a conseguence, Bonner's algorithm enjoyed some popularity and was used in comparative studies with more recently proposed algorithms. However, Augustson and Minker [5] showed this efficiency was often illusory since a large number of extraneous vertex subsets could be generated in certain cases. Besides the application of cligue detection tc the interpretation of sociometric data, the most recent widely explored application is to the problem cf information retrieval. Early developments in the area of document retrieval were made by Meetham [57] while Abraham [2,3] applied such technigues to the problem of thesaurus construction. In addition, the previously cited work of Bonner was also an application of cligue detection methods tc information retrieval problems. More recently, Gotlieb and Kumar [36] used clustering technigues to represent the degree of semantic association between index terms used in the classification of documents. The thesaurus problem was further explored by Augustson and Minker who employed a new algorithm developed by Bierstone which was shown by empirical methods to be better than that cf Bonner in many cases. Bierstone's algorithm was compared empirically by Mulligan [63] with two recently developed 12 algorithms for clique enumeration, one by Bron and Kerbosch [8] and the second by Corneil (see Mulligan [63]). From this study Mulligan concluded the Bron-Kerbosch algorithm to be superior. The problems and techniques of comparing some of these algorithms will be discussed further in the next chapter. Several important graph theoretical problems are related to the detection of maximal complete subgraphs. It is well known (see for example Nordhaus [66]) that the pcint independence number of a graph G is egual to the order of the maximal clique in G, its complement. In addition, the chromatic number of G is equal to the minimum number of independent cliques in G. At present there is no known efficient means of computing either of these graphical invariants. A procedure for determining either number would provide an important tool in pursuing the host cf problems (see for example [14,19,26]) that exist in chromatic graph theory, and hence serves to emphasize the importance of studying complete subgraphs. As a result the literature abounds with a variety of results relating tc the existence of complete subgraphs in graphs. As an example, one may consider the celebrated problem of Ramsey [69] concerning the smallest number of vertices that a graph may have and contain either a complete subgraph of order m, or an independent set of k vertices. The determination of such a number, r(m,k), is an unsolved problem for general m and k, although the published results include 13 the calculation cf specific values, existence theorems, and bounds [21,29,35,37,38,48]. It is evident that an efficient cligue detection procedure would provide a useful tool by providing a faster way of examining graphs for their complete subgraphs. Perhaps the results of extremal graph theory, pioneered by Turan[74,75], contributed most directly to the clique detection problem. In 1965, Moon and Moser [60] verified by direct methods the maximum number of cliques possible in a graph, a result earlier established by Erdos [ 27] through an inductive argument: The maximum number of cligues in a graph with n vertices is: a. ) 3n/3 i f n = 0 mod 3 b. ) 4-3^n"^/3 i f n = 1 mod 3 c. ) 2-3 (n"2)/3 i f n = 2 mod 3. It was also shown that the graphs which contain the maximum number of cliques were: a. ) the complete nr-partite graph K(3,3,...,3) i f n = 0 3 mod 3 b. ) the complete (n-1)-partite graph K(3,3,...,4) if 3 n = 1 mod 3 c. ) the complete (n*1)-partite graph K(3,3,...,2) if n = 2 mod 3. These graphs shall henceforth be referred to as Moon-Moser graphs. 14 It follows from this result that, as a function of the number of its vertices, a graph may contain an exponential number of cliques. Hence any algorithm which examines each clique at least once (ie. Sequential) may be expected to require an exponential amount of time to enumerate the cliques of a graph. Although from a graph theoretic point of view this is a disheartening result, an examination of graphs which contain such numbers of cliques reveals that a l l cliques are of the same or nearly the same order. Further the cliques in such graphs appear to be homogeneously distributed over the vertices, each vertex belonging to the same or nearly the same (again exponential) number of cliques. From a practical pcint of view, the number of edges in the graphical models generated by the empirical data cf the applications suggests that such conditions are unlikely to occur. The sparseness of the adjacency matrix of such graphs could therefore be used as a rough a priori test of the number of cliques a detection algorithm might be expected to find. Cliques in graphs having a maximal number of cligues are a l l of the same size. Moon and Moser also showed that the number of different sizes of cligues in a graph with n vertices is bounded above by n-log (n). Erdos [30] improved their lower bound on this number by showing that i t was bounded below by n-log (n)-H (n) -0 (1) , where H(n) denotes for some k the k-fold iterated logarithm, log log.. .log (n), and 0(1) is an unspecified constant. 15 The r e s u l t s of Erdos, Mccn and Rcser suggest that an algorithm f o r the enumeration of cligues i s an example of an exponential combinatorial process. There has recently been an e f f o r t i n the theory of computation tc establish a hierarchy of complexity classes of combinatorial algorithms, motivated by the lack of success i n finding and proving e f f i c i e n t algorithms for a large number of important combinatorial processes. This work was pioneered by Ccck [11] who showed that combinatorial problems can be expressed as language recognition problems. Using such a representation certain problems for which no e f f i c i e n t algorithms have jet been devised were shown to be eguivalent in the sense that each was reducible to the problem cf whether a well formed formula was s a t i s f i a b l e . The class of problems so reducible has been expanded by Lawler [52,53] and Karp [49] and includes the clique detection problem. The p r i n c i p a l result i s that either there e x i s t s a polynomial bounded algorithm for each problem in the class, or for none cf them. However, the nature of the problems strongly suggests that the l a t t e r case i s i n fact true. The study of the detection cf cligues may help tc resclve t h i s question since Howshowitz [63] has shewn that a well formed formula with k clauses can be represented by a graph in such a way that the well formed formula i s a tautology i f and enly i f there exists no complete subgraph of order k. This res u l t i s stated by Karp [49 1, and i s i m p l i c i t in Ccck. [11]. 16 CHAPTER 2i ENUMERATION OF CJ.IO.UES 2.1 INTRODUCTION In order tc gain insight into the complexity of the problem of cligue enumeration, we shall examine the algorithms described in Chapter 1 in some detail. The technigue evolved exploits the way in which the vertex subsets are determined during an iteration of each algorithm. A cligue enumeration algorithm will then be proposed and shown to be more efficient than those previously examined. The algorithms of Harary and Ross and Bonner were chosen because of their availability in the literature, their apparent differences of approach, and their freguency of citation as references in subsequent literature on the subject of cliques in graphs. In addition, the Harary and Ross algorithm is considered by this author to be the historical precedent for the development of clique enumeration algorithms. Peay»s algorithm was chosen because it is comparatively recent, offers a conceptually simple approach to the problem and there appears to be no analysis of its efficiency. Although not yet readily available at the time of this documentation, the Eron-Kerbosch algorithm has been included because of its superiority over some of the previous methods as determined by Mulligan [64]. Although the algorithms cited employ apparently different techniques to achieve their goals, this difference is primarily one of detail, for an examination reveals the 17 following common features: 1. ) Each algorithm refines or decomposes a previously determined vertex subset to obtain new vertex subsets each containing at least one cligue of the original graph, ft choice is made of a vertex from the i n i t i a l vertex subset, and its adjacency properties are used to define the new subsets. The old vertex subset is subsequently removed from further consideration. 2. ) Each alqorithm has the property that every clique of the original graph is contained in exactly one of the possibly several vertex sets available for consideration at any stage of the algorithm. 3. ) Each algorithm employs some device to avoid the pitfalls of multiply defining a cligue or including as a clique some complete subgraph which is not maximal. Such a situation can occur whenever vertex subsets are generated which are properly contained in other vertex subsets, or which contain complete subgraphs maximal on that vertex subset but not on the original vertex set. In the remainder of this chapter i t will be seen that i t is precisely these properties of the clique enumeration algorithms which profoundly affect the efficiency of such procedures. 18 2.2 MATHEMATICAL ANALYSIS OF ENUMERATION ALGORITHMS To obtain some means of estimating analytically the computational time required by each algorithm, the computation may be divided into two parts. The fir s t consists of determining the effort required for one iteration of the algorithm; that i s , the time required to determine new vertex subsets from an old one. The second involves the determination of the number of iterations required to find a l l the cliques of the graph. We shall see that the number of iterations required is related to the number of vertex subsets generated during the execution of an enumeration algorithm A. ^ DEFINITION 2^1^ A vertex subset W is directly derivable from a vertex set V using algorithm A i f during some iteration in the execution of A, W is determined from V. DEFINITION 2.2: A vertex subset » is derivable from a vertex subset V using algorithm A i f there exist subsets IL ,0 ,...,U. such that U ^ is directly derivable from V, U is directly from D\ for i<j, and W is directly derivable from U\ . Since the set W is contained in the set V from which i t was derived, using these definitions i t is easy to see that derivability induces a partial ordering on the set of a l l vertex subsets generated by algorithm A during the course of enumerating the cligues, provided we add the stipulation that every subset is directly derivable from itself. This partial ordering may be represented by a directed tree whose root represents the vertex set of the graph, and whose vertices represent the vertex subsets generated by 19 algorithm A. Vertex u, representing subset U, is connected by a directed edge to vertex wf representing subset W, i f H is directly derivable from a. It is clear that this tree is dependent upon the enumeration algorithm used and the labelling of the graph, hence we shall always associate with any derivation tree a labelled graph G and an enumeration algorithm A. DEFINITION 2.3: A vertex subset W will be said to be redundant i f i t is properly contained in some vertex subset V from which it was not derived. It is interesting to observe that the behavior of clique enumeration algorithms to be described subseguently can be compared to a tree searching procedure, the tree in this case being the derivation tree of a graph G as determined by an algorithm A. The determination of methods for minimizing the development of redundant nodes in the determination of the derivation tree may then be likened to the problem of finding suitable tree pruning technigues. This similarity has also been noted by Mulligan [ 6 4 3 , while Bron and Kerbosch have exploited i t in their algorithm which used a "branch and bound" technique on the derivation tree. To determine the computational effort required during a single iteration of each enumeration algorithm, a technique developed by Corneil [12, APPENDIX A ] will be used. The tasks performed within each algorithm will be grouped into blocks. Each block will be defined from a set of basic operation types determined by implementation considerations. These operation 20 types and their associated time constants are given in TABLE 1 of APPENDIX A. Host of the instruction types are self-explanatory. We shall elaborate briefly, however, on the "push" and "pop" operations. The need for such operations arises from the indeterminate amount of storage required for saving partially determined vertex subsets. It will be seen to be most useful to save such subsets, i f they are needed for some subsequent processing, on a push-down store. Such a data structure is easily implemented as a linked l i s t with additional storage added on as required. The essence of the "push" operation is to obtain storage for the current vertex subset tc be saved together with a link address pointing to the next data item in the store. The top item is always directly accessible through a pointer to the top of the store . The "pop" operation deletes the top data item of the store and resets the pointer to the store top so that it points to the next data item which thus becomes the new top of the store. In most cases it is difficult to obtain an exact expression for the computation time of the algorithm under consideration. This is due primarily to difficulties in determining the number of vertex subsets having a particular number of vertices. A second factor complicating the determination of such an expression is the difficulty of determining the extent of search to satisfy some condition (such as the "first edge" in Peay's algorithm) during a particular iteration. For these reasons our principal goal 21 will be to determine the order (as a function cf the number of vertices in the graph) of an iteration, and the number of vertices in the derivation tree. In determining estimates of the reguired computation time for the enumeration of cligues by various methods, the primary graph to be considered will be the complete k-partite graph with m vertices per block, denoted K(m,m,...,m) or K (m ). The reason for such a choice is that every k-partite graph is a subgraph of K (nr*) and the derivation tree of any other k-partite graph is smaller than the derivation tree of K(m ). As a consequence of this choice of graph for consideration i t is possible to determine the number of vertices in its derivation tree using each algorithm. 22 2.3 ANALYSIS OF THE HARARY-ROSS ALGORITHM As mentioned in the introduction of this chapter, the Harary-Ross algorithm was selected for consideration in a comparative study of some cligue detection algorithms because of its historical precedent. It is interesting to note that despite its frequent reference in subseguent papers on clique detection examined by this author, no mention had been made of an error in the alqorithm until Harary himself referred to its existence in a recent paper on the application cf graph theory to the Social Sciences [UU], Harary observed that although the method found a l l the cliques, i t also included some "other" subgraphs in the set of maximal complete subgraphs as well. These "other" subgraphs are in fact complete subgraphs which are not maximal and hence each is contained in some clique. Although there appears to be no subseguent attempt made to correct the problem, possibly due to the existence of more efficient algorithms by the time of the discovery of the error, a modification to correct the defect is fairly simple. When a complete subgraph has been discovered, determine i f there exists a vertex adjacent to all vertices in that subgraph. If not, a cligue has been found. Otherwise i t is obvious that such a subgraph is not a cligue and is therefore deleted from further consideration. This modification is included in the subseguent analysis. In the interests of improving the efficiency of their algorithm, Harary and Ross modified the general procedure cited in the historical survey by defining a procedure for 23 determining whether or not the subgraph induced on the vertex set under consideration had at most three cligues. If so, then such a subgraph could be completely processed by a direct method (rather than the recursive method of the general procedure) and the cligues of the subgraph determined. However in most cases only one additional iteration of the general procedure is required to determine a l l the cligues of a subgraph containing at most three cliques. In addition such a scheme requires additional computation, namely determining whether a subgraph has at most three cligues, during each iteration of the general procedure thus increasing the overall computation time. * For these reasons, the contribution to overall efficiency is small and for simplicity has not been included in the analysis or implementation of the Harary-Boss algorithm. 2.3.1 NOTATION FOB THE ALGORITHM G^ The subgraph currently under consideration. VJGJ. The vertex subset of the subgraph G. The adjacency matrix of the subgraph G. rj. An array such that r(i) contains the sum of a l l elements in row i of B= A2XA , the Hadamard product cf the adjacency matrix and its sguare. d^ An array containing the degrees of the vertices in the subgraph G. Bi a pointer to the vertex i in G with minimum row sum r ( i ) . 24 I K The number of vertices in the vertex subset V(G). 2.3.2 THE ALGORITHM STEPO: Initially place V(G) on the stack. STEPli Choose a vertex set V(G) from the stack of vertex subsets to be considered. If stack empty then go to step13. STEP2: Compute the matrix product A2XA where "X" is the Hadamard product. STEP3 : Let B= A2XA . Compute the row sums r (1) ,r (2) ,... , r (n ) of B as well as the degrees d (1), d (2) ,...,d (n ) of the vertices in the subgraph G. STBP4: Set i to 1 and m to 1. STJP5i I f r (i) =d (i) (d(i)-1) then go to STEP10. STEP6: i f r{i) < r(m) then set m to i . STEPJi set i to i+1. If i<n then go to STEP5. STEP8^ No unicliqual vertices exist. Therefore define two new vertex subsets V (G ) and V{G ) as follows. V(G ) is the set of 1 2 1 a l l vertices adjacent to m, the vertex of minimum row sum r{m). V(G2) is the set of a l l vertices not adjacent to m, together with a l l vertices adjacent to at least one of these vertices not adjacent to m. STEP9: Store V (G^ ) and V(G2) on the stack of vertex sets tc be considered and go to STEP1. STEP 10: A unicliqual vertex i has been found. Compute the intersection of a l l rows of the adjacency matrix of the original graph corresponding to vertices adjacent tc i . 25 STEP j 1_: If the result cf STEP10 yields an empty set then the complete subgraph determined by i is maximal. Hence print i and the set of vertices adjacent to i . STEP12:. Delete from V (G) vertex i and a l l vertices adjacent to i which are also unicligual. Place V (G) on the stack of vertex sets to be considered and go to STEP1. STEP13: a l l vertex sets have Been processed. Therefore stop. The tasks to be performed by the fiarary-Boss algorithm may be logically divided into four blocks. The function of block 1 is to compute the degrees of the vertices of the graph and to compute [A(G)]2 X A (G) and determine the row sums of this matrix. Block 2 uses this information to search for a unicligual vertex by finding a vertex which satisfies the relationship r = d • <d -1) where r is the corresponding row sum, and d the degree of the vertex. I f such a vertex is not found, two new subgraphs are determined in block 3, one of which is returned to block 1, the other saved for further processing. Otherwise we proceed to block U to search for a complete subgraph of the graph G induced on the unicligual vertex v and those vertices to which i t is adjacent. If the complete subgraph is a cligue i t is printed, a l l unicligual vertices in the discovered complete subgraph are deleted from V(G ) and the subgraph G is returned to block 1 for further i processing. 26 BLOCKJ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 pop V(G) i <-- 1 r. <— 0 j <-- i+1 -> r . <— 0 3 substr (a^ , j,0) : 0 a. <— a. +1 X 1 a 4 <— a.+i 3 3 k <-- 1 s <— 0 'S <— substr (a . a.,k,1) • s k <— k+1 r . <— r. *s x x r . <— r. +s 3 3 j <-- j*1 j : n_ i <— i+1 l : n , The parameter n is equal to the number of vertices of G the vertex* set V(G) , and a (i) denotes the adjacency set cf vertex i in G. BLOCK 1 dominates the computation in the Harary-Boss algorithm; the other blocks of the procedure will be seen to depend linearly on n_. For this reason we defer their description until we have further examined BLOCK 1. 27 In general it is difficult to obtain an explicit expression for the computational effort reguired by the Harary-Ross algorithm. This is due to difficulty in determining the number of times branch conditions are satisfied . I t is also difficult, even for an arbitrary complete k-partite graph, to determine the number of vertex subsets that are generated with a particular distribution of vertices over the blocks. Instead, we shall consider the behavior of the Harary-Ross algorithm when finding the cligues k of K(3 ). A similar procedure could also be adopted for each of the other two types of Moon-Moser graphs. An example of the derivation tree for K(3 ) is given in Fig. 2.2. Each node of the tree has been labelled according to the distribution of vertices among the blocks of the induced complete 3-partite subgraph which i t represents. Let G be an induced subgraph defined on a vertex subset obtained during the execution of the Harary-Ross algorithm, and suppose G has i ^ blocks with 1 vertex per block, ±2 blocks with 2 vertices per block, and i ^ blocks each containing 3 vertices. Necessarily it is the case that i,+i +i =k and i.,+2i +3i =n„. The number of times that 1 2 3 1 2 3 Lr execution passes from line 6 to line 7 during the execution of block 1 is egual to the number of one's in the adjacency matrix of G above the diagonal. Since the original graph is complete k-partite every induced subgraph containing at least one vertex per block is also complete k- partite. The number of one's in G is therefore given by fQ ( i ^ i ^ i ^ ) where f (i , i , i )= [ (i +2i +3i ) 2- (i +ui +9i ) ]. G 1 2 3 1 .2 3 1 2 y 28 Hence steps 7 through 15 of block 1 will be performed f ^ ( i ^ i ^ i ) times and the computation time of block 1 during one iteration is given by T (n^ ) where T (n ) = C»+n C»+n (n -1)C»+f (i , i , i )(C»+n C* ) 1 G 1 G 2 G G 3. 6 1 2 3 4 G 5 with C* = t +t 1 1 2 C2 = 3 ti+ 2 t3 C* = 2t1 + t3 + 2 t4 +t8 c* = 6^+at. 4 1 J C5 = 2 tl+ 2 t3+ t4+ t6+ t8 The constants t ^f i=1,2,...,10, represent the cycle times for the various operations as specified in APPENDIX A 29 BL0CK2 20 21 22 23 24 25 26 i <-- 1 m <— 1 -*r. \ <V1>- -*-Block4 r : r — i m m <— i i <— i+ 1 <r-I n Block3 The computational time for BLOCK 2 is given by: T (n) = C2+g C2 2 1 l 2 where 2t, C| = 3t1+2t3+3t^+tg It was assumed that line 24 which points to the minimal r(i) was executed one-half the time. The term g^ i s egual to one less than the label of the first unicligual vertex i encountered. 30 BLOCK3 27 28 29 30 31 31 32 33 34 35 VtG^ <— am V(G) substr (V(G^ ),m,1) <— 1 i <— 1 V(G2) <-- -7(6^ i <— i + 1 substr (V (G1) , i , 1) : 0 V(G2) 7(G2) a± i <-- i+1 < i : n. push VCG^VfG,,) The computation for one iteration of block 3 is T3 ( V = c i+ ( nG ~as( i ) , C2 *n C| ' W h e r e C3 = 4t +2t *t. *t +t rt 1 1 2 6 7 8 C3 = t +t , 2 1 6 ' cl = t *2t, +t 3 1 3 4 8 and d (i) is the degree of vertex i in the subgraph induced G on V (G). BL0CK4 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 C <— a . x substr (C,i,1) <— 1 T <-- T -T k <-- 1 substr {C,k, 1) : 0 — T <— T a,. k <— k+1 <-A k : n T : 0 - i print C substr (V (G) ,i,1) <— 0 j <-- j*1 ••-substr (ai # j , 1) : 0 i 3 substr (VG) , j , 1) <— 0 j <-- j+1 3 : n, push V (G) The computation time for BLOCK4 is T3(nG) = C*+dG(i)C*2+nGC3*bC*-iC*f the constants being given by Ct2 - 2 t l**3 -V*6 C5 " 3 V * 8 C i m . t l, t3t 2\ *t» and h is the number of unicligual vertices in the cligue. 32 For K{3 ) the next section will show that there are J[_(3 -2 3) components which do not have unicligual vertices; hence the value of g. in block 2 for each of these vertex sets is n„, 1 u and therefore the maximum value for (nQ) is cf+C2nQ* For blocks 3 and 4 d^(i) and h are also bounded by n^ and hence T„ (n0) and T . (n_) are also linear polynomials with respect to n^. , T (n^ , ) is thus clearly the dominant term in the computation time for one iteration of the Harary-Ross algorithm and is a polynomial of order n*. 33 Fig. 2.1: U 3, 3, 3 ) I l l 35 2.3.3 NUMBER OF VERTEX SUBSETS EXAMINED BY THE ALGORITHM From an examination of the derivation tree of an arbitrary graph using the Harary-Ross algorithm (see for example Fig. 2.2) i t is easy to determine the number of components or vertex sets that will be generated. This is because, given that the number of cligues in the graph is N, since the derivation tree from this algorithm is binary, the number of nodes is 2N-1. The nodes of the derivation tree of K (3 ) can be separated into two parts according to the type of processing carried out on the component represented by that node of the tree. In particular, whenever a unicligual vertex is found via block 2, the cligue to which i t belongs is determined in block 4 and the component corresponding to this cligue is output rather than returned to the processing stack, which consists of vertex subsets yet to be examined further for cligues. For example, i f we examine Fig. 2.2, and consider the subgraph induced on a vertex set having one vertex in every vertex block but one ( eg. (1,1,3), (1,1,2), (1,2,1), (2,1,1)) we see that a l l vertices in the block containing more than one vertex are unicligual. A vertex would be chosen from this vertex block, and through the computation in block 4 of the algorithm a clique would be determined and a vertex set returned for further processing. In the case of the example such a vertex set would have the form (1,1,2) or (1,1,1). For the general complete k-partite graph K(3 )r among a l l vertex sets generated having unicligual vertices, there is 36 only one vertex set of the form (1,1,1,...,1,3). This is because such a set is derivable only from previous sets having either one or three vertices per vertex block. Such a vertex set yields three cligues according to the seguence of derivations given in Fig. 2 . 3 . Of these three cligues one is reprocessed in block 1 as a conseguence of the previous discussion. Of the remaining 3k-3 cliques, a l l are derived from vertex sets having two unicliqual vertices in some block, one vertex in each of the remaining blocks, and hence again according to the previous argument one-half of these will be reprocessed ene further time. The purpose of this discussion has been tc ascertain how many vertex sets of the 2«3k-1 generated are subject to processing in block 1 where the major portion of computation occurs. This number is thus 3k-1+1+JL(3k-3) = 2(3k-1). In 2 2 addition, the number of vertex sets containing unicliqual vertices, and therefore examined in block 4, is 2 + 1.(3 - 3 ) . 2 Finally the number of vertex sets not containing a unicliqual vertex and therefore processed in block 3 is given by (2-3k- 1) - 3k - (2 + l(3k - 3)) = l(3k - 3) 2 2 If we denote by T^T^T^ '^4* e s t ; L i n a t e s o f t n e computation time for blocks 1,2,3,and 4 respectively then an estimate of the computation time is given by: T = (f +T_) (l(3k-1)) • T, (3 ( 3k _ 1 -1)) + f.(1(3k*1)) appx 1 2 ^ 3 2 * Fig. 2.3 SUBTREE OF DERIVATION TREE 38 2.3.4 STORAGE REQUIREMENTS The Harary-Ross algorithm as defined in blocks 1 through 4 reguires only one adjacency matrix be stored, that of the original graph. The appropriate sub-matrix is then determined during each iteration by keeping track of the appropriate vertices defining each induced subgraph. Also, since the r and d arrays are pertinent only to the induced subgraph currently under consideration, only one array of size n cf each is reguired. As a l l other terms are counters or pointers the only other storage ' reguirement is made by maintenance of a push-down store for keeping track of the vertex sets remaining to be processed. At any iteration we usually define two vertex sets named VfG^) and V (G2) in the description of the algorithm. If we order their position on the push-down store so that VfG.^ ) is always chosen fi r s t from the push-down store, then since no path in the derivation tree is of length greater than k, there cannot be more than k vertex sets waiting in the store. Each corresponds to the "other" vertex set paired with that vertex subset represented by a node lying on the path in the derivation tree. As an example consider the derivation of clique (258) in K (3 ) as labelled in Eig. 2.1. The seguence of events is illustrated in Fig. 2.4. For each pair of direct derivations, V (G^ ) corresponds to the "left" derivation, V (G )•to the "right" derivation. Before we can reach the vertex in the derivation tree labelled (23,456,789) we must have processed (1,456,789) since we have 39 arranged to do this f i r s t . Conseguently al l cligues containing vertex 1 have been determined and vertex set (1,456, 789) or its derivatives no longer appear on the push-down store. A similar argument applies to (2,4,789) and (2,5,7). The cnly nodes remaining tc be processed are (3,456,789), (2,6,789) and (2,5,9) . From a programming point of view it is most convenient as well as efficient to maintain lists of vertices as bit strings. Since both the vertex sets of the algorithm as well as the rows of the adjacency matrix are vertex lists i t is clear that the storage requirements are given by 2n+C "integer units" of memory plus n(n+k) bits. An "integer unit" will depend on the storage conventions for integer representation on a particular system. For our purposes during implementation this is egual to a "half-word" or 16 bits. The storage requirements for the Harary and Ross algorithm is thus n (n + k)+16 (2n+C) bits where C is the nuirber of pointers and counters reguired. Fig. 2.U A PATH OF THE DERIVATION TREE 2.a ANALYSIS OF BONNER^ S ALGORITHM Bonner's algorithm has generated some interest among researchers wishing to employ the analysis cf cligues in graphs to their particular application because of its apparent efficiency since no clique or vertex subset generated need be examined for containment in some previous component. This difficulty arose in the modified Harary-Ross algorithm previously described and is also inherent in Peay's algorithm, to be discussed next. In addition, Bonner's algorithm is interesting to examine in a comparative study cf clique enumeration algorithms because i t has been compared empirically with the efficiency of more recent algorithms. The approach taken by Bonner is rather different from that of Harary and Ross or Peay in that i t is a constructive procedure rather than one of reduction. The method employed is to build up the vertex sets of the cliques from a set of potential candidates, membership being determined by the adjacency properties associated with each vertex. We describe the steps of the algorithm as given by Bonner [ 6 ] , including a minor correction noted by Augustson and Minker [ 5 ] in their discussion of the efficiency of the procedure. The paper of Augustson and Minker showed that the efficiency of Bonner's algorithm may often be illusory because many complete subgraphs or components may be generated during the course of execution only to be discarded at seme later stage. It was discovered that graphs containing several very large cliques and a few very small ones resulted in an 42 excessive amount cf computation being performed on extraneous components which the algorithm would eventually delete. This generation of extra vertex subsets using Bonner's algorithm is also present in the enumeration of cligues of complete k-partite graphs, upon which we are focusing our discussion. We shall establish a generalization of observations made by Augustson and Minker which will then be used tc determine the number of vertices in the derivation tree of K (m ) using Bonner's algorithm. We f i r s t , however, describe the procedure its e l f . 2.4. 1 NOTATION ' A z_ an array representing the set of objects in the complete subgraph to the present stage of calculation. C 2 a« array of potential candidates for increasing the size of the complete subgraph induced on vertices in A . L : the last vertex of C to be considered for addition ~ i ~ to the complete subgraph induced on A . S _: the adjacency matrix of the original graph. 2.4.2 BONNER'S ALGORITHM (Augustson, Minker[5]) STEPJ: set i to 1, C to V (G) , A^^ to gS ,/L1 to 1. STEP2: If L is not in set C then set L. to L.+ 1 and go to i I X I STEP5. STEP3: Set C to { C O S (L ) }-{L. } and A . , to A. l_J {L. }. i l x x x x l x x 43 STEP4i set L^to 1^ + 1, i to i+1. STEP5: If there is an element in larger than L.^ then go to STEP 2. STEP6: Set T to k^. If C^ =$ then flj_ is a maximal complete subgraph. Else either has been found before cr i t is not maximal. STEP7^ Set i to i-1. If i=0 then stop. STEP8: Set U to be the set of a l l objects in greater than L. . If U £ T then go to STEP7. x STEP9i Set tc L± + 1 and go to STEP2. The tasks performed by Bonner's algorithm have been divided into two blocks. The function of the steps performed in block 1 is tc determine whether a discovered complete subgraph is maximal and to find the next component tc be processed. If one is found, control is transferred to block 2 which constructs another complete subgraph returning the discovered complete subgraph to block 1 for testing. 4<4 BL0CK1 1 2 3 4 5 6 7 8 9 W <— fl i A. C. : 0 I print W i <— i-1 U <— C stop substr (U,i,1) <— 0 • (0 u W) : w L. <— L. + 1 > 10 x 1 The computation time, T^ for block 1 is given by V c i * 'io*"0!* with constants being given by Cl * = "1**3**4 and = 1 i f W is a cligue, 0 otherwise, and h<i is the first value of i for which k^ ,!^ , are the vertices of a complete subgraph contained in a cligue not yet found, 45 BL0CK2 10 11 12 13 14 15 16 17 ->substr , 1) : 0 — Ci 1 < - ~ Ci° s(Li) substr (C i ltL i r 1) <— 0 Ai 1 <- &i substr(fl^ ^*L±*^) <— 0 L. T < — L. +1 X 1 X i <-- i+1 substr (C.. ,1^ + 1^-1. ±) : 0 L.+1 Every vertex subseguent to the original is examined exactly once until there are no further vertices to be included. Therefore loop:17 to 10 is executed at most n-L^ times for an arbitrary graph. For Kfm1^ there are at most m (k-j) vertices in C\, where j denotes the block to which vertex lu belongs. For each value of i there are m-1 vertices not in C. , namely the other vertices in the block. Since i is not x J incremented on such occasions loop:17 to 10 is executed m-1 times for each of the next k- j -1 blocks of vertices in . If the graph is labelled such that vertices in block i have labels (i-1)m + 1, (i-1)m*2f... ,(i-1)m*m, then j = ^1^ 1j . An upper bound cn the time consumed in block 2 is given by T>> defined as a function of L. and n: x T2(L.,n),= where (m-1)C|+mC2 ) C l = 6 t l + 7 W 2 t 8 C22 = V * 3 C32 = V 2 V 2 t 8 46 Since the value of h in block 1 is less than or equal tc n while k-| L. -11 is maximized when L. is in the first block, i t is clear that the computation for one iteration of Bonner's algorithm is bounded by n = mk, the number of vertices in the graph K(mk). 2.4.3 NUMBER OF VERTEX SETS GENERATED To determine the number of nodes in the derivation tree of Bonner's algorithm it is necessary first to establish the following result, a generalization of observations made by Augustson and Minker [5]. THEOREM 2.4: For m>2 every complete subgraph of K(m ) is generated during the execution of Bonner's algorithm. Proof: Using Bonner's notation the set Ai consists of a complete subgraph of order i defined on vertices labelled L-.I^ ; the set C. consists of all vertices adjacent 1 2 l l to every vertex in A^. i f Suppose the vertices of block V. in V (K (m )) are labelled (j-1)m + 1, (j-1)m+2,..., (j-1)m+m, for m>2. If we perform the algorithm to obtain the "f i r s t " clique of K (m ) we obtain the following assignment to A^ and for i=1,2,...,k: Lx= 1 A x= {1} L = m+1 A 2= { 1,m+1 } L = 2m+1 A = {1,m+1,2o+1} L = (k-1)m + 1 A = { 1, m+1 ,2m+1,.. . , (k-1) m+1 } 47 Let U be a vertex subset of consisting cf a l l vertices with labels greater than L^. Such a subset is not the vertex set of a complete subgraph unless there is at most one vertex in 0 because of the labelling of vertices in K(m ) . Therefore, by execution of the algorithm, L^ is set to L^ + 1 in step 8 and we return to step 2. Since i is determined by the number of vertices in A^ when we entered step 6, and since every possible value of L^ from i t s i n i t i a l one of (i-1)m*1 up to mk is adjacent tc L ,L ,...,L. , and also contained in C., i t is the case that a 1 2 i - l i complete subgraph with vertex set given by A^, 1<i<k*1, is generated where: 1. ) A^ = { L^,12,.,L ^ }, 2. ) (j-1)m+1<L <mk, 1<j<i -'• 3. ) L1<L2<...<L . QED We have thus established that for m>2 every complete subgraph of K(m ) is generated during the execution of Bonner's algorithm. The special case m=1 corresponding tc applying the procedure to a complete graph generates k complete subgraphs as described above in determining the first cligue of the graph. Since every possible subset U of C is contained in A , clearly no return is ever made to step 2, so that the algorithm terminates after printing Afc ={ 1,2,...,k}. The derivation tree for K(3k) is given Fig. 2.5 as an illustration of the vertex sets generated by Bonner's algorithm. From this one can clearly see the property cf 48 Bonner's algorithm defined in theorem 2.4. As a result the , k number of components generated by Bonner's algorithm on K (m ) is given by the following: I S J O S J f i 2-.5.1 The number of nodes in the derivation tree of K(mk) using Bonner's algorithm is (1 + m)k. Proof: From Theorem 2.4 i t is clear that every complete subgraph occurs as a node during some stage of execution. The k number of complete subgraphs of order i<k in K (m ) is egual to the number of ways of choosing i from k blocks V ,V,...,V , and then choosing 1 vertex from each of the i chosen blocks. This is s ix/k\. Hence the total number of complete subgraphs is k • I1' $ m^ kN i = 1 li) = (1+m)k-1. Since the root node of the derivation tree is not yet included this results in a total of (1+m) nodes. QED. 50 2.4-4 STORAGE REQUIREMENTS The storage reguirements for Bonner's algorithm are similar to those for the Harary and Ross algorithm. Two arrays A and C of length n are required, each element corresponding to a vertex subset which can be represented as a bit string as can the rows of the adjacency matrix S. In addition an integer array of pointers L is reguired. Two temporary bit strings T and U are also needed in addition to a counter i . Using the half-word of 16 bits as the integer unit, the storage reguirements are: 3nz • 16 (n+1) • 2n = 3n2 + 18n • 16 bits. 51 2.5 ANALX.SIS O F PEAY^ S ALGORITHM The computation involved in the Harary-Ross algorithm was dominated by the computation of a matrix product and by the generation of a large number of components and their associated complete subgraphs, which were later deleted. Bonner's algorithm was also dominated by the generation of a number of superfluous components. Because Peay's algorithm generates only components which are essential to the final determination of a l l cligues, i t is of interest as i t may have a reasonably small derivation tree. However, the means by which Peay deletes non-essential components results in a large number of additional operations. Specifically, Peay compares each of two newly generated components to an ever growing l i s t of vertex sets of cliques and subgraphs which are potential cliques. Thus, for a graph with an exponential number of cligues, as a function of the number of vertices, an exponential number of comparisons is required in addition to the time for generation. As will be seen, this off-sets tc a considerable extent the time saved by avoiding the analysis of redundant vertex sets. For this reason we discuss here a modification to the algorithm which reduces the amount of storage required. The extent of this reduction is determined in our storage analysis. The procedure to be implemented for obtaining this improvement depends on ordering the selection of vertex subsets so as to obtain a development of "depth before breadth" of the derivation tree. The stack containing these vertex subsets is then altered to contain only those vertex sets which do not induce complete subgraphs. This 52 drastically reduces the size of the push-down store by eliminating the growing l i s t of cligues previously being kept there. Instead, a test for cligue membership is made, like that employed in our modification of the Harary-Ross algorithm. These modifications are included in cur subseguent analysis. Before proceeding, we should note however that the inefficiencies inherent in the algorithm as cited by Peay are a consequence of the application to which such a procedure was being put, namely the determination of a hierarchy cf cliques in sociograros. As a rule the goal of a graphical treatment of such data is to assign the "individuals" to one or more of a few sets which i t is hoped characterize the structure of the group. Hence the number of cligues in a social group as determined by such an analysis is small and therefore the difficulties of a possibly exponential number of cligues is not pertinent. As cur treatment of cligue detection algorithms is graph theoretic, we have not assumed any a , priori information about the structure of the graph induced by its physical interpretation and must therefore be concerned with such problems. 2.5.1 NOTATION VJGJL_: the vertex set currently under consideration. gx the subgraph induced on V (G) I K the number of vertices in G. M: the number of vertex sets in the stack. 53 ajij_2 adjacency set of vertex i . 1J9.-, LL 119.oil newly generated vertex sets. G : the subgraphs induced on the new vertex sets. 2.5.2 THE ALGORITHM STEPOi Initially place V(G) on the stack. gTEPJ: Choose a vertex subset V (G) from the stack of vertex sets to be considered. If the stack is empty then stop. STEP2: Find a pair of vertices v^ »v^ both in V(G) such that (v. ,v .) is not an edge of the original graph. If no such pair exists then go to STEP5. STEP3: define new vertex sets V (G )=v (G)-{v. ], i l l ' V(G2) = V(G ) " { V . . } . STEP4: For k=1,2, i f v(Gk) is n o t contained in vertex set V currently on the stack then put V (G^ ) on the stack. Go to STEP1. STEP5: V (G) induces a complete subgraph. If there exists no vertex in the original graph adjacent to a l l vertices in V (G) then print V(G) as a clique. In either case go tc STEP 1. The tasks of Peay's algorithm can be logically grouped into two blocks. The function of block 1 is to examine the subgraph induced on a subset of the vertices of a graph G, in oreder to find a pair of non-adjacent vertices. If a pair is not found then a cligue has been discovered and it is printed. If two vertices, say v and w, are not adjacent then control is passed to block 2 which defines two new vertex sets. Each is 5H saved for further processing provided it is not contained in some previously generated vertex set. Control returns to block 1 which chooses another vertex set for examination. r 55 BLOCKJ. 1 2 3 5 6 7 8 9 10 11 12 13 ->pop V(G) M <— M-1 i <— 1 c <— c - c - * j <— i + 1 C <— C a . •substr (a^ j , 1) j <-- j+1 • j : n i <— i+1 • i : n • C : 0 print V(G) : 0- ->> Block2 If G is in fact a clique, then all n (n-1) ones in its 2 adjacency matrix above the diagonal will be examined. The computation time of block 1 for one iteration is therefore bounded above,by with constants T (n ) = C» + n c*+n (n -1)C» ci= 3 W W 4 i o C. = 3V2V Vt6 °3 " ' I ' V ^ ' 1 * 56 BLOCK2 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 V(G1) <-- V(G) substr (V ) , j , 1) <— 0 V(G2) <-- V(G ) substr (V(G2) f j,1) <~ 0 k <-- 1 •*V(G1) 0 V (G1) : V(G^k^)-i-V(G1) <-- 0 v(G2) 0 <+• +£- V (G) : V (G ^ V(G2) <-- 0 V(G1) V(G2) 4 i <— i + 1 i :M V(Gi) : 0 -push V(G1) M <— M + 1 V (G ) : 0 -«-1 2 push V(G2) M <— M+1 exit -> exit The computation time is maximized when neither new vertex set is contained in some previous vertex set. When this occurs the time for BLOCK2 is T = C2+MC2 where 2 1 2 C2 = 7t +2t +2t +2t +2t 1 1 2 3 4 8 C 2 = t +t +6t +t t 2 1 3 4 6 and M is defined in section 2.5.1. 57 2.5.3 NUMBER OF VERTEX SETS GENERATED The complete derivation tree for K (3,3,3) using Peay's algorithm is guite extensive. As will be seen this is due to the nature of the development of new vertex sets for consideration. In order to obtain an expression for the number of nodes in the derivation tree it will be convenient to consider the development which occurs during the processing of one block of vertices. That i s , since Peay's algorithm determines two new components whenever a vertex pair is discovered which is not an edge of the graph, we shall consider a l l such pairs defined upon a single vertex block of G. Fig. 2.6 gives such a development for block v = {1,2,3] of K (3,3,3) as labelled in Fig. 2.1. Examination of this sub-tree of the derivation tree reveals that three components are eventually generated with the property that each has exactly one vertex in block 1 and 3 vertices in each of the remaining two blocks. Since, for each of these sets, the one remaining vertex is adjacent to a l l other vertices in the vertex subset of that component i t is evident that no further computation will involve that vertex. Thus the induced subgraph of each vertex subset is eguivalent to K(3,3) with vertices labelled 4,5,6,7,8,9. The number of nodes generated is given by the sum of those determined during the generation cf three components, K (1,3,3), from one component, K(3,3,3), and the number of nodes generated in the reduction of each K (1,3,3) (which is eguivalent tc the reduction of K(3,3)). Therefore i f for K (m ) we can determine 58 the number of nodes created in generating m components K(1,mk-1) we can obtain a recurrence relation for the number of nodes rn the derxvation tree of K (m ) using Peay*s algorithm. 59 Fig. 2.6 SUBTREE; OF THE DERIVATION TREE 60 THEOREM 2.6: The number of vertex sets generated by Peay's k k algorithm to find the cligues of K(3 ) is 3(3 )-2. Proof z Let vi#u2f... #Vk b e t h e bl °c J l s o f K (m ) a n a label the vertices in block V. :(j-1)m+1, (j-1)m+2,... (j-1)m+m. Consider J the sequence of derivations defined by processing the vertex pairs (non-edges) (1,2) ,(1,3), ...,(1,m). The vertex sets derived t starting from ({ 1 ,2 ,3 ,. . . , m }, V , . ..,V ) are respectively ({ 1 ,3,4,. .., m ], V2 ,.. . , Vfe) , ({ 1, U,5,. . . , m }, V2,..., Vfc ),...,{{ 1,m},V2,...,Vk) , and finally ({ 1 }, V£ , . . . , Vfc) . If ({1,i,i +1,...,m },V ,...,V ) is a typical vertex set from this seguence of derivations, two new sets, ({ 1,i+1,,..,m},v2,...,Vk) and ({i , i * 1,..., m}, V2 ,..., Vfc ), are derived by separating vertices 1 and i . The latter vertex set is deleted since i t is contained in the previously determined set ({ 2,3,. .. , m },V ,. .., V ). We therefore have a total of 2(m-1) vertex sets determined during this seguence of derivations, half of which are deleted, the remaining ones being those given above. This process is illustrated in Pig. 2.7. The number of vertex sets considered in reducing one m-1 vertex block of K(mk) is given by 1+E 2i = 1 + m(m-1) . i=l Let am be the number of vertices in the derivation tree k of K(n>k) using Peay's algorithm. Since the reduction of one block of V(G) yields m vertex sets whose processing is eguivalent to that for K{m^~^-) , the number cf vertices is given by the recurrence relation am = 1 + m (m-2)+mam whose k k-1 solution i s : am = Cmmk + 1+m (m-2). The complete derivation tree k 1 1-m 6 1 for K(3,3) is given in Fig. 2.8 from which we obtain a™ with K m=3. Solving for C3 we get C3 = 3 and a3 = 3{3k)-2. QED. 1 - 1 k 64 2.5.4 STORAGE REQUIREMENTS As has been observed, Peay's algorithm as originally defined required storage space for a l l new vertex sets generated during its execution. For a graph with an exponential number of cliques such a demand is not practicable. In our discussion we have described and analyzed a modification to the procedure which eliminates the need for maintaining in the stack the cligues as they are discovered. By developing each path in the derivation tree as far as possible, the number of nodes placed in the stack is never greater than the length of the path generated, each entry corresponding to the "other" vertex subset of the pair of vertex subsets developed at that stage. Let V(G) be a vertex ,subset such that V (G) induces a complete k-partite graph having 1 vertex in i blocks and m vertices in (k-i) blocks. If we carry out a sequence of derivations by fixinq one vertex in block i+1, say v and sequentially derive new vertex subsets from the set of non-edges (v,w ), (v,w ),..., (v,w ) then according to the X rC~ -L argument presented in a discussion of the derivation tree for K(mk) developed by Peay's algorithm, two vertex sets will be added to the stack after such a seguence of derivations. The first consists cf a l l vertices of the original vertex subset other than v. The length of the path in the derivation tree corresponding to this seguence is m-1, the number of vertices not adjacent ot v. Since there are k blocks in K(mk), the maximum length of any path is (m-1)k. However by choosing non-65 edges as described, after the (m-1)i th node only one of the i+1 vertex subsets have been saved for further processing, hence the number cf vertex sets on the stack is at most k+1. The only other storage required is that for the adjacency matrix and a number of integer counters. Therefore, since each entry in the stack can be represented by a bit-string of length n, the total storage reguirements are n(k+1) + n2 + 16C bits. 66 2.6 A NEW ENUMERATION ALGORITHM The analysis of previous algorithms, while providing expressions for a comparative analysis of sequential clique enumeration algorithms has also revealed some of the properties desired by a "good" procedure and some of the hazards one must attempt to avoid. In addition, certain properties are characteristic of algorithms for explicitly enumerating the cligues of a graph. These algorithms appear to reguire a means of determining the sets of vertices adjacent to a given vertex as a l l procedures discussed use this information to generate the components to be used in further analysis. This is net tec surprising since any graph is characterized by this sort of information. However, the adjacency matrix representation cf a graph provides this most simply and directly. The representation of the rows of the adjacency matrix as a string of bits greatly simplifies the computation reguired in determining new components. The importance of an adjacency matrix representation over some other representation is thus emphasized by these observations. The desirable properties of a seguential cligue enumeration algorithm are two-fold. First, generate new components which do not destroy the existence of maximal complete subgraphs with as l i t t l e effort as possible. Secondly, generate as few such components as possible. The best possible situation is to avoid the need for determining whether a complete subgraph or component just generated is 67 properly contained in some other cligue of the graph and reguires some means of choosing just the right set of vertices so that no redundant component is ever generated. Because of the complexity of the possible intersections of the vertex sets of cliques in a graph, i t is difficult to determine just how such a set could be chosen. It is not sufficient to find either a maximal or a minimal independent set cf vertices which covers the vertex set of a graph as the following example illustrates. Consider the graph of Fig. 2.9. The vertices labelled 1 and 4 in the graph constitute a minimal independent set of vertices covering the vertex set of the graph. It is clear however that the cligue K (1,1,1,1) induced on vertices 2,3,5 and 6 contains no vertex in this particular minimal covering. Similarly consider the graph in Fig. 2. 10. It has a maximal independent set of vertices labelled 1,3 and 5, none of which is a member of the cligue induced on vertices 2,4, and 6. Clearly one reguires the prescient ability to choose an appropriate independent covering set of independent vertices among an exponential number of possible choices. There is presently no known way for accomplishing such a task in an efficient manner. The algorithm to be described generates a i reduced number of redundant vertex sets, and uses an efficient procedure for detecting such redundancy. It will be seen that with some modifications the procedure to be described combines some of the better features 6 8 of the previous algorithms. As a consequence we shall show that the performance of this algorithm is comparable to and in many cases (eg. Graphs with many cligues) better than that to be expected from the others. 6 5 Fig. 2.9 MINIMAL COVER COUNTEREXAMPLE 1 2 3 5 Fig. 2.10 MAXIMAL COVER COUNTEREXAMPLE 70 2.6.1 DESCRIPTION OF THE ALGORITHM Each vertex subset to be processed is divided into two parts; V(G1), the set of a l l vertices in that subset yet to be examined and V(G2), the set of a l l vertices previously examined and which induce a complete subgraph in the original graph. The vertices in V(G1) have the additional property that they represent all possible extensions of the complete subgraph induced on V (G2) which yield a larger complete subgraph. If V(G1) is empty, then provided there does net exist a vertex adjacent tc a l l vertices in V(G2), we have found a clique. Such a condition is maintained by deleting from further consideration any vertex subset a l l of whose members are adjacent to some vertex outside the subset. If V(G1) is not empty then we generate n - d(v) new sets by first choosing a vertex v, and then considering it together with the n - d(v) - 1 vertices not adjacent to v. Each vertex from this set is used to define a new vertex subset by adding it to V(G2) and thus extending the set of vertices already considered, and then defining a new set of vertices to be considered from V (G1) by including only those vertices adjacent to that vertex just added to V(G2). 2.6.2 NOTATION VJGV)_i set of vertices in the current vertex set yet to be considered. 71 V ( G 2 ) s e t of vertices in the current vertex set which-induces a complete subgraph. VJH1); new set of vertices to be considered. VJH2),: new expanded set of vertices inducing a complete subgraph. F:. set of vertices not adjacent to a chosen vertex from V{G1) . Az the adjacency matrix of the subgraph induced on V(G1)U V(G2). 2.6.3 REDUCED RDUNCANCY ALGORITHM STEPO: initially place V(G)i_j$ on the stack. STEP1: Choose a vertex subset V(G1)vjV(G2) from the stack of subsets to be considered. If stack empty, stop. STEP2: If there exists a vertex adjacent to a l l vertices in 7(G1)u V(G2) then go to STEP1 STEP32 If V(G1) is empty then print V (G2) as a cligue and go to STEP1. STEPjfi Choose a vertex v in V(G1) and define F to be a set consisting of v together with a l l vertices not adjacent to v. STEP52 Choose a vertex w in F and define a new subset V(H1)OV(H2 ) where V (H1) is the set of all vertices in V(G1) adjacent to w, and V (H2) = V(G2)o{w}. STEP6: Delete vertex w from sets V(G1) and F. STEPJl If I" empty then go to ST EP 1; else go to STEP5 In order to compute the time for one iteration of the 72 algorithm the instructions performed are as follows: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 -•pop V (G1) ,V (G2)-J C <— C -C i <— 1 > substr (V <G1) V(G2),i,1) : 0 C <— C A (i) i <— i + 1-« x : n C : 0 V (G1) 0 print V (G2) v <— index (V (G1) , 1) F <— -A (v) w <— index (F,1)< V(H1) <— V(G1) A (w) V (H2) <— V (G2) substr (V (H2) , w,1) <— 1 push V (HT) , V (H2) substr (V (G1) , w, 1) <— 0 substr (F,w,1) <— 0 F : 0 ± 2.6.4 JOBBER OF VERTEX SETS GENERATED As an example we again consider the derivation tree of K (3,3,3) labelled as previously in Fig. 2.1, the tree this time being determined by our algorithm. It is given in Fig. 2.11. Each vertex not representing a cligue is labelled by the 73 pair (V (G1) , V (G2)) representing the vertex subset generated by the algorithm. The number of nodes in the derivation tree for K(m ) is given in the following theorem: THEOREM 2.7^ The number of nodes in the derivation tree of K(mk) is mk 1 - 1, Proof: Let the nodes of K(mk) be labelled such that i f v^ is a vertex of block V. , v. a vertex of block V. and i < j , then i J J the label of v. is smaller than the label of v. . The algorithm processes the vertices of a graph in ascending order of labelling. a typical component during execution of the algorithm has i vertices in V (G2), one from each of the blocks V ,V ,...,V. . V(G1) induces a complete (k-i) partite i 1 2 i graph with m vertices per block. This component therefore determines m new components, one for a selected vertex in block V. , and m-1 for the m-1 other vertices in V. _ the only x 1 x 1 vertices not adjacent to the selected one. Each component therefore determines m new ones, until there are k vertices in V(G2) in which case there are none in V(G1). The number of k !i k i vertices in the derivatxon tree is therefore IZ m = m -1. i=0 ^E?L 7k 75 2.6.5 ANALYSIS OF AN ITERATION The computational effort for one iteration of the algorithm to find the cliques of K(m ) is easy tc determine from the Iverson description. The loop:7 to 4 is executed n-1 times where n is the number of vertices in the original graph, while the loop: x 19 to 12 is determined as fellows. Let the vertex set currently under consideration have i vertices in V(G2), and (k-i) m vertices in V(G1). Then F defined in line 11 consists of a l l vertices in one block and conseguently loop: 19 to 12 is executed m-1 times. The expression for the computation time during one iteration is therefore given by T Q (n) = CJ+nC9,4mC^ where co = u t ^ t ^ t ^ +tQ CO = 2t1 + t3 +2 t6* t8 co = 6 ti+ t2 + t4+t6 +3te + t9 provided the vertex set under consideraticn does not have V(G1) empty. If V(G1) is in fact empty, as i t will be for a l l nodes of the derivation tree representing cligues, then only lines 1 through 9 are performed and the computation time in this instance is T.^ (n) = C* + nC°, where C9, is given above and C» = 2t, +t +2t,. 1 1 2 4 We can now combine the results of the computation time for one iteration with the number of nodes in the derivation tree to obtain an expression for the total computation time k reguired to find the cligues of K (m ). There are m nodes for 76 which the computation time for one iteration is T^ (mk) , and mH 1 - mk * mk- 1 nodes where Tn (mk) is the computation time. Hence T(mk) = T,(mk)m +T n(mkl m -1. m>1. The case fcr m = 1 1 u m-l clearly defines a derivation tree consisting cf a single path of length k. Hence the computation time to determine that K{1k) is a clique is T (k) = T (k) • (k-1) T (k) . 2.6.6 STORAGE REfiUIBEMEBTS Like the previous algorithms of Harary-Ross, Bonner, and Peay, the Reduced Redundancy algorithm maintains cnly one adjacency matrix, that of the original graph G. Vertex subsets are maintained on a stack and used to select the appropriate rows of the adjacency matrix of G, to obtain adjacency properties of the subgraph of G induced on the vertices in the vertex subset. Our algorithm, however, generates n -d(v) new vertex sets during an iteration where n is the number of vertices in the set and d (v) is the number of vertices in that set adjacent tc v. For this reason i t is more difficult to determine the storage reguirements of the push-down store for an arbitrary graph. Instead we shall again examine the complete k-partite graph K(mk). If we again adopt the strategy of developing the derivation tree in a "depth before breadth" manner, i t is clear that no path is of length greater than k. Further, from the previous discussion we know that every vertex set V(G1)uiV(G ) is complete k-partite with 1 vertex in each of i 77 blocks. V(G1) consists of the m(k-i) vertices in the remaining k - i blocks. If we select a vertex v from V(G1) tc determine new components, the number of vertices not adjacent to v is m-1. Therefore, to each node in a path of length k in the derivation tree there are m-1 other nodes corresponding to vertex sets yet to be processed. Hence the push-down store must be capable of handling k(m-1) vertex sets. The storage reguirements for the new algorithm applied to K(m ) are k (m-1) + (mk)2+2mk+16C bits, C being the number of counters and pointers used. Since we can partition a graph into k blocks no block of which has more than m vertices for k = X ( G ) * the chromatic number of the graph, this expression also serves as an upper bound on the storage requirements for an arbitrary graph. 78 2.7 THE BRON^ KERBOSCH ALGORITHM The Bron-Kerbosch algorithm is the most recent cligue enumeration algorithm known to this author. Mulligan [ 6 4 ] has described the procedure and found it to be superior tc Bierstone's algorithm, a method also discussed by Augustson and Minker [5]. The algorithm employs a recursive procedure which is used to modify a global vertex set consisting of a l l vertices which form a complete subgraph of the original graph. The function of the recursive procedure is to extend, i f possible, the number of vertices in the complete subgraph. This is accomplished by maintaining several lists and pointers in a stack generated through recursive calls to the procedure. These include two vertex subsets, one a set of candidates which can be used to. extend the complete subgraph, and the second a set of vertices which have already been used to carry out such an extension. Since the contents of the vertex subset is under continual modification i t is also necessary to maintain a pointer indicating the last entry into the set. Some other counters are also maintained through recursive stacking of definitions which will be apparent from the description of the algorithm. In what follows we shall use the notation developed by Mulligan and his formulation of the algorithm. 79 2.7.1 NOTATION ru number of vertices in the original graph. Com£sub^ complete subgraph currently being extended. order of compsub. Vz vertex set currently under consideration. Ne^ number, of vertices already examined in V. Cez, total number of vertices in V. S_2 pointer to selected vertex from V used to extend compsub. R2§1 position of a potential candidate. - Nodj. number cf vertices not adjacent to a fixed vertex among the set of vertices in V already examined. Minnoch minimum number of vertices not adjacent to a fixed vertex. Pixp^ : vertex with maximum degree in the subgraph induced on V. NEW: new vertex set. Newne: number of vertices in NEW that have been examined before. Newce_: total number of vertices in NEW. 2.7.2 THE BRON-KERBOSCH ALGORITHM (Mulligan [64]) A. Initial call to recursive procedure EXTEND STEP 1: Set V to V (G) , c to 0 ne to 0, ce to n. STEP2: Call recursive procedure EXTEND (V,ne,ce). On return stop. 80 B. EXTEND lVjjaexcel recursive procedure. STEP 1: set minnod to ce, nod to 0 , i to 0. STEP2: set i to i+1. If i>ce or minnod = 0 then go to STEP6. STEP3: set count to 0. For each V(j), j=ne+1 to ce not adjacent to V(i) set count to count+1 and pos to j . STEP4: i f count < minnod then set f ixp to V (i), minnod to count and go to STEP5; else go to STEP2. STEP5:_ i f i < ne then set s to pos; else set s to 2 and nod to 1. In either case go to STEP2. STEP6 Set nod to minnod+nod. STEP?; If nod < 0 then return. STEP8: Interchange V (s) with V(ne + 1). STEP9: Set newne egual to the number of vertices in { V (1) ,V (2) ,V (ne) ) adjacent to V(ne + 1). Set NEW (1),NEW (2) ,... ,NEW (newne) egual to those vertices. STEP10: Set NEW (newne+1),...,NEW (newce) egual tc those i vertices in { V (ne+2) ,. .. ,V (ce) } adjacent to V(ne + 1). Newce eguals the total number of vertices in NEW. STEP 11: set c to c + 1, compsub (c) to V(ne + 1). STEP12 If newce eguals 0 then print compsub(i) , i = 1,2,...,c as a cligue; else i f newne less than newce then call EXTEND(NEW,newne,newce) STEP 13^ Set c to c-1, ne to ne + 1. STEP 14: set nod to nod-1. If nod > 0 then choose another vertex from {V (ne+1),...,V (ce) } not adjacent to fixp and not yet chosen. Go tc STEP7. The Bron-Kerbosch algorithm has been divided into three blocks. The task performed by block 1 is to extend if possible 8 1 a complete subgraph contained in G by examining vertices not previously encountered to see whether they are adjacent to a l l of the vertices of the complete subgraph under consideration. Control is passed to block 3 where if such an extension is possible i t is made, a record being kept of those vertices previously encountered and yet to be examined. If the complete subgraph cannot be extended i t is printed out. Block 3 recursively calls block 1 returning only after a l l possible extensions have been examined. Control is then passed to block 2 which makes the next possible extension tc the vertex set under consideration at the present level of recursion. BL0CK1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 minnod <— ce nod <— 0 i <— 0 ->i <— i+1 > l : ce minnod : 0 count <— 0 j <— ne+1 I—substr(MV <i)) ,V (j) ,1) count <— count+1 pes <— j j <— j + 1 < j : ce count : minnod f ixp <— V (i) minnod <— count I : ne s <— pos 19 20 S < i-*r nod <— 1 BL0CK2 21 22 23 24 25 26 27 c <— c-1 ne <— ne+1 nod <— nod-1 s <— ne+1 *• substr (V (s), fixp, 1) s <— s+1 - s : ce return : 0- -29 BL0CK3 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 nod <— minnod+nod nod : 0 »return sel <— V (s) V (s) <— V (ne+1) V(ne+1) <— sel newne <— 0 i <— 1 -=>substr (fl (V (i)) ,V (ne + 1) , 1) newne <— newne+1 NEW (newne) <— V (i) i <— i + 1 •* : 0 i : ne •i-newce <— newne i <— ne+2 substr (A (V (i)) ,V (ne+1) , 1) newce <— newce+1 : 0 •45 84 44 NEW (newce) <— V (i) 45 i <-- i+1 46 42 <—— l : ce 47 c <— c+1 48 compsub (c) <— V(ne+1) 49 newce : 0 — 50 i <— 1 51 print compsub (i) 52 i <-- i+1 53 i : c 54 21 < — newne : newce < 55 21 •* call EXTEND (NEW,newne,newce) 2.7.3 THE NUMBER OF VERTEX SETS GENERATED . Let v^»V2'**"''k b e t h e D^o c k s °^ vertices of K (m ), each containing m mutually non-adjacent vertices. The Bron-Kerbcsch algorithm proceeds by fixing a vertex and defining a new vertex subset to be the set of a l l vertices adjacent to the fixed vertex. This vertex subset is partitioned into two parts to provide imformation for determining whether a complete subgraph is maximal or has been considered before. At a given level i of the recursion other vertex sets are generated whenever control again returns to that level by choosing anong the set of vertices not adjacent to the original fixed vertex. This selection procedure is analogous to the mechanism employed by the Reduced Redundancy algorithm fcr generating new vertex subsets and as a conseguence the number of vertices 85 in the derivation tree is the same, namely m -1. Tc see that m-1 this is the case i t is only necessary to establish an equivalence between the nodes of the derivation tree generated by our algorithm and the nodes in the derivation tree generated by the Bron-Kerbosch algorithm. If we assign level 0 to the node of the derivation tree corresponding to the original vertex set of the graph, then a node at level i in the algorithm corresponds to the vertex subset of a complete (k-i) partite graph while the selection of fixed vertices made in the generation of a path from the root tc level i is contained in the array compsub . From previous discussion our algorithm has a node a distance i from the root with two vertex sets V(G1),V(G2). G2 corresponds to a complete subgraph of order i , while V(G1) induces a complete (k-i) partite graph. By choosing that path of length i which results in V(G2) containing the same vertices as compsub, V(G1) is then the same set as the vertex subset generated by the Bron-Kerbosch algorithm. 2.7.4 ANALYSIS OF ONE ITERATION Since the algorithm employs the same technique for vertex set generation as our algorithm, the relative efficiencies of the two procedures are dependent upon how the properties of the vertex set so generated are exploited during an iteration. This depends cn three factors; the way the data is represented, the order of development of the derivation tree and the means by which redundant components are avoided. 86 To determine the computation time required by the Bron-Kerbosch algorithm as implemented by Mulligan let V be a vertex set under consideration at level i . The maximum depth of recursion by the algorithm is k. At level i we have i+1 calls outstanding of which i have been called by the procedure EXTEND itse l f . All parameters defined within the procedure are saved, a feature important in the determination of storage reguirements. A vertex set generated by the algorithm in finding the cliques of Mra^ ) has the property that at level i a l l vertices that have been considered lie in blocks V ,V ,...,V 1 2 i while those yet to be considered lie in V ,V ,...,V . i+1 i+ 2 k There are m (k-i) vertices in the latter collection since every vertex is adjacent to any vertex in V . from which the selected candidate was chosen. Hence every vertex in the last (k-i) blocks of K(mk) has the minimum number of disconnections since the vertex set under consideration can have at most m-1 vertices from any previous block already considered, the remaining vertex currently in compsub. Since minnod > 0 for a l l vertices in the vertex subset, loop:14 to 4 is repeated ce-ne-1 times plus once more when choosing a vertex for fixp, while loop:18-to 4 is repeated at most ne times. Finally, the inner loop: 13 to 9 is repeated m (k-i) times for each vertex already considered and m(k-i-1) times for each vertex yet to be considered. The time for one iteration of block 1 is thus bounded by T = C»+ (ce-1)Ci + ne (C* + m (k-i)C») + (ce-ne)m (k-i-1)C* 1 1 2 3 A- 4 87 where 4 = 7 t l + t4 C» = 3t,+2t~+4t, 2 1 j) 4 c3 = 3 t l+ t4 C4 = 3t1 +2t3 + t4 +t8 Loop:54 to 21 is repeated after every return from the recursive call in line 55. This equals n-d(fixp), the number of vertices not adjacent to fixp. Loops 27 to 25 and 46 to 42 are repeated ce-ne times, while loop:39 to 35 is repeated ne times. If i<k-1 a clique has not yet been found so statements 50 through 53 are skipped. If we define one iteration as being the total computation performed until a return is made at line 29, then the computation time for blocks 2 and 3 together is bounded by T2 = (n-d (fixp)) (C|+(ce-ne) C|+neC2) +t^ with C? = 13^+61^ 1 ' , J" 1 C| = (2t1 + t3 + t4 + t8) + (t1+t3 + 2t4+tg) C3 = 3 V 2 V2 t4+ t8 The order of the computation time for one iteration is therefore between n and n2. For vertex subsets such that ne=0, the computation for one iteration is of order m2(k-i)2, the square of the number of vertices in the subset under consideration. 88 2.7.5 STORAGE REQUIREMENTS As previously observed, the maximum depth of recursion is k. Hence a l l variables defined within the recursive function must be stored k times. This consists of pointers and counters and the array NEW an integer array of size n. The adjacency matrix of the original graph and the array compsub of order n are maintained outside the recursive procedure. Since the adjacency matrix is stored as an array of bit strings the storage reguirements for the Bron-Kerbosch algorithm as implemented by Mulligan are n2 + n (k + 1)+16Ck + 16 bits where C is the number of integer scalars defined within the routine EXTEND, and the additional 16 bits are an allowance for a globally defined variable. 89 CHAPTER 3i CLIQUE DETECTION USING VERTEX SIMILARITY 3.1 INTRODUCTION It is evident from the observations and results of Chapter 2 that the efficient detection of cligues is severely hampered by the possibly exponential number of such subgraphs. Even the act of printing them out can occupy an inordinate amount of time unless there exists seme means cf simultaneously identifying several cligues and also some more compact form of notation than explicitly defining the vertex sets of each cligue. Two approaches to this problem will be examined separately in this chapter, each of them exploiting properties which measure the degree to which any two vertices are different. One such device is similarity of vertices. The automorphism group of a graph partitions the set cf vertices V (G) into equivalence classes called the orbits of V (G). Two vertices are similar if and only i f they are members of the same orbit. Hence there exists a permutation in T(G) which maps u onto w where u and w are vertices in the same orbit. An examination of complete k-partite graphs with vertices in block , reveals that every vertex in any block can be interchanged with any other, ie vertices in any block are similar. Let u, w be two such vertices. Then if we know the cliques to which u belongs and a permutation which maps u onto w, we also have a l l the cligues to which w belongs. Such a representation is more compact as i t requires 90 explicitly defining only the cligues associated with u. In the development which follows we shall tacitly assume that a procedure for determining the orbits is available. For implementation we shall employ Cornell's algorithm[12 ]. It is important to note here that while Cornell's procedures have not failed on any graphs encountered to date, that their correctness depends on an unproved conjecture. Corneil therefore describes his algorithm as a heuristic one, a policy which we formally must also follow when using his routines. ft difficulty with such an overall approach as offered here for improving the efficiency of clique detection occurs when u and w, vertices in the same orbit, are both members of some common clique, in other words u is adjacent tc w. This is often the case as is illustrated by the existence of connected point symmetric graphs, which by definition have a l l vertices belonging to the same orbit. Since vertex similarity can be used at several levels of cligue detection other than enumeration we shall defer further discussion cn this problem until later. 91 3.2 POINT AND LINE SYMMETRIC GRAPHS , The remarks of the introduction to this chapter suggest that similarity of vertices may contribute to a characterization of the cligues in a graph. To what extent x this is true will be examined in this and the next section. Of particular interest will be the behavior of subgraphs induced by a single vertex since this is the major mechanism by which a graph can be decomposed into smaller subgraphs for further processing. This feature has already been observed previously in the seguential algorithms of chapter 2. The major portion of this section is devoted to an examination of cligues in point-symmetric and line-symmetric graphs. Of particular interest is the degree of symmetry of the induced subgraphs. THEOREM 2-.ll T1»e subgraph induced on the adjacency set of a fixed vertex in a line-symmetric graph is point-symmetric. Proof^ Let G be a line-symmetric graph, v a vertex in V(G), and denote by { w^,w 2,. . . ,w^ } the set of vertices adjacent to v. Since G is line-symmetric there exist permutations ct2f<X.3 »• • • r0^ in T(G) such that: o(-2 ( v ^ ) = (v,w2) oC_(v,w ) = (v,w,) These permutations, together with their inverses, hold v fixed and hence belong to P , the stabilizer of v. Since every ot in 92 p(G) preserves adjacencies, every «_ maps w^ onto some w\ where , are in {w^,...,wKJ. If wi 'wj a r e a n? t w 0 vertices in { w v, } then at .ocf (v, w .) = (v,w.) and hence ! K j X X J the set of vertices {w ,,..,w } is similar. Further, since every permutation oLeT^ maps w^ onto some w^ the integrity of the subgraph induced on {w ,...,w } is preserved. The subgraph induced on the adjacency set of a fixed vertex in a point-symmetric graph is not necessarily point-symmetric. This is illustrated in the counter-example given in Fig. 3.1. The subgraph induced on vertices adjacent tc vertex 1 is given in Fig. 3.2 and is clearly not point-symmetric. Dauber and Harary [43] and Folkman [32] have investigated the extent to which line-symmetric graphs are point-symmetric. The principal result of Dauber and Harary is the establishment of conditions which characterize such graphs, namely that every line-symmetric graph with uo isolated points is point-symmetric or bipartite. This result together with the previous theorem establishes a sufficient condition for point-symmetric graphs to have point-symmetric subgraphs induced on the adjacency set of a fixed vertex, that condition being that the graph be line-symmetric or regular bipartite. That this condition is not necessary is illustrated by the graph given in Fig. 3.3, a graph not line-symmetric or regular bipartite but point-symmetric and every subgraph induced on a set of vertices adjacent to a single vertex is also point-symmetric. For this graph the edge (1,2) is not similar to the edge (1,6). 6 5 Fig. 3.1 POINT SYMMETRIC GRAPH 7 Fig. 3.2 INDUCED SUBGRAPH OF FIG. 3.1 3 Fig. 3.3 POINT-SYMMETRIC GRAPH NOT LINE-SYMMETRIC 95 We shall denote by A(v) the set of vertices in a graph G adjacent to the vertex v. LEMMA 3^ 2^ : let v-^*^ b e a n * t w ° s^ n ,^^ a r vertices of a graph G, and denote by G1 and G^ the subgraphs induced on A (v^ ) a nd A(v2) respectively. Then there exists oc in V (G) such that oc G = G . 1 2 £l2°Jl Since , v2 are similar the number of vertices in G-^ is egual to the number in G2» Let oc be an automorphism of G mapping onto v2» Then for each u in V(G^) there exists a unigue image ocu. Further every such vertex ocu in otV(G^) is adjacent to v2 since every vertex u in V (G^ ) is adjacent to v . Now by definition V<G ) is the set of vertices adjacent to v2 , and hence V(G2) = <*.v(G1). Since oc is an automorphism, (u,w) is in E (G) i f and only i f (<xu,<*w) is in E(G). Hence for any u,w in V (G^ ) , (u,w) is in E(G1) i f and only i f (<*u, ocw) i s in E (G2) since «tu, ocw are vertices from V(G2 ). Therefore E (G2) = oc E (G^ ) and hence oc G ^ = G^. As a conseguence of this Lemma we have the following: LEMMA 3_.2x Let G be a point symmetric graph, and denote by cL , oc ,.. . , ot automorphisms of G such that *3V1 = V * kvl = V 96 Then i t is the case that: *3G1 = G3' * kGl = V where G ,G ,.,.,G, denote the subgraphs induced cn A (v^ ,A (v2 ) ,A (vk) respectively. Let {'V^,vL..,vh be the vertex set of G. for i = 1,2,...,k and suppose without loss of generality, that ot v"*" = v*. Then clearly i f we know the cligues of Gn, we can x 3 3 1 find a l l the remaining cligues of G knowing the permutations u ,<*.,.... ,cc . To avoid duplication of cligues the following 2 j k test can be employed. If we are examining the cligues associated with component G^ , then delete a l l cligues containing vertex v. for j = 1,2,...,i-1 as such cligues have 3 already been found during the examination of component G... i Such a strategy encounters difficulties cn two fronts. First, as we have seen previously, the point-symmetry of a graph is no guarantee for the point- symmetry cf subgraphs induced on vertices adjacent to a point, hence the problem of determining the cliques of G is as yet unresolved. Secondly, the determination of automorphisms oL toL ,... ,<* is in general a difficult problem. We can overcome the first difficulty by generalizing the procedure to include graphs which are not necessarily point-symmetric. Then, the existence of an algorithm fcr determining 97 the orbits can be exploited in the following way. Every member of each orbit can be represented by a single vertex which determines the induced subgraph for further processing. If Q = {v^ ,v^ , ...,v^ } is an orbit of P(G) fcr 1 2 k some graph G and i f oi • ,<*., ,...,<x- are permutations in P(G) x2 x3 T c such that: X2 X ± X2 u± \ = vi oe, v. = v± then by an argument simliar to that previously given the cligues associated with v. ,...,V; can be determined i f we know the cligues associated with v. . The development of an x l algorithm utilizing such technigues will be the focus of the next section. The object will not be to find a l l the cliques because of the difficulties associated with determining the permutations which map similar vertices onto each other. Rather, the algorithm shall attempt to find a set of ncn-similar cligues cf a graph G which together with a knowledge of the automorphism group P(G) will be sufficient to determine a l l the cligues of the graph. The algorithm can thus be considered as a sub-program which when incorporated with a sub-program for determining the automorphism group will provide a mechanism for finding the cligues. 98 3.3 DETERMINATION OF NON-SIMILAR CLIQUES The purpose of the procedure to be described here is to somehow characterize the non-similar cligues of a graph. Two cligues C^ and C^ of a graph G are said to be similar i f there exists an automorphism oc of G such that «. c^ = C ^ . A naive approach to the problem of determining the non-similar cligues of a graph which serves to illustrate what we are attempting to find involves generating the "equivalence classes" induced on the set of a l l unordered k-tuples of vertices of G by the automorphism group T(G), for k=1,2,....n-1. For any given k, we examine the cligues that are members of each equivalence class, choosing one as a representative member. Since the automorphism group preserves adjacencies, two k-tuples are members of the same class only i f the subgraphs cf G defined on the vertices represented by the k-tuples are isomorphic. Such a procedure, therefore, clearly provides more information than we desire as we are intersted only in those classes whose k-tuples are the vertices of maximal complete subgraphs. The mechanism to be employed will depend primarily on the observations of the previous section; namely, that i t is possible to generate a l l the non-similar cligues of a graph by reducing a graph to components egual in number to the orbits of the automorphism group of the graph, each component being the subgraph induced by a vertex from an orbit. Each component i will then serve as input and subsets of vertices will thus be generated in a recursive manner analogous tc the seguential 99 procedure described in chapter 2. There is not, unfortunately, sufficient information to determine a l l the non-similar cliques of a graph from the orbital partition cf V (G) alone. This is illustrated by the following example. Suppose we apply the procedure cf choosing vertices as just described to the graph of Fig. 3*3. Since this graph is point-symmetric, one vertex should be sufficient to characterize the "fi r s t " vertex of a l l the cliques. Let that be vertex 1 . The subgraph induced on vertices adjacent to 1 consists of three independent vertices 2,5 and 6 each of which belongs to the same block of the orbital partition of V (G). The "second" vertex of a l l cliques should therefore be characterized by one vertex, say 2. As (1,2) is a clique of the graph and since we have argued that a single vertex and the subgraph induced on adjacent vertices should be sufficient to characterize a l l cligues of the graph, we would have to claim that a l l cligues were similar to (1,2) which we know to be false since the graph being examined is net line-symmetric. In fact we have previously observed that edge (1,6) was net similar to edge (1,2) and therefore should also have been generated in the determination of the non-similar cligues of the graph. We can resolve the two non-similar cligues of the graph of Fig. 3.3 by using not only the orbital partition of V (G) but also by determining the orbits induced on 2,5,6 by the stabilizer of 1. This results in a partitioning of {2,5,6] into two sets {2,5) and {6}. By selecting a representative 100 vertex from each cf these sets we can obtain two non-similar cligues say (12) and (16). This example illustrates the fact that although the group T(G) of a graph may be transitive on the vertex set V (G) it is not necessarily transitive cn A(v). We have previously shown that i f G is line-symmetric then this will be the case. Before we present a description of an algorithm for finding the non-similar cliques cf a graph from its orbital structure we examine further the orbits of the stabilizer of a particular vertex v in the following two theorems. THEOREM 3.2: Every automorphism ot in the stabilizer, P ^ of v is an automorphism of G , the subgraph of G induced on the set of vertices adjacent to v. Proof: Let P be a permutation matrix corresponding to the automorphism oc in p . Without loss of generality assume the v vertices of of G are labelled 1,2,...,m, and the remaining vertices in G are labelled m+1 ,m+2,...,n. Let A (G) be the adjacency matrix of G, A (G ) that of G . Clearly A(G) is of v v the form: 'A (G ) A v 2 A A J 2 3 Since P corresponds to an automorphism in the stabilizer of v, by Lemma 3.1 P maps V (G ) onto V (G ) and is therefore of v v the form: \ ° 0 P where P acts on vertices labelled 1,2,...,m, the vertices in V(G^) 101 Since is an automorphism of G, i t is the case that P A (G) P = A (G) . Hence K 0 A(Gv) A2 P l 0 A(Gy) A2' T T ,o P 2 . T P P 2 J I A2 A3J LA2 A3 ^ or P A (Gv) P = A (Gv) . Consequently ot maps G^ onto its e l f . QED. As a corollary to this theorem we have following result: COROLLARY: Every orbit of P^. is contained in seme orbit cf Proofz By the previous theorem every automorphism of G is one of G . Let u,w be vertices in V (G ) and suppose there exists ot in r such that otu = w. Hence ot is an automorphism of G^ and u and w must be members of the same orbit induced on V(G ) by P(G ). QED. THEOREM 3z_3± Let G be a connected point-symmetric graph. Then G is line-symmetric i f and only i f for an arbitrary vertex v, the stabilizer P S T(G) of v is transitive on A(v) the set of v vertices adjacent to v. Proof: Let A (v) = {w ,w ,...,w }. If G is line-symmetric then for any w.,w.,i # j , (v,w. ) is similar to (v,w.). Hence at is transitive on A (v). Conversely suppose P is transitive on A(v). Then for any v w ,w there exists OL in i j P such that ot(v,w.) = (v,w.). V 1 J Further since G is point-symmetric, for any other vertex u ^ v in V (G), there exists / 3 in P (G) such that / 3 v = u. If A(u) = [ x ,... ,x ^} then since /3 preserves adjacencies 102 a (u) = fyaw 2_'f3v2' * * * '/3wk ^ a n^ hence/3(v, w.^) = (u,x..) for some in A(u). Finally every edge (u^ovu) is mapped onto (u,^ sw%) by the automorphism /3ct/3,~^~, hence every edge incident tc a vertex u or v is similar to any other edge also incident to a vertex u or v. Since u was chosen arbitrarily we can conclude that G is line-symmetric. The observations made in the previous theorems provide the machinery by which we can define an algorithm for determining the non-similar cligues of a graph provided we are equipped with a procedure for determining orbital partitions. 3 . 4 DESCRIPTION OF THE ALGORITHM Since two vertices which are members of the same orbit will be members of similar cliques, we initially determine k representative vertices one for each of the k orbits of the graph. Further, we shall reguire a knowledge of the orbits of the respective stabilizers of the representative vertices. The algorithm recursively decomposes subgraphs defined on the set of vertices adjacent to a representative vertex into as many new subgraphs as the number of blocks of the orbital partition of the stability subgroup fixing that particular representative vertex. Each new subgraph is determined as the subgraph induced on the set of vertices of the old subgraph adjacent to a single vertex chosen from one of the blocks of the orbital partition and is subseguently reduced in a similar manner. 1 0 3 A record is maintained of the representative vertices as they are chosen, and when there exists an isolated vertex in a block of the partition, a complete subgraph has been found. This subgraph is then examined to see if i t is maximal by a procedure similar to that of the Reduced Redundancy algorithm in the previous chapter. As stated previously, to determine the orbits we shall employ Cornell's algorithm for constructing the Terminal Quotient Graph, a graph each of whose vertices, i t is conjectured, corresponds to a block of the orbital partition of V (G) [13]. Cornell's algorithm is ideally suited to our purposes since in determining the Terminal Quotient Graph, he determines not only the vertices of the original graph belonging to each block of the partition but also the orbits of r for a vertex v from each orbit of P(G). Since the v adjacency set of v is obviously a subset of V (G) it is easy to determine the orbits of V to which they belong. Cornell's v algorithm provides this information in the determination of the vertex guotient graphs of G which are constructed by 1 J fixing a vertex and then determining the partition induced on the remaining vertices of V (G). Corneil uses the vertex quotient graphs to determine the orbits of P(G) by grouping two vertices in the same class if and only i f they have identical vertex quotient graphs. 104 NOTATION Q± : the ith orbit of P (G ). A J v)_: row v of the adjacency matrix of G. Gl l vertices yet tc be considered for a particular subgraph. G2_2 vertices which induce a complete subgraph. Hi: new vertex set to be examined, derived from G1. S2l expanded vertex set inducing a complete subgraph. NON-SIMILAR CLIQUES ALGORITHM STEPJk Use Cornell's algorithm to find the orbits Q°,0°,••-,8? X <c rC of P (G) . In addition, let # 7 , 0 L . . . be the orbits of 1 £ Kv V induced on A (v). STEP2: Choose a vertex set (G1,G2,W) from the stack of candidates. If stack empty, then stop. STEP3: Compute T =V / Q ^ G 2 & * I f T n 0 t eniPty then go to STEP2. STEP4i If G1 empty then print G2 as a cligue and go to STEP2. STEP5: Set i to 1 and F to 61. STEP6: If ^ HG1 empty then go to STEP10. STEP7: Choose v in 0 ™ O G 1 not previously chosen. If none left to examine, let v be any vertex in 0 ™ n G 1 and go to STEP9. STEP8.: If G2r»A(v) not empty then go to STEP9; else go to STEP7. STEP9:. Define a new vertex set (H1,H2,V) with H1 = FOA(v) and H2 = G2u{v}. Put (H1,H2,V) on the stack. STEPIO^ set i to i + 1 and F to in (~). If i<k then gc to 105 STEP6; else go tc STEP2. 3.5 DISCOSSION OF THE ALGORITHM The algorithm for enumerating non-similar cligues of a graph is similar to the seguential algorithm for the enumeration of a l l cligues proposed in Chapter 2. However, whereas the seguential algorithm's efficiency was dependent upon the number of cligues in the graph and the number of elements in a vertex subset, the determination of non-similar cliques by the method just described is dependent upon the similarity of vertices in the graph. It is clear that this determines the number of orbits of the group as well as the number of non-similar cligues. Since i t is only necessary to consider one vertex from each of the blocks of the partition of A(v) induced by the stabilizer of an appropriate vertex v, the number of vertices that need be examined and hence the number of new vertex subsets generated is reduced if the number of blocks in the orbital partitions determined in stepl is small. It is possible for a graph to have an exponential number of cligues, none of which is similar to any other. This is illustrated by the graph of Fig. 3.4. The subgraph induced on vertices {1,2,...,8} is K (3,3,2) the graph on eight vertices with maximum number of cligues. Additional vertices are then added to insure that the graph has identity group. Hence every cligue of G is non-similar to every other. In general it is 106 possible to construct a graph on 6k vertices having order 3K cligues in a similar manner. The purpose of this demonstration is to emphasize the fact that the detection of non-similar cligues may itself be an exponential process. Fig. 3-U GRAPH WITH ALL CLIQUES NON-SIMILAR 108 3.6 ANOTHER APPLICATION OF ORBITAL PARTITIONING ID the algorithm for determining the non-similar cliques of a graph, i t was not possible for us tc apply orbital partitioning to the vertex sets of each component obtained in the reduction of the graphs. This was because the non-similar cligues were determined by the orbits of p of v, and not v P(G ). He have previously shown that for every automorphism v of P , there is an automorphism of P (G ). However the converse v v is not necessarily true since two similar vertices in G might be non-similar in G . If grouped in the same class, a non-similar clique would be lost. We can however employ this strategy i f we wish to determine only the existence of cliques of different orders. Such a technique is seen to examine fewer vertex subsets than a procedure for finding the non-similar cligues of a graph, since we can take advantage of any symmetry that exists in the subgraph induced on a particular vertex subset. The vertex sets which are nearly resolved into cligues exhibit a high degree of vertex similarity and by only distinguishing between vertices in different orbits, the number of vertices examined is greatly reduced. The fact that cliques of a l l orders originally present in the graph will be obtained is established by the following argument. If we determine the orbits of P(G) on V (G), two vertices u,v in V (G) are members of the same orbit i f and only if the subgraphs induced on those vertices of V(G) adjacent to u and those adjacent to v are isomorphic. Hence each induced 109 subgraph has the same number of cliques of each order and conseguently either induced subgraph may be chosen fcr further processing and the other ignored without fear of losing a l l cligues of a particular order. THE ALGORITHM STEPO: Initially place (G1,$) on the stack. STEPJ.2 Choose a vertex set(G1,G2) from the stack. If the stack is empty ,then stop. STEP3: If G1 is empty then print G2 and go to STEP 1. STEP4: Determine the orbits ^ / ^ " " "' o f t h e a D t c n , o rPD l s n l group of the subgraph induced on vertex set G1. STEP5i Set i to 1 and F to Gl. STEP6:. Choose v in 0. r\ G 1 ^ (~A (v)) and define a new vertex set (H1,H2) where H1 = FOA(v) and H2 = G2 U {v}. Place (H1,H2) on the stack. STEP7i Set i tc i+1 and F to F n ( - f t ) . If i<k then go to STEP6; else go to STEP1. This algorithm is very similar to the Reduced Redundancy algorithm for the enumeration of cligues. It is obvious that the latter algorithm could be employed to determine the orders of the hon-isomorphic cligues of a graph. However in view of their possibly exponential number, i t is desirable to find some means of reducing the number of vertex subsets generated by reducing the number of vertices that need tc be examined. STEP2: Compute If T is not empty then go to STEP 1. x 110 This is achieved in our algorithm by again exploiting the fact that two similar vertices belong to the same number of cligues of different orders and therefore in the situation where we wish to find the orders of the different sized cligues of the graph, i t is only necessary to treat one of the two similar vertices. It should be noted that in step 6 of the algorithm i t is not sufficient tc choose one vertex from each of the k orbits of the automorphism group of the subgraph induced on G1. This is because i t may turn out that the number of crbits exceeds the number of vertices not adjacent to v, in which case our algorithm would perform more poorly during that iteration than the Reduced Redundancy algorithm of the previous chapter since i t would generate more new vertex sets than the seguential procedure. For this reason v is chosen from 5,nGl n(~i (v)). 111 3,7 ANOTHER APPROACH TO DESCRIBING THE CLIQUES In the previous sections we have attempted to exploit the similarity of vertices in a graph as an aid to the detection of its cliques. This was only partially successful, ene cf the major difficulties being the difficulty of determining the group of the graph, which was necessary for a complete enumeration of the cligues. Even the task of determining the non-similar cligues has proved to be limited by the existence of few good procedures for finding the orbital partition of the vertex set. Finally, we saw where it was even possible fcr a graph with identity group to have an exponential number of non-similar cligues. In this section we shall explore an alternative approach in which two vertices will be related by a condition stronger than that of similarity. DEFINITION 3^X1 T w c vertices u and w of a graph G are said to be complete subgraph eguivalent J CS eguivalent) if for any subgraph of G defined on vertices u,v ,v ,...,v. cf V(G), u,v^,v^,...,v are mutually adjacent i f and only i f w,v^ ,v^,.,.,v_ are mutually adjacent. Two vertices of degree 0 are CS equivalent. It is clear from the definition that i f two vertices are CS equivalent then they are similar. This fellows from the fact that two CS equivalent vertices are adjacent to the same set of vertices and can be interchanged. By finding a l l the cligues to which vertex u belongs, we have also found a l l the 112 cliques to which vertex w belongs and it is a simple matter to determine the latter explicitly: for each occurrence of u in a cligue, replace i t by w. A supplementary but equally important advantage of a method which reduces the number of vertex subsets to be considered by finding complete subgraph eguivalent vertices is that it provides a means of representing a l l the cligues of the graph in a more concise manner than explicit enumeration. Given the vertex set V(G) of a graph G, let v ,v ,...,v be a set of CS eguivalent vertices, a l l of which are by definition adjacent only to vertices in A (v^), the set of vertices adjacent to v^. If we denote by C^ the set of maximal complete subgraphs induced on the subsets of A (v^) the Cartesian product {v ,v ,...,v } X C is precisely the set of a l l X 2 k 1 cligues of G containing one of the vertices v ,v,...,v . It 1 2 Ic is evident that this procedure could be extended so that the cligues of C^ were also expressed as a set of Cartesian products each one being determined by a set cf CS equivalent vertices and their common set of adjacent vertices defined on the subgraph induced on A(v ). As an example we may consider again the graph K (3,3,3) given in Fig. 2.1. The vertices 1,2, and 3 are CS eguivalent and are each adjacent to vertices 4,5,6,7,8,9. In the subgraph induced on this latter set of vertices, the vertices 4,5,6 are CS equivalent and each is adjacent to vertices 7,8,9. Since the vertices 7,8,9 are isolates they are also CS equivalent. Thus a l l the cliques of the graph are given by the expression: { 1,2,3}X{{4,5,6}X{7,8,9}}. 113 This is obviously a much more compact way cf defining the twenty-seven cligues of K(3,3f3). The primary drawback of implementing the technigue just described is the paucity of vertices which are CS equivalent in an arbitrary graph. Instead, we shall implement a procedure which uses a weakened form of the definition cf complete subgraph equivalence to group the vertices in a similar manner. DEFINITION 3^2^ Two vertices u and w are weakly CS equivalent if there exists a complete subgraph defined on some subset of vertices of G say u,v ,v ,...,v, such that the subgraph 1 2 j induced on w,v^,v^»-••#v. is also complete. It is clear from the definition that weakly CS eguivalent vertices are not necessarily similar, and that complete subgraph equivalent vertices are weakly CS equivalent. We now consider the properties of a set cf weakly CS equivalent vertices defined in the following way. Let v_^ be an arbitrary vertex from V (G) and let {v ,...,v } be a set cf 2 j vertices also from 7(G) such that each vertex v is adjacent i to a l l vertices in A(v^). All the complete subgraphs containing v^ including the cligues are induced cn {VjluA(v^). If {w ,w ,...,w ]£A(v ) induces a complete subgraph then 1 2 k 1 v ,w ,w ,...,w also induces a complete subgraph. Further, X X 2 k since v is adjacent to every vertex in A(v ) , i t is adjacent i 1 to ^ » w 2'* *"' Wk a D d n e n c e ^Vj _ ' "]_» H2 '" * *' Wk* induces a complete subgraph. Therefore v^ ,v. are weakly CS eguivalent. Most 1 14 importantly i t is the case that every cligue of G with vertices from the set {v^JuA(v^) is induced on {v ,v ,...,v ,}UA(v ). In other words, given any vertex set {v.,w ,...,w } inducing a complete subgraph, al l possible X X K vertices which could be used to extend that vertex set so as to induce a larger complete subgraph must be contained in (v1,v2,...,v.]uMv1). If we denote by the vertex sets of a l l complete subgraphs which are maximal on the subgraph induced on {v ,v ,...,v. }, and denote by V the vertex sets of a l l complete subgraphs which are maximal on the subgraph induced on A (v ), then the vertex sets determined by the Cartesian product X induce complete subgraphs which are maximal on This result has an alternative interpretation as a product of graphs. The join (see for example Harary [ 4 3 ] ) of two graphs G^ and G2» denoted G^ + G2# is the graph G defined on V (G^ ) U V(G2) such that every edge of G^ or G^ is an edge of G and for every vertex v in V (G^ ) and vertex w in V (G^ ) , (v,w) is also an edge of G. Let C^ be a complete subgraph which is maximal on the subgraph induced on [v ,v ,...,v. }, and let C^ be a subgraph which is maximal on the subgraph induced on A(v^). Then C^ + C2 is a cligue of G. Re illustrate the determination of the cligues of the graph by finding weakly CS eguivalent vertices in the following example. Consider the graph of Fig. 2. 9,. It is evident by inspection that no pair of vertices exists which is 115 complete subgraph eguivalent. Instead we define two sets cf vertices by first choosing arbitrarily vertex 1 and defining one set to be A(1) = {2,6}. Now vertices 3 and 5 are also adjacent to {2,6} so the second set is {1,3,5}. By repeating this procedure on the subgraphs induced on the first of these vertex sets we discover that {2,6} can be separated into two sets expressible as a Cartesian product {2}X{6} corresponding to a complete subgraph of order 2. The vertex set {1,3,5} however consists of two components, an isolate 1, and a complete subgraph cf order 2 defined on {3,5} and expressible as {3}. The complete subgraphs of the subgraph induced on {1,3,5} are induced on vertex sets {1} and {3}X{5} and hence some of the cligues of G are given by {{ 2 }X{ 6 } }X{ 1, { 3 }X{5}} which corresponds to the cliques (261) and (2635). It is clear that not a l l the cliques have been found for we have not yet examined vertex 4. We therefore determine two new sets {3,5} and {2,4,6} in the same way, and the processing of their induced subgraphs yields {{ 3 }X{ 5 } }X { 4,{ 2 }X{ 6 }} giving us the third subgraph (354). This example illustrates the principal drawback of this procedure for enumeration, namely the generation of redundant cligues. We have encountered this type of problem in nearly a l l of the algorithms previously discussed. The most usual means of overcoming this problem has teen to simply exauine each vertex of the induced subgraph to see i f there exists some vertex not in the set, yet adjacent to a l l vertices in the set. Such a mechanism is clearly not applicable in this case. Alternatively, Peay in his algorithm as criginally 116 described, maintained a l l cligues in a stack which he could compare with newly determined maximal complete subgraphs to see whether i t had been found before. Although more directly applicable to our situation, this method is net useful since we have no explicit representation for each cligue with which to compare. In addition we must make use of a possibly exponential amount of storage. This difficulty is partly overcome in the following algorithm by making use cf information pertaining to the current derivation path in the tree of derivations in a manner described in the next section. SUBGRAPH EQUIVALENCE ALGORITHM Two stacks are used in the algorithm. Stack 1 consists of all vertex sets derived in the development of the current derivation path except those from which the current set is derivable. Stack 2 consists of a l l vertex sets directly derivable from the last vertex set in the derivation path. BLOCK A: Initialization procedure. g-TEP.ll v (G) be the set of a l l vertices in the graph and init i a l l y set both stacks to be empty. STEP2: Call recursive procedure ENUM(V(G)) defined in BLOCK B. The order and adjacency matrix of G are defined globally tc ENUM On return, stop. BLOCK B: Recursive procedure ENUM (V (G)). STEPI^ Choose a vertex v in V (G)and define a vertex set F 117 equal to the union of v together with a l l vertices not adjacent to v. STEP2:. Choose a vertex w from F and define vertex sets H1 = V(G) o A (w) and H2 egual to the set of a l l vertices in F adjacent to every vertex in HI. STEP.3: If the vertex set H1 «-> H2 is contained in a vertex set previously defined during this iteration then go to STEP7. STEP4: If the vertex set H1 u H2 contains a previously defined vertex set during this iteration then replace that vertex set by H1 u H2 on stack 2. STEP5: Compare H1 U H2 with a l l vertex sets generated during previous iterations in the development cf the current derivation path other than those vertex sets from which H1 U H2 was derived. If H1 U H2 is contained in some previous such set then delete i t from stack 2. Otherwise place the new set on stack 1. STEP6:, A new pair of vertex sets has been found. Stack 2 contains their union as well as the vertex w used to define them. STEP7:. Delete w from F. If F is not empty then go to STEP2. STEP8: If no new vertex sets have been added to stack 2 this iteration then return. STEP9 Choose a pair of sets H1 U H2 from stack 2 together with their defining vertex w. Remove this set from stack 1. STEP IO.: If H1 is empty then print vertex w and go to STEP 14. STEP VI:. Call recursive procedure ENUH (H2). STEPJ22 Print 11X" . (The maximal complete subgraphs of H1 U H2 will be given by the Cartesian Product of the results 118 upon return from calls in STEP 11 and STEP 13. STEP13^ Call recursive procedure ENDM ( 1 ) . STEPJHjj. Return H1UH2 and w to stack 1 but delete i t from stack 2. If stack 2 not empty then go to STEP9. STEg15i Return. As previously described, the algorithm finds the cligues of the graph by determining sets of weakly CS equivalent vertices in a particular way. A new vertex set whose vertices have been partitioned into two sets of weakly CS equivalent vertices is determined by choosing a vertex v from a set F and defining the two blocks H1 and H2 of the partition according to STEP2. The set F consists of a vertex v and all vertices of the induced subgraph on the current set of vertices, V(G), under consideration not adjacent to v. This set insures that a l l cligues will be found and was employed in the Harary-Ross algorithm and the Bron-Kerbosch algorithm as well as our own seguential algorithm previously discussed in Chapter 2. We are thus guaranteed of finding a l l the cliques and it is therefore only necessary to minimize the possibility of finding redundant cliques. As we have mentioned, this is net a simple problem because of the nature of the representation being exploited in our algorithm. The technique employed is tc keep track of a l l vertex subsets from which a newly determined vertex subset could possibly be derived. Tc dc this i t is sufficient to keep 'track of only the i n i t i a l nodes of a l l possible branches in the derivation tree which deviate from the path of derivations we have taken to reach the current 1 19 vertex subset under consideration. By definition a l l other vertex subsets derived during the execution will be contained in one of the vertex subsets represented by these ncdes. In a derivation path cf length kf let v be the defining vertex of the vertex set H1 U H2 represented by a node on the derivation path at distance i from the root. It is evident that the maximum number of vertex subsets generated during the generation of set H1 U H2 is n , -d(v , ) where n . , is the i - i i - l i - i number of vertices in the i-1st vertex subset in the path of derivations and v is its defining vertex. The maximum i-1 number of vertex subsets placed on stack 1 is thus } (1+(n. n-d(v. ))). i=l i - l i - l In Fig. 3.5 we illustrate the vertex subsets involved in such a seguence of derivations. Since our algorithm employs a depth before breadth technigue of development, i t is clear that stack 1 is not exponentially growing. Hence any gains in efficiency from such a representation will not be offset by inordinate storage reguirements. V(G) Fig. 3 .5 A PATH IN THE DERIVATION OF NON-SIMILAR CLIQUES 121 It is difficult to assess the overall efficiency of this algorithm because the choice of a "best" defining vertex at each iteration is a non-deterministic procedure. Clearly, the worst case occurs when the choice that is made results in an explicit enumeration of every cligue in the graph. In this situation, since the mechanism by which new vertex subsets are generated is similar tc that of the Bron-Kerbosch or our sequential algorithm for the enumeration of cliques, a one-one correspondence can be made between the nodes of the derivation tree of this new algorithm with either of the previously discussed seguential algorithms. k Because a search of as many as £.:(n- n-d(v. , )) elements ^ X"*X X—X in a stack must be made (see Fig. 3.5) for each new vertex subset in addition to its generation, i t is evident that the time reguired for one iteration will be longer than that reguired by the seguential method. Since we have employed the same techniques of our previous algorithm to the generation of new vertex subsets, the time reguired for one itera-tion is k proportional to 51 (n-d (v. )) • T (n) where T (n) is the time 1=1 1 reguired for one iteration of the seguential algorithm of Chapter 2. When, however, we can take advantage of the weak CS-eguivalence of vertices to minimize the number cf vertex subsets generated, the maximum efficiency is realized by the greatly reduced derivation tree. This is clearly evident for any complete k-partite graph K(m,m,...,m) . Here, each block X ^ x \ of m. vertices corresponds to a set of complete subgraph 1 2 2 eguivalent vertices, and hence also a set of weakly CS eguivalent vertices. The derivation tree for K (m , m , . . . , ra ) 1 2 Ic using our new algorithm is linear, as only one vertex at each iteration defines a new subset. We illustrate such a derivation for K(3^ in Fig. 3.6, where the vertices of block V are labelled (i-1)m+1, (i-1) m+2,..., (i-1) m + m, i=1,2,3,U, with m = 3. For the complete k-partite graph K(mk), the number cf nodes in the derivation tree determined by our algorithm is mk+2(k-1)+1 for k > 1 and 2(k-1) + 1 for k = 1. Obviously then the derivation trees are smallest for complete k-partite graphs. As mentioned previously the worst case to be encountered occurs when there are no weakly CS eguivalent vertices in the graph and conseguently cligues are enumerated explicitly. The derivation tree may be used to determine the number of cligues in the graph. If we again examine Fig. 3.6, edges of the tree incident to a common vertex have been related by by the symbols "X" or "," according to whether the sets determined in that derivation can be combined in a Cartesian Product to obtain a subgraph of the original graph G. If not then their union (denoted by ",") is a subgraph of G. We illustrate this notation with the example of Fig. 3.7, a derivation of the cligues of Fig. 2 . 9 : . Using such a notation, the cligues are given by the expression: {1 X [2 X 6}},{3 X {{2 X {5 X 6)},{4 X 5}}}. From this example one can see that the cligue (126) has been explicitly defined while those containing the vertex 3 123 (ie. (2356) and (345)) are grouped together. It is evident from Fig. 3.7 that there cannot be any edges not incident tc the root of the derivation tree which are related by a •*," for a graph whose cligues are a l l determined explicitly. Since the number of vertex subsets generated from the root is at most n-d(v) where v is a vertex of minimum degree, such a graph has fewer than n cligues. Hence a l l graphs having more than n cligues have some vertices which are weakly CS eguivalent in the induced subgraph defined on some vertex subset; therefore some improvement over a seguential algorithm can often be obtained by reducing the number of vertex subsets that must be considered. Fig. 3-6 DERIVATION TREE FOR K(3 V(G) / \ / \ 5* 6 Fig. 3.7 DERIVATION TREE FOR FIG. 3.1 I 126 CHAPTER EFFICIENT DETECTION CF CLIQUES 4.1 INTRODUCTION In the previous two chapters we have explored some of the ways in which cliques can be detected in graphs. We have also examined how various properties associated with graphs might be used to improve the efficiency of such algorithms. The major observation to be made is that i t is not at a l l clear how one might devise an efficient cligue detection algorithm even to detect cliques of a particular order. In this chapter, however, a procedure for the detection of such cliques is proposed which can be proved to be an efficient algorithm for a particular class of graphs and for which no counterexample has yet been found for general k-partite graphs. An important application of cligue detection in graphs is motivated by the fact that i t is possible to represent a well formed formula of the propositional calculus in disjunctive normal form as a k-partite graph where k is the number of conjuncts in the sentence. In the survey of Chapter 1 we mentioned briefly the efforts of Cock, Karp and Lawler, among others in developing a taxonomy of combinatorial problems. In particular we noted an important result of Cook's which relates the tautology problem to a number of other important combinatorial problems. An extensive l i s t of these problems has been prepared by Karp £49]. We shall use his notation to define the concepts reguired in describing the eguivalence of a k-partite graph to a well-formed formula in disjunctive 127 normal form. 4.2 CLIQUE DETECTION AND SATISFIABILITY He turn now to a consideration of the "Satisfiability Problem" as defined by Karp [49] and its solution through the detection of cligues in a graph as suggested by Mowshowitz [63]. DEFINITION JKARP]_J. The satisfiability problem is defined as follows: Given as input the clauses C^ ,C 2, ...,Cp, cf a well-formed formula in conjunctive normal form, does there exist a set S c { x^ ,x2,. . . ,xn<; x.^ ,x 2 # . i . ,xn } such that a. ) S does not contain a complementary pair of literals and b. ) SOC * $ for k=1,2,...,p. He are thus given the well - formed formula A = C r\ c <^C and asked to decide whether or not i t is 1 2 p satisfiable. To do this we convert its negation to disjunctive normal form. Suppose ~A is a tautology. Then for a l l possible assignments of truth values to the variables cf ~A, ~A is true and conseguently A is false for a l l possible assignments. Therefore A is satisfiable i f and only if ~~k is not a tautology. It is the disjunctive normal form cf ~A that we shall represent by a graph. Let S = D UD o . . . o D, be a sentence of the 1 2 k propositional calculus in disjunctive normal form with each conjunct D = a n a r» ...r» a. where a. is a l i t e r a l . 1 ±1 i 2 1m x3 128 Define a k-partite graph G as follows. Each vertex in V (G) corresponds to a literal of S, there being as many vertices as there are literals of S. The unordered vertex pair (v ,v ) a b corresponding tc literals a and b is an edge of G i f and only i f a is not the complement of b, and a and b are net both members of the same conjunct. Thus to each conjunct of S there corresponds a vertex block of G consisting cf mutually non-adjacent vertices. This representation can be used to determine whether or not S is a tautology. The decision rule is: THEOREM iiilj . S is a tautology i f and only i f there does not exist a cligue of crder k in the corresponding graph G , k being the number of conjuncts in the disjunctive normal form. Proof: A cligue of order k in G exists i f and only i f there exists a selection of literals, one from each conjunct of S such that no literal and its complement are both contained in the selection. If such a selection exists then we can assign the value 0 (false) to each of the literals in the selection and hence negate the well-formed formula. On the other hand i f such a selection is not possible this corresponds to the fact that no such assignment 1 to the literals of the well-formed formula can be made and hence i t must be a tautology. QED. The object cf this chapter is to describe an algorithm which provides an efficient heuristic for determining whether a cligue of order k exists in an arbitrary k-partite graph. Such an algorithm can then be employed through theorem 4.1 as 129 an efficient solution to the tautology problem. Before presenting such an algorithm i t i s necessary to define and discuss a collection of vertex sets determined by the algorithm for a k-partite graph G which provides the mechanism for detecting the existence of cligues cf order k. The importance of these vertex sets will be established in a subsequent theorem. First, however, we shall assume that we are given a k-partite graph G with its vertex set V(G) partitioned into k blocks V ,7 ,. •.,V of mutually non-X *L K adjacent vertices. We define wf(u,w), i$s$k-2 to be be a subset of block associated with an edge <u,w) such that Wt(u,w) f 4 , m ai=1,2,...,t; t=1,2,...,s-1. If i=s then Ws(u,w) = V n A(u)nA(w) where A(u),A(w) denote s s the adjacency sets of u and w respectively. Else for i < s W S (u,w) is the set of all vertices v in 7. i i i such that (a) v.e W?_1(u,w)0 f fT^f S^sT* ^ W^ 1 (u#v )OwT _ 1(w,v l l L r=i+lLvr*W°(uw) I r' I r * J (b) W1"1 (u,v )0 w ^ ( w ^ ) *§ for j=1,2,...,i-1. It is evident from the definition that for any particular value of s a family of sets associated with an edge (u,w) is determined in the order Ws (u, w) ,M s (u, w) , . . . ,(u,w) . This S S*~X X order is a conseguence of the fact that W?(u,w) is dependent upon W.s,(u,w) ,W.sJu,w) ,... ,WS (u,w). A number cf properties XX X £ s associated with this family of sets may be readily determined from the definition: PROPERTY Jz W^ (u,w) <= W S^1 (U, w) . 130 PROPERTY 2: W?(u,w) not empty implies that W.s. (u,w), X X J j=1,i,...,s-i, not empty. PROPERTY 3j_ Every vertex in w? (u,w) is adjacent to u and w. 4.4 CLIQUE DETECTION ALGORITHM The algorithm proceeds by constructing the vertex sets (u,w), s=1,2,...,k-2 for each edge in the graph G. An edge (u,w) is deleted from the graph whenever (u,w) becomes empty for some i=1,2,...,s. After the sets w k~2 (u,w),i=1,2,...,k-2, have been constructed for edges remaining in G connecting vertices in blocks V and Vk , and if at least one such edge remains, then the sets H f (u,w) are redefined by iterating the above procedure. These iterations continue until one of two conditions occurs: (a) All edges have been eliminated from G. (b) The latest iteration resulted sin no further deletions. The following theorem establishes that condition (a) implies G contains no complete subgraph of order k. THEOREM 4.2z If G has a complete subgraph of order k then W k~2 (u,w) is not empty for some edge (u,w) of G. Proof: From the definition, the theorem is true for k = 3,4. Assume that for k = s i t is the case that K(1s) is a subgraph 131 of G implies Ws~ (u,w) is not empty for some edge (u,w) in E(G), where u,w are in V(K(1S)). Suppose further that K(1s 1) is a subgraph of G where V(K(1 s x)) = {v. ,v ....,v .,u,w}, v. 1 2 s-l 1 being a vertex from block V . Complete subgraphs of order s are defined on vertex sets: V (Mf3)) = { v1'V2"**'Vs-2 'U' "3 V (K(1s)) = {v ,v v ,v , ,u} 1 2 S-<£ s-l V (K(1s)) = { V V2" - - ' V 2 'Vs-1 'W }* Therefore from the induction hypothesis v is contained in each of WST2 (u,w),Wsr2 (u,v , ), and Ws~2 (w,v ,) . . Since 1 1 s-l 1 s-l v, ,v ....,v 0 ,u,w are mutually adjacent, v contained in 1 2 s-2 1 Ws~2(u,w) implies v^ is contained in 1r| [ W ^ ( u , vr) r\ H1""1 (w, ) ] for r = 2,3,... ,s-1. Hence by definition v is contained in W° (u,w). QED. The following lemmas and theorem show that i f there are at most two vertices per vertex block of G with degrees greater than 0 after a l l possible edge deletions have been made then G contains a complete subgraph of order k. LEMMA iJiJi Let V ,V ,...,V denote the vertex blocks of a k-partite graph G and suppose 'vj j = 1 f o r i=1,2,...,k. Then w k-2 (U # Hj n ot empty implies K(1k) is contained in G. Proof: By Property 2 W k"2 (u,w) is not empty for i=1,2,...,k-2. Since |V | = 1 W k~2 (u,w) = V . Let v be the vertex in i i i i block V . From the definition: 1 k-2 wk-2 (U,w) = {v } n [ [W r - l { u # v ) Wr-1 { W r V ) 3 n i i r=i+-l 1 r 1 r J 132 implies vi is adjacent to vr for r=i+1,i+2,...,k-2, and i=1, i , . . . ,k-2. Hence { vi 'v2 * * # * *vk-2 *U'WJ *s a s e t ° ^ mutually adjacent vertices. LEMMA 4.2^ If G is a k-partite graph such that each vertex block has exactly two vertices, then for any edge (u,w) k-2 k W (u,w) not empty implies K(1 ) is contained in G. 3 Proof: Let k=5 and suppose v^ is a vertex in (u,w). Then there exists v^ in (u,w) such that v^ is contained in 2 2 2 (u,w)rkH^ (u,v.j)r\W (w,v.j). It also follows from the 2 2 2 definitions of W.^ (u,w), 9^ (u,v,j), and (w,v^ ) that one can choose vertices x,y,z in such that complete subgraphs are induced on vertex sets {v^,x,u,w}, {^,y,v,j,u}, and { v1#z,v3 ,w }. Now since |V\| = 2, either x = y, x = z, or y = z. If x = y or x = z then x is adjacent to v and hence a complete subgraph is induced on {v^,x, v^,n, w}. If y = z then y is adjacent to w and a complete subgraph is induced on * v l ' Y ' V 3 ' U ' W J" Assume the lemma is true for k=s-1 and suppose v is a vertex in wf (u,w). Then by definition there exists v in i 1 s Ws (u,w) such that v.. is contained in s 1 W3"1 (ur w) r^- fi s~1 (u,vg)o»s^1 (w,vg). Since is a member of each of the sets W s"1(u,w) ,W s~1 (u,v ),and W s~1 (w,v ), by the JL -L S -L S induction hypothesis one can find vertex sets X, Y, and Z such that complete subgraphs of order s+1 are defined on 133 {v ,u,w}ux, {v ,v ,u}UY, and (v ,v ,w}uz. Each of the sets X X S X s contains one vertex from each of the vertex blocks V .V„,... ,V \ . 2 3 s-1 Let s = x n y , S = X H Z , and S = I n Z . Then S US u s X ^ J X *c j contains a vertex from each of blocks V_,V_,...,V , since 2 3 s-JL |V | = 2. Since each vertex in Sn is adjacent to every vertex i 1 in X and Y, i t is adjacent to every vertex in S2 and S^ and hence S US and S O S „ are a set of mutually adjacent 1 2 1 ' 3 vertices. Similarly each vertex in S2 is adjacent to every vertex in Z and hence S ^ S ^ is also a set of mutually adjacent vertices. Hence S ^ u s ^ s ^ induces a complete subgraph of order s-2. Further, by construction v ,v ,u, and w X s are adjacent tc every vertex in S^u s^U S^ and therefore G contains a complete subgraph of order s+2. QED. THEOREM 4..3:. If G is a k-partite graph with at most two vertices in each block, and i f Wk~2(u,w) is not empty for some edge (u,w), then G contains a complete subgraph cf order k. Proof: The result follows almost immediately from lemmas 4.1 and 4.2. If for some i Wk~2 (u,w) contains only one vertex v i l then as a consequence of the definition v^ is adjacent to every vertex in every other set Wk~2 (u,w), i + j . Suppose 3 there are r < k-2 sets having 2 vertices. By the method in lemma 4.2 we can find a complete subgraph of order r+2 defined on vertices chosen from these sets. By the remark above, each vertex in a set which it is the sole member is adjacent to a l l other vertices and therefore G contains a complete subgraph of 134 order k. QED. Finally, we note that the following cligue detection algorithm could be modified to work for a l l k-partite graphs. If condition (b) occurs and i f the condition of theorem 4.3 is not satisfied, then a cligue enumeration procedure can be applied to the subgraph of G which remains after nc further edge deletions occur. This method has been implemented for verification purposes and the results are summarized in Chapter 5. CLIQUE DETECTION ALGORITHM Let G be a k-partite graph with blocks , V"2,. .. ,Vfc . Denote by A (u) the adjacency set of vertex u. STEP 1: Set s to 1. Define graph HQ egual to G. STEP2: If E (H Jempty then stop—K{1k) is not in G. ~ s-l STEP3: Choose an edge (u,w) in E(H . ) where u and w are s-l vertices in V(H , ) - V . s-l s STEP4: Set ws(u,w) to v n A (u)rv A (w). s s STEP52 If s = 1 then go to STEP16. STEP6:. set i to s-1. STEP?: Set r to i+1. Set P to be empty. STEP8: Choose a vertex v in W^ (u,w). STEP9: If (u,v) and (w,v) are edges of H , and Wr~l(u,v)n Wr-1 (w,v) not empty then go to STEP12. i i STEP10: Set Ws(u,w) to Ws(u,w) - fv}. r r STEP11: If Ws (u,w) empty ,then set Ws(u,w) to be empty and gc r 1 135 to STEP 16. STEP 122 S e t p t o pU[ W ^ ( U f V j n W 1\1 (w,v) ] . STEP 13: If some v in Ws(u,w) has not been examined then go to r STEP8. STEP 14: If P not empty then set W3(u,w) to W3"1 (u,w)OP. Otherwise set W3(u,w) to be empty and go to STEP16. STEP152 Set i to i-1. If i>1 then go to STEP7. STEPI62 If W3(u,w) has not been computed for a l l edges (u,w) in E (H ), where u,w are in V(B ,)-V , then gc to STEP3. s -1 s -1 s STEP 17: Define graph H £ H ., as follows: s s-1 a. ) V(H ) is set to V (H n ) - V , s s -1 s b. ) (u,w) is an edge of E(H ) if and only i f (u,w) is s an edge of E (H ) and Ws(u,w) is not empty. s -1 1 STEPJ82 Set s to s*1. If s<k-2 then go to STEP2. STEP 19^ L e t G' b e t b e subgraph of G such that v is in V (G*) if and only i f there exists (u,w) in E (H ) such that v is in Wk~2 (u,w) for some i = 1,2, ,k-2. If G ' 4 G then let i G = G ' and go to STEP1. STEP20: If G* has at most 2 vertices in each block of degree > 0 then K(1k) is contained in G ; else a cligue must be verified by enumeration. As an example, consider the graph of Fig 4. 1. We summarize the results of the algorithm in TABLE 4.1. The subgraph defined by STEP 19 after 1 iteration is the complete subgraph of order 5. The second iteration defines sets for this graph as indicated in the table. Vertices which are deleted according to STEP 10 are indicated in parentheses. By STEP 19, the algorithm terminates after the second iteration. Fig. U.l GRAPH CONTAINING SUBGRAPH K(l^) ITERATION EDGE W 2 w2 W l w3 W3 W2 W3 (36) § (37) 2 (38) 2 (46) § (47) 2 (49) 2 1 (56) .1 (58) 1,2 (59) 1,2 (68) 1 (3) ,5 1 (69) 1 (U) ,5 1 (78) 1,2 3 2 (79) 1.2 4 2 (89) 1,2 5 1,2 6, (7) 5 1 (56) 1 (58) 1 2 (59) 1 (68) 1 5 1 (69) 1 5 1 (89) 1 5 1 6 5 1 TABLE 4.1 RESULTS OF THE ALGORITHM FOR FIG.4.1 138 4.4 TIMING CONSIDERATIONS Let G be a complete k-partite graph with m vertices in each block i = 1,2,...,k. For such a graph any choice of vertices v .,,v„ ,... ,v. with v. a member of V. is the vertex set 1 2 k l x of some K (1 ) contained in G. Further every edge in G is a member of some E(K{1 ) ) . G therefore constitutes the "worst" k-partite graph the algorithm can encounter. During iteration s a l l edges defined over V u V u . . . VJ V must be examined. There are raj (k-s) (k-s-1) S+l s+2 K 2 such edges. Fcr each edge (u»«) we must compute W s(u, w) , W s ; (u, w) ,..., wlf (u,w). Hs (u,w) can be determined in s s-l 1 s one intersection (that of rows u and w of the adjacency matrix treated as n digit binary numbers), while H?(u,w) is computed from the definition as follows, i i r»-l r-l S T S / \ W . (u,v ) W . (w,v ) is computed in m vrcW|(u,w) x r i r intersections, m-1 unions and m tests for emptyness for each r = i+1,i+2,...,s. The total computation time fcr wf(u,«) is therefore 1+(s-i) (3m-1), sc to compute a l l s sets reguires s-l 1 + Zl ((s-i) (3m-1) + 1) steps. The total number cf steps at i=l iteration s for a l l edges is thus given by: s-l (mf (k-s) (k-s-1)) (1 + n ((s-i)(3m-1) + 1) 2 i = l Since there are k-3 iterations reguiring these computations (iteration 1 only computes W^ (u,w) fcr ^ .2(k-1)(k-2) edges), the total number of steps in the computation is: k-2 m* (k-1) (k-2) + r m2 (k-s) (k-s-1) s(s-1)(3m-1) + s T s=2 T J - fm2 (k2-3k + 2)l r3m-1k3 + m + 1k2 - 3m-38k + 3 (3m-in I h 1 1 30 2 5 5 J 139 Hence an iteration of the algorithm is 0(k?) with leading coefficient m2 (3m-1). 120 Since at least one edge must be deleted during each iteration of the procedure, a crude upper bound on the number of iterations is given by the number of edges of the graph. In general it is difficult to obtain a sharper bound although empirical tests cn a large sample of graphs yielded none reguiring more than two iterations to reach a decision. In any case, the time reguired to execute the procedure remains bounded by a polynomial in k. 4.5 STORAGE CODS IDEBAT IONS Let G be the graph described in our discussion cf timing considerations. Both the adjacency matrix of G and the storage reguired for saving the W-sets place the greatest demand cn space. Each Ws(u,w) and row of the adjacency matrix can be i represented by bit strings of length m and mk respectively. Since edges with incident vertices in V will not be r considered after iteration r-1, such edges reguire storage for r-1 terms R r~:J",W ,..., W F"""''. An examination of the complete r-l r-2 1 r k-partite graph G with block size m reveals that m2 edges will have k-2 such terms computed for them (those with ene vertex in V and one in V ), 2m2 edges will have k-3 such k-1 k terms,..., (k-2)m2 edges will have only one such term. Since only the most recent values of any such term are ever reguired 2 b/ k-2 the maximum number cf terms is m2 £: i(k-i-1) = m.2/k\. The i « l 140 total storage reguirement in bits for these items is thus (mk)2 • £ 3 /*M. 4.6 IMPLICITIONS OF THE PROCEDUBE We have established that the procedure is an efficient algorithm for detecting k-cligues in graphs having at most two vertices per block. When used as a tautology testing procedure, this agrees with Cook's result[11] that well-formed formulae having at most two literals per clause can be determined to be tautologies in polynomial time. Moreover, if graphs having more than two vertices per block, each of degree > 0, can be reduced to graphs having at most two per block, then the procedure is s t i l l guaranteed to detect the existence of k-cligues in polynomial time. Thus, the algorithm accepts a larger class of well-formed formulae (represented as graphs) than the Davis-Putnam procedure [66]. In light of the work of Cook and Karp, the procedure provides a mechanism for greatly reducing the number of well-formed formulae that might require an exponential solution to the tautology problem. The construction of a specific counterexample might help to resolve the guestions raised by Cook and Karp concerning the exponential nature of the tautology problem. However, finding such a counterexample remains an open problem. 141 CHAPTER 5i EMPIRICAL OBSERVATIONS AND SUMMARY 5. 1 INTRODUCTION As has been observed in Chapter 2, the clique enumeration algorithms discussed there al l operate in essentially the same way, namely the development of a derivation tree from a node representing the i n i t i a l vertex set of the graph to ncdes representing the cligues of the graph. The thesis of Chapter 2 is that the size of the derivation tree developed by each algorithm applied to a Moon-Moser graph together with the order of computation for one iteration can be used as a measure of its efficiency. This efficiency therefore depends on the technique employed to develop new nodes of the derivation tree in as much as this procedure determines their total number. The algorithms examined in Chapter 2 a l l use different methods to generate new vertex subsets with the exception of the Bron-Kerbosch algorithm which was seen to develop the same derivation tree as the Reduced Redundancy algorithm. The purpose of comparison in this Chapter is to compare the actual performance of different methods for developing the derivation tree with the results of Chapter 2. The same set of data used to test the cligue enumeration algorithms was also used to test the efficiency of the clique detection procedure of Chapter 4 in order to determine hew much better its performance was than employing an ordinary enumeration algorithm. 142 5.2 THE TEST DATA In order to determine the performance of the algorithms, random graphs of various edge densities were generated for graphs having 9, 12, 18, and 21 vertices. Each algorithm was then run on the test data. For each graph and each algorithm the following statistics were recorded: a. ) The time required to find a l l the cligues, b. ) The number of vertex sets examined, c. ) The number of cligues found. Sixteen test graphs were generated and their orders, edge densities, and number of cliques are given in TABLE 5.1. The results of applying each algorithm to this set of graphs is given in TABLE 5.2 (time in seconds) while the actual implementations used may be found in APPENDIX B. Due to excessive computation time, the algorithms of Harary-Boss and Peay were not applied to some of the graphs. Because of the features of dynamic storage allocation and bit string manipulation, PL/1 was used as the programming language for the implementations of these algorithms. The programs were run on an IBM 360/67 Duplex system. The actual graphs may be found in APPENDIX C. graph vertices edge density cligues 1 9 0.4 8 2 9 0.6 10 3 9 0.8 17 4 9 1.0 27 5 12 0.4 12 6 12 0.6 19 7 12 0.8 31 8 12 1.0 81 9 18 0.4 34 10 18 0.6 39 11 18 0.8 69 12 18 1.0 729 13 21 0.4 43 14 21 0.6 58 15 21 0.8 144 16 21 0.9 392 TABLE 5.1 CHARACTERISTICS OF THE TEST GRAPHS HARARY-ROSS PE AY «S BONNER'S R. R. ALGORITHM ALGORITHM ALGORITHM ALGORITHM GRAPH TIME NODES TIME NODES TIME NODES TIME NODES 1 0.51 15 0.20 36 0. 15 27 0. 20 20 2 0.71 19 0. 18 46 0. 17 34 0.23 24 3 1.28 33 0.47 49 0.22 48 0. 28 31 4 1.97 53 0. 71 79 0.27 64 0.42 40 5 1.85 23 0.64 62 0.25 55 0.35 37 6 2.62 37 1.11 103 0.36 77 0.50 51 7 a.75 61 0.97 99 0.70 142 0.68 65 8 11.80 161 2. US 241 0.97 256 1. 18 121 9 8.06 67 2.48 182 0.55 117 0.85 78 10 9.48 79 2.91 221 0.71 143 1.36 91 11 18.67 137 11. 13 655 2.77 465 2.01 150 12 15.31 4096 14.03 1093 13 13.91 85 3.32 231 0.73 149 1. 54 108 14 16.32 115 6. 12 444 1.62 354 2.35 167 15 21.75 1490 4.80 1177 5. 36 374 16 15.03 4023 11.01 688 TAELE 5.2 COMPARISON OF ALGORITHMS 145 5.3 DISCUSSION OF THE RESULTS The results cf running each algorithm on the test data of TABLE 5.1 were used to verify the predictions made for the size of the derivation tree. For this purpose, the graphs numbered 4, 8, and 12 were , examined as they correspond to Moon-Moser graphs (see page 13) on 9, 12, and 18 vertices respectively. Because of the slowness of the Harary-Ross algorithm and Peay's algorithm for the smaller graphs no attempt was made tc obtain such results for the Moon-Moser graph on 18 vertices. For the test cases, the Harary-Ross algorithm generated the fewest nodes of the derivation tree in finding the cligues of a graph yet performed more poorly than Bonner's algorithm which generated the largest derivation trees. This is a conseguence of the fact that the method used by Harary and Ross to generate new vertex subsets while being very selective is also very time consuming as was seen in our analysis of this algorithm in Chapter 2. On the other hand, Bonner sacrifices efficient node generation in the derivation tree for a simple means cf defining new vertex subsets. In spite of its defects as observed by Augustson and Minker £ 5 ] , this method appears to be very successful particularly with small graphs as one would expect, since for such a graph the size of the derivation tree does not yet dominate the computation. This hypothesis is further supported by the observation that the Reduced Redundancy algorithm appears to be aost competitive with Bonner's algorithm for graphs of high edge 146 density. Such graphs have large numbers of cligues and hence their derivation trees will be large. Peay's algorithm performed significantly better than the Harary-Ross algorithm and would probably have been wore competitive with the other algorithms i f the size of derivation tree generated were reduced. Such a modification seems feasible i f one were to employ a technigue of examining a l l the non-adjacent vertices associated with a particular vertex at one time rather than step-by-step. This is an approach similar to that taken in the Reduced Redundancy algorithm. If the order cf computation for one iteration multiplied by the number of nodes in the derivation tree cf K(3 } derived in Chapter 2 is used as a rough measure of relative efficiency, then good agreement is obtained with the empirical results. although such an estimate does net indicate accurately how much better one algorithm is than another, the difference in magnitudes of these values, particularly with large graphs, provides some guide in choosing the most efficient algorithm. The inefficiency of cligue enumeration algorithms for finding the existence of a maximal complete subgraph cf order k in a k-partite graph is revealed by the results given in TABLE 5.3 where we compare the computation time of the decision procedure of Chapter 4 applied to the graphs of TABLE 5.1 with the best time available for the Reduced Redundancy algorithm. While i t is true that such enumeration procedures 147 could be modified tc terminate when a k-cligue had been discovered, this would not preclude the possibility of such cliques being discovered rather late in the enumeration. ENUMERATION DECISION ALGORITHM ALGORITHM GRAPH TIME TIME 1 0. 15 0.06 2 0. 17 0.07 3 0.22 0.09 4 0.27 0. 11 5 0.25 0. 15 6 0.36 0. 16 7 0.68 0. 20 8 0.97 0.28 9 0.55 0.64 10 0.71 0.77 11 2.01 1. 05 12 14.03 4.28 13 0.73 0.82 14 1.62 0.89 15 4.80 1. 54 16 11.01 3.98 COMPARISON OF TABLE 5.3 DETECTION AND ENUMERATION 149 5.4 SUMMARY The principal goal of this thesis has been to examine ways of detecting cligues in graphs. In particular, we have sought to derive an efficient algorithm fcr determining the existence of a clique cf order k in a k-partite graph. This goal was achieved in Chaper 4 for a subset of a l l graphs by adopting a different approach to the problem than that offered in Chapters 2 and 3. In these chapters we saw that i t was unlikely that we could solve our problem by employing any cligue enumeration algorithm. This was because such algorithms could be compared to tree searching processes which are known to be inefficient procedures. In Chapter 3 we attempted to exploit some properties of graphs which would allow us to group "similar" vertices together so that we would not have to examine each vertex in such a group individually. Two kinds of "similarity" were discussed: graph theoretic similarity, and complete subgraph equivalence. The latter offered some possibility cf improvement because of the concise notational representation. However no solution was found for avoiding the problem of multiply defining cligues although an attempt was made to minimize such behavior. Further, i t should be observed that the new type of notation while concise, does not readily display either the cligues or their orders. As the number of clique enumeration algorithms in the literature increases, i t is useful to carry out some empirical comparison of the performance of these procedures. It has been 150 our observation that the implementations of several of these algorithms have been extremely sensitive tc modifications which improve their efficiency.'Some of these modifications have been discussed previously and were included in the implementations. The efficiency of these implementations was examined in the previous section. While we have been able to suggest improvements to the algorithms discussed in this thesis, such changes have not really changed the basic approach and as a conseguence their computation times remain exponential. Therefore empirical tests of such procedures are possible only for moderately large graphs. For very large graphs, determination of the size of the derivation tree provides a more useful and less expensive method for assessing the performance of enumeration algorithms. In Chapter 4, an efficient algorithm was defined which detects the existence of k-cligues in certain k-partite graphs. Such graphs have the property that they can be reduced by the algorithm to graphs having at most two vertices of degree > 0 in each vertex block. The work of Cook and Karp suggests (but does not imply) that the algorithm will not work for a l l k-partite graphs. proving this may help in characterizing why the tautology problem does not appear to be solvable in polynomial time. 151 BIBLIOGRAPHY ____________ ( 1 Abraham, CT. "An application of clustering techniques to minimizing the number of interconnections in electrical assemblies " Some Problems In Information Science (M. Kochen ed.), 1965 pp. 252-265 2 Abraham, CT. "Techniques for thesaurus organization and evaluation " Some Problems In Information Science (M. Kochen ed.), 19657 pp.~137-150 3 Abraham, CT. "Graph theoretic techniques for the organization of linked data " Some Problems In Information Science (M. Kochen ed.), 19657 PF~ 229-251 4 Andrasfai, B. "New proof of a graph theoretic theorem of Turan " Magyar Tud. Akad. Hat. Kutato. Int. Kczl 7 (1962) pp. 193-196" 5 Augustson, J . Gary, Minker, Jack "An analysis of some graph theoretical cluster techniques " J . Assoc Commuting Machinery. 17 (1970) pp. 571-588 6 Bonner, R.E. "On some clustering technigues " IBM J . Research And Development 8 (1964) pp. 22-32. 7 Boyle, R.P. "Algebraic systems for normal and hierarchichal sociograms " Spcipmetry 32(1969) pp. 99-119 8 v Bron, Coen, Kerbosch, J.A.G-.M. "Finding a l l cliques of an undirected graph " Communications Assoc. Computing Machinery tc appear ~ ~ 9 Cartwright, D., Harary, F. "Structural balance: a generalization of Heider's theory " Psych. Rev. 63(1956) pp. 146-153 10 Coleman, J.S., MacRae, D. (jr.) "Electrcnic processing of sociometric data for groups up tp 1000 in size " Amer. Sociological Rev.. 25 (1960) pp. 722-727 11 Cook, S.A. "The complexity of theorem-proving procedures " Third ACM Symposium On The Theory Of Computing 1970, pp.~15T-T58 ~ 12 Corneil, D.G. "Graph isomorphism " Department Of Computer Science^ University Of Toronto Ph.D thesis, 1968 13 Corneil, D.G., Gotlieb, C C "An efficient algorithm for graph isomorphism " J . Assoc_. Computing Machinery 17 (1970) pp. 51-64 14 Culik, K. "On the chromatic decompositions and chromatic numbers of graphs " SpJLsjy Pri red. Fak. Univ. Brno. 1959 pp. 177-185 152 15 Davis, J.A. "Clustering and structural balance in graphs " Hunan Relations 20(1967) pp. 181-187. 16 Dirac, G. "Extensions of Turan's theorem on graphs " Acta^ Math. Acad. Sci. Hungar. 14(1963) pp. 417-422 17 Dirac, G. "On complete subgraphs and complete stars contained as subgraphs in graphs " Math. Scand. 12(1963) pp. 39-46 18 Dirac, G. "Extensions of the theorems of Turan and Zarankiewicz " Proc. Symp. Smolenice 1963 pp.127-132 19 Dirac, G. "Chromatic number and topological complete subgraphs " Canad. Math. Bull. 8(1965) pp.711-715 20 Doreian, P. "A note on the detection of cligues in valued graphs " Sociomgtrv 32(1969) pp. 237-242. 21 Erdos, P. "Remarks on a theorem of Ramsey " Bull.. Res. Council Israel Sect. F 7 (1957-1958) pp. 21-24 22 Erdos, P. "Graph theory and probability I " Canad. J . Math^ 11(1959) pp.34-38 23 Erdos, P. "Graph theory and probability II " Canad. MatJls. 13(1961) pp. 346-352 24 Erdos, P. "On a theorem of Rademacher-Turan " 111. J.. Math.. 6(1962) pp 122-127 25 Erdos, P. "On the number of complete subgraphs contained in certain graphs " Magjfar Tud. Akad.. Mat^ Kafat2i I S i i Kozl* 7(1962) pp. 459-464 26 Erdos, P. "On circuits and subgraphs cf chromatic graphs " Mathematika 9(1962) pp. 170-175 27 Erdos, P. yon complete topological subgraphs of certain graphs " Ann. Univ.. Sci. Budapest Eotvos Sect, "ath. 7 (1964) pp. 143-149 28 Erdos, P. "On the number of triangles contained in certain graphs " Canad.. Math. Bull. 7 (1964) pp. 53-56 29 Erdos, P. "Some remarks on Ramsey»s theorem " Canad. lath. Bull.. 7 (1964) pp. 619-622 30 Erdos, P. "On cligues in graphs " Israel J± Math. 4 (1966) pp. 233-234. 31 Festinger, L. "Time analysis of sociograras using matrix algebra " Human Relations 2(1949) pp. 153-158. 153 Folkman, J. "Regular line symmetric graphs " J,. Combinatorial Theory 3(1967) pp.215-232 Forsyth, E., Katz, L. "A matrix approach to the analysis of socioraetric data: preliminary report " Sociometry 9(1946) pp. 340-347. G i l l , A. "Analysis of nets by numerical methods " AssoS' Computing Machinery 7 (1960) pp. 251-254 Giraud, G. "An upper bound for Ramsey number (5,5) " C.R. Acad.. Sci. Paris 265 (1968) pp. 809-81 1 Gotlieb, C.C., Kumar, S. "Semantic clustering of index terms " Assoc. Computing Machinery 15(1968) pp. 493-513 Graver, J., Yackel, J. "An upper bound for Ramsey numbers " gull. Amer. Math. Soc. 72(1966) pp. 1076-1079 Graver, J., Yackel, J. "Some graph theoretic results associated with Ramsey's theorem " J.. Comb. Theory 4 (1968) pp. 125-175 Hajnal, A., Suranyi, J. "On the decomposition of graphs into complete subgraphs " Ann.. Dniv. Sci.. Budapest, fotyos Sect.. Math^ 1 (1958) pp.""T 13-12~ Harary, F., Norman, R.Z. Graph Thecry As A Mathematical Model In Social Science Ann-Arbor: Institute for Social Research, 1953 Harary, F., Ross, I. "A procedure for clique detection using the group matrix " Sociometry.. 20 (1957) pp. 205-215. Harary, F. "Graph theoretic methods in the management sciences " Management Science 5(1959) pp. 387-403 Harary, F. Graph Theory Addison-Wesley, Reading, Mass., 1969 Harary, F. "Graph theory as a structural model in the social sciences " Graph Theory And Its Applications Bernard Harris (ed.) ,"970, ppT 1-16 Hubbell, C.H. "An input-output approach to cligue identification " Sociometry.. 28 (1965) pp. 377-399. Ilzinia, I. "Finding the cliques of a graph " Aytomat. I. Vychisl.. Tekn^ #2 (1967) pp. 7-11. Johnson, S.C. "Hierarchichal clustering schemes " Psychometrika 32(1967) pp. 241-254 154 Kalbfleisch, J. "On an unknown Ramsey number " Mich. Math. 13 (1966) pp385-392 Karp, R.M. "Reducibility among combinatorial problems " Proceeding Of The IBM Conference On The Complexity Of Computations Plenum Press, New~York, l972~ ~ ~ Kochen m. Some Problems In Information Science. Scarecrow Press, new york, 1965. Lawler, E.L. "Electrical assemblies with minimum number of intersections " IRE Trans. EC-11(1962) pp. 86-88 Lawler, E.L. "Polynomial bounded and (apparently) non polynomial bounded matroid computations " Proceedings Of The NYO-ONR Symposium On Combinatorial Algorithms ~ (tc appear7~Prentice-Hall, New York,~1972 Lawler, E.L. "Matroids with parity conditions: a new class of combinatorial optimization problems " Mathematical Programming submitted 1972 Luce, R.D., Perry, A.D. "A method of matrix analysis of group structure " Psychometrika. 14(1949) pp. 95-116. Luce, R.D. "Connectivity and generalized cligues in sociometric group structure " Psychometrika. 15(1950) pp. 169-190. Meetham, A.R. "Algorithm to assist in finding the complete subgraphs' of a given graph " Nature 21 1 (1966) p. 105 Meetham, A.R. "Graph separability and word grouping " Proc. Assoc. Comp. Mach. 2,1st National Conference 1966 pp. 513-574 Mendelson, E. Introduction To Mathematical Logic Van Nostrand, Princeton, N.J.,"l964. ~ ~ ~ Moon, J.W. "On the number of complete subgraphs of a graph " Canad. Math. Bull. 8(1965) pp. 831-834 Moon, J.W., Moser L. "On cligues in graphs " Israel J_. Math.. 3 (1965) pp. 23-28. Moon, J . "On independent complete subgraphs in a graph " Canad. J. Math. 20 (1968) pp. 95-102 Moreno, J.L. Who Shall Survive? A New Approach To The Problem of Human Inter-Relations Beacon House,~New~York, 1934 Mowshowitz, A., ( private communication ) 155 64 Mulligan, G.D. "Algorithms for finding cliques of a graph " Department Of Computer Science^ University Of Toronto Technical report Mo. 4^, May 1972 65 Mulligan, G.D., Corneil, D.G. "Corrections to Bierstone's algorithm for generating cligues " Assoc. Computing Machinery 16(1972) pp. 244-247 66 Putnam, H., Davis, M. "A computing procedure for guantification theory" J. Assoc. Computing Machinery 7 (1960) pp. 201-215 67 Nordhaus, E.A. "On the density and chromatic numbers of graphs " Lecture Motes In Mathematics. 110(1969) pp. 245-249. 68 Peay, E.R. (jr.) "An iterative cligue detection procedure " Mich. Math. Psych. Prog^ 4 (1970). 69 Peay, E.R. (jr.) "Monmetric grouping: clusters and cligues " Mich. Math. Psych., Procj., 5(1970) 70 Ramsey, F. M0n a problem of formal logic " Proc., Lond. Math. Soc. 30 (1930) pp. 264-286 71 Rose, M.J. "Classification of a set of elements " Computer J. 7 (196 4) pp. 208-211. 72 Sauer M. "A generalization of a theorem of Turan " J. Comb. Theory 10(1971) pp.109-112 73 Sims, Charles C. "Graphs and finite permutation groups " Mat!u Zeitschr^ 95 (1967) pp 76-86 74 Spencer, J.H. "On cligues in graphs " Israel J^ Math. 9 (1971) pp. 419-421. 75 Turan,: P. "Eine , extremalaufgabe aus der graphentheorie " Mat. Fiz. Lapo_k 48(1941) pp. 436-452 76 Turan, P. "On the theory cf graphs " Cc1log. Math. 3<1954) pp. 19-30 77 Turner, James "Point-symmetric graphs with a prime number of points " JL. Combinatorial Theory 3(1967) pp. 136-145 78 Weiss, R.S., Jacobsen E. "A method for the analysis of the structure of complex organizations " Ajer.. Sociological Review. 20(1955) pp. 661-668. 79 Zelinka, B. "On the number of independent complete subgraphs " Publ. Math. Debrecen. 13(1966) pp. 9 5-97 APPENDIX A operation symbol time constant STORE < — \ PUSH,POP push,pop ADD,SUBTRACT COMPARE • \ M0LTIP1Y s UNION,INTERSECTION u, rv H COMPLEMENT -SUBSTRING substr INDEX index s PRINT print t 10 LIST OF OPERATIONS 1 5 7 APPENDIX B 158 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * THE MODIFIED HARARY-ROSS ALGORITHM * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * HAROSS:PROC(A, N) ; DCL (A(N), /* ADJACENCY MATRIX */ G, /*CURRENT VERTEX SET*/ CLIQ, /*COHPLETE SUBGRAPH VERTICES */ GTEMP) BIT (N) , (N, /* NUMBER OF VERTICES IN GRAPH*/ WT, /*NUMBEB OF VERTICES IN G */ VTX(N), /* LABELS OF VERTICES OF G */ R(N), /*ROW SUMS OF (A**2XA) FOR G*/ D(N)) /* DEGRES OF VERTICES OF G */ FIXED BIN, VSET BIT(*) CTL; /* STACK OF SETS */ DCL CTR FIXED BIN; CTR=0; G=G| ( G) ; PUT SKIP; PUT SKIP LIST ('THE CLIQUES ARE:*); START: /* DETERMINE THE VERTICES IN SUBGRAPH G * / IF G-'O'B THEN GO TO NEXT; GTEMP=GTEMP| ( GTEMP) ; WT=0; DO 1=1 TO N; IF SUBSTR (G,I,1) = '0»B THEN GO TO LP 1 ; GTEMP=GTEMPSA (I) ; WT=WT+1 ; VTX(WT)=I; LP 1: END; IF GTEMP = 'O'B THEN GO TO NEXT; /* CALCULATE ROW SUMS OF (A**2XA) AND DEGREES OF VERTICES OF G V R,D =0; DO 1=1 TO WT-1; DO J=I+1 TO WT; SUM=0; IF SUBSTR (A (VTX (I)) ,VTX(J) , 1) = • 1 • B THEN BEGIN; ^ D(I)=D(I) + 1; D(J)=D(J) + 1; DO K=1 TO WT; IF SUBSTR (A (VTX (I) ) 8A (VTX (J) ) ,VTX (K) , 1) = »1»B THEN SUM=SUM + 1 ; END; END; 159 R (I)=R (I) *SUM; R (J)=R (J) +SUM; END; END; / * SEARCH FOR UNICLIQUAL VERTICES */ MIN=1; DO 1=1 TO WT; IF R (I) = D (I) * (B (I)-1) THEN DO; /* UNICLIQUAL VERTEX DISCOVERED */ CLIQ=A (VTX (I) ) SG; SUBSTR (CLIQ, VTX.(I) , 1) = «1*B; / * IS THIS A MAXIMAL COMPLETE SUBGRAPH? V GTEMP=GTEMP| ( GTE MP) ; DO 12=1 TO WT; IF SUESTR (CLIQ,VTX (12) , 1) = »1«B THEN GTEMP=GTEMP6A (VTX (12) ) ; END; IF GTEMP=»0'B THEN PUT LIST(CLIQ); / * DELETE ALL UNICLIQUAL VERTICES IN TBIS COMPLETE SUBGRAPH FROM G. */ SUBSTR (G,VTX (I) , 1) = • 0 • B ; DO J=I+1 TO WT; IF SUBSTR (A ( VTX (I) ) , VTX (J),1)=*0'B THEN GO TO LP2; IF R(J)=R(I) THEN SUBSTR (G, VTX (J) , 1) = '0,B; LP2: END; GO TO START; END; ELSE IF R(MIN)>R(I) THEN MIN=I; ILP: END; /* END I LOOP */ / * NO UNICLIQUAL VERTEX IN G DEFINE TWO VERTEX SETS. SAVE ONE AND PROCESS THE OTHER. V ALLOCATE VSET BIT (N) ; GTEMP=G&A (VTX (MIN) ) ; SUBSTR (GTEMP,VTX(MIN) ,1) = M'B; VSET= GTEMP; DO J=1 TO IT; IF SUBSTR (GTEMP,VTX (J) , 1) = »0»B THEN VSET=VSET|A (VTX (J)); END; VSET=VSETSG; CTR=CTR+1; G=GTEMP; 160 GO TO START; /* GET A NEW SET FROM THE STACK. */ NEXT: IF CTR > 0 THEN DO; G=VSET; CTR=CTR-1; FREE VSET; GO TO START; END; FREE VSET; RETURN; END; /* END HAROSS PROCEDURE */ 161 ***************************** ********** * * * BONNER'S ALGORITHM * * * **** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * BON: PROC (G,N) ; DCL (G (N) ,A (N) ,C (N) ,U,T) BIT (N) r L (N) ; STEP 1: 1=1; C(1) = '0«B; C(1)= C(1); A(1) = »0'B; L(1) = 1; STEP2: IF SUBSTR (C (I) ,L (I) ,1) = '1'B THEN BEGIN; STEP3: C (I + 1)=C (I) 8G (L (I) ) ; SUBSTR (C (1+1) , L (I) , 1) = ' 0 * B; A (I+1)=A (I) ; SUBSTR (A (1+1) ,L (I) , 1) = » 1 «B; STEPU: L (I+1)=L (I)+1 ; 1=1+1; END; ELSE I (I)=L (I) +1; STEP5: IF SUBSTR (C (I) , L (I) ,N+1-L (I) ) = '0«B THEN GO TO STEP2; T=A(I) ; STEP6: IF C(I)=»0«B THEN STEP7: 1=1-1; IF 1=0 THEN RETURN; STEP8: U=»0«B; SUBSTR (U,L (I)+1 ,N-L (I) ) = SUBSTR (C (I) ,L (I) +1,N-L (I) ) ; IF (T | U) = T THEN GO TO STEP7; L(I) = L(I)+1; GO TO STEP2; END; /*END BON PROC */ 162 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * THE MODIFIED PEAY ALGORITHM * * * * * **** * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * PEAY: PROC (A,N) ; DCL (A(N), /* ADJACENCY MATRIX */ G, /* CURRENT VERTEX SET */ GTEMP) BIT (N) , (N , /*NUMBER OF VERTICES IN GRAPH*/ WT, /* NUMBER OF VERTICES IN G */ VTX(N)) /*LABELS OF G'S VERTICES*/ FIXED BIN, 1 VSTORE BASED (VPTR), /* STACK OF SETS*/ 2 VNXT PTR, 2 VCTR FIXED BIN, 2 VN FIXED BIN, 2 VSET BIT (N REFER (VN)) , VHD PTR; /*POINTS TO STACK TOP*/ DCL P PTR; DCL CTR FIXED BIN; G=G| ( G) ; VHD=NULL; CTR=0; PUT SKIP; PUT SKIP LIST ('THE CLIQUES ARE: *) ; START:WT=0; /* DETERMINE THE LABELS OF THE VERTICES IN G V DO 1=1 TO N; IF SUBSTR (G,I,1) = '0'B THEN GO TO LP 1; WT=WT+1; VTX (WT) =1; LP1: END; / * FIND TWO NON-ADJACENT VERTICES IN G V DO 1=1 TO WT; GTEMP=A(VTX (I))6G; SUBSTR (GTEMP, VTX (I) , 1) = M « B ; IF GTEMP = G THEN GO TO FOUND; END; /* G IS A COMPLETE SUBGRAPH. DETERMINE WHETHER IT IS MAXIMAL. */ GTEMP=GTEMP| ( GTEMP) ; DO 1=1 TO WT; GTEMP=GTEMPSA (VTX (I)) ; END; IF GTEMP=«0,B THEN IF STACK NOT EMPTY THEN CHOOSE ANOTHER VERTEX SET FOR FDRTHER PROCESSING */ IF VHD = NOLL THEN DO; G=VHD-> VSET; P=VHD; VHD=VHD->VNXT; CTR=CTR-1; FREE P->VSTORE; GO TO START; END; RETURN; FOUND: /* DETERMINE TWO NEW VERTEX SOBSETS,SAVE ONE AND PROCESS THE OTHER. */ GTEMP= A (VTX (I) ) ; DO K=1 TO N; IF K = VTX (I) THEN IF SOBSTR (A (VTX (I)) ,K, 1) = '0 'B THEN GTEMP=GTEMP| A (VTX (I) ) ; END; SOBSTR (GTEMP,VTX (I) ,1)=«0'B; / * CHECK THE STACK TO SEE IF NEW SET "GTEMP" IS CONTAINED IN A PREVIOUS ONE AWAITING PROCESSING. V P=VHD; DO WHILE (P =NULL) ; IF (GTEMP|P->VSET) = P->VSET THEN GO TO START; P=P->VNXT; END ; CTR=CTR+1; ALLOCATE VSTORE; VSET=GTEMPSG; VNXT=VHD; VCTR=CTR; VHD^ VPTR; G=A (VTX (I) ) SG; SUBSTR (G,VTX(I) ,1) = '1»B; GO TO START; END; /* END PEAY PROCEDURE */ 164 * REDUCED REDUNDANCY ALGORITHM * * * * * * * * * * * 4 1 4 c * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ENUM:PROC (A,N); DCL (A(N), /* ADJACENCY MATRIX */ Gf /* CURRENT VERTEX SET */ H, /* NEWLY DEFINED SET */ CSUB, /*COMPLETE SUBGRAPH TO BE EXTENDED BY VERTICES IN G */ CLQ, /* NEWLY EXTENDED CSUB */ F, /*SET OF DEFINING VERTICES */ GMX, CMX) BIT (N) , (V, /* VERTEX CHOSEN FROM F */ N) /* NUMBER OF VERTICES OF GRAPH*/ FIXED BIN; DCL CTR FIXED BIN; DCL VTEMP PTR; DCL VSET BIT (*) CTL; /* STACK OF SETS */ CTR=0; NN=N*2; G=G| ( G) ; CSUB= * 0•B; / * DETERMINE WHETHER THERE IS A VERTEX ADJACENT TO ALL VERTICES IN G|CSUB. IF YES THEN NC CLIQUE IS DEFINED ON G|CSUB SO CHOOSE A NEW VERTEX SET. V START: GMX=G|CSUB; NV=0; DO 1=1 TO N; IF (A(I) |GMX) = A(I) THEN GO TO NEWSET; END; V=INDEX (G,•1'B) ; / * F IS THE SET OF VERTICES IN G NOT ADJACENT TO V. */ F=GS ( A (V) ) ; LOOP: H=GSA (V) ; CLQ=CSUB; SUBSTR (CLQ,V,1)='1'B; IF H=«0'B THEN /* NO FURTHER VERTICES CAN BE ADDED TO CLQ, HENCE CLQ IS A COMPLETE SUBGRAPH. DETERMINE IF IT IS MAXIMAL. DO; NV=0; DO 1=1 TO N; IF (A(I)|CLQ) = A (I) THEN GO TO OUT END; OUT: GO TO NXTV; END; / * PLACE H AND CLQ ON THE STACK FOR FURTHER PROCESSING */ CTR=CTR+1; ALLOCATE VSET BIT (N N) ;. VSET=H||CLQ; NXTV: SUBSTR(F,V,1) = «0«B; SUBSTR (G, V, 1) = «0»B; V=INDEX (Ff * 1 ' B) ; IF V =0 THEN GO TO LOOP; /* ALL VERTEX SETS HAVE NOW BEEN DETERMINED FOR THIS ITERATION. CHOOSE A NEW SET FROM THE STACK FOR FURTHER PROCESSING V NEWSET: IF CTR <= 0 THEN RETURN; G=SUBSTR(VSET,1,N); CSUB=SUBSTR (VSET,N + 1, N) ; FREE VSET; CTR=CTR-1; GO TO START; END; /* END ENUM PROC */ I 166 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 4 4 * * 4 * 4 4 * * * * * * * 4 * * * 4 * * * * * * * * * ALGORITHM TO DETECT THE EXISTENCE OF A CLIQUE OF * * ORDER K IN A K-PARTITE GRAPH * * * ****************************4****444*4********************** KGRPH:PROC(A,N,M,K,MX); DCL A (N) /* ADJACENCY MATRIX */ BIT (N) , (M (K) , /* NO. VERTICES PER BLOCK */ MX, /* MAX. NO. VERTICES/BLOCK */ N, /* NO. VERTICES IN GRAPH */ K, /* NO. OF BLOCKS */ R,S,T,VX,VTX,P0S,P0S2,M1M2) FIXED BIN; DCL VB (K) FIXED BIN ; DCL AO (N) BIT (N) ; DCL U FIXED BIN; STEP0:A0=A; ITER-ITER+1; N1=N-M (1)-M (K) ; POS=M(1)+1 ; POS2=POS+M(2); M1M2=POS2; NSP= (N-M1M2) *N1+N-M (K) ; BEGIN; DCL MSUM (K) ; DCL (P,Q,TEMP,W (NSP,K-2) ) BIT (MX) ; W=«0'B; MSUM (1) = M (1) ; DO 1=2 TO K; MSUM (I) = MSUM (1-1) +M (I) ; END; DO S=1 TO K-2; DO I=POS TO N-M (K) ; / * CHOOSE AN EDGE (I,J) NOT YET DELETED FROM FURTHER CONSIDERATION. */ DO J=I+1 TO N; IF SUBSTR (A(I),J,1)=11'B THEN BEGIN; P='0»B; / * MAP THE VERTEX PAIR (I.J) ONTO AN INTEGER IJ. INITIALLY DEFINE THE SET W(IJ,S). */ IJ= (J-M1M2)*N1 + I; W (IJ,S)=SUBSTR (A (I) 6A (J) ,POS-M (S) ,M (S) ) ; /* DEFINE THE SET W(IJ,T) FOR T=S-1,S-2,...,1 AS A FUNCTION OF THE PREVIOUSLY DETERMINED SETS W (IJ,T+1) ,W(IJ,T+2) , . . . , W (I J , S) . V DO NT=1 TO S; 167 T=S+1-NT; IF T=S THEN GO TO SKP; DO R=T*1 TO S; VTX=INEEX (W(IJ,R),'1*B) ; DO WHILE (VTX = 0); VX=VTX+MSUM (R-1) ; IF SDBSTR (A (J) ,VX, 1) SSUBSTR {A (I) ,VX, 1) = *0 «B THEN BEGIN; V SUBSTR (W (I J, R) ,VTX,1)='0*B; GO TO SKP3; END; Q=W ( (J-M1M2) *N1+VX ,T) &W { (I-M1M2) +N1+VX ,T) ; / * IF THERE EXISTS A VERTEX VTX NOT ADJACENT TO ANY VERTEX IN W(IJ,T) THEN DELETE IT FROM FURTHER CONSIDERATION */ IF Q = 'O'B THEN P=P|Q; ELSE SUBSTR (W (IJ,R),VTX,1)='0»B; IF VTX >= MX THEN GO TO CHK4; TEMP=»0'B; SUBSTR (TEMP,VTX + 1,MX*1-VTX) = SUBSTR(W(IJ,R),VTX+1,MX+1-VTX); VTX=INDEX (TEMP, • 1 'B) ; SKP 3: END; CHK4; W (IJ,T)=W(IJ,T)&P; END; /* END R LOOP */ /* IF W(IJ,T) IS EMPTY FOR ANY T=1,2,...,S THEN DELETE EDGE (I,J) FROM FURTHER CONSIDERATION * / SKP: IF W(IJ,T)=,0,B THEN BEGIN; SUBSTR (A (I) ,J,1) = '0'B; SUBSTR (A (J) ,1, 1)= «0 «B; GO TO NEXT; END; END; /* END T LOOP */ END; /* END BEGIN BLOCK */ NEXT:END; /* END J LOOP */ END; /* END I LOOP */ POS=POS2; POS2=M (S + 2) +POS2; END; /* END S LOOP */ /* TEST WHETHER ALL EDGES HAVE BEEN DELETED FROM THE SET OF CANDIDATES. IF S=K-2 AND NOT ALL EDGES HAVE BEEN ELIMINATED, THEN TEST FOR FURTHER ITERATION. V DO I=N-M(K)-M(K-1) + 1 TO N-M (K) ; DO J=N-M(K)+1 TO N; IF W ( (J-M1M2) *N1*I, 1) = «0»B THEN 168 GO TO STEP19; END; END; CONDA:PUT SKIP LIST('NO K-CLIQUE'); RETURN; STEP19:DO 1=1 TO K; VB (I)=0; DO J=1 TO M (I) ; U=I*MX+J-MX; IF A0(U) = A (U) THEN GO TO STEPO ; IF A(U) = »0'B THEN VE (I) =VB (I) *1 ; END; END; CONDB:PUT SKIP LIST(»NO CHANGE'); DO 1=1 TO K; IF VB(I) > 2 THEN GO TO STEP20 ; END; PUT SKIP LIST ('K-CLIQOE EXISTS BY THEOREM U.3'); RETURN; STEP20:CALL ENUM (A f N) ; RETURN; END; /* END KGRPH */ 169 APPENDIX C graph 1 graph 2 000101001 000001100 000100101 101000101 000000110 110000110 011111000 101100000 000101001 000111100 000100111 111000111 010000110 110000110 011111000 101100000 graph 3 000111001 000111110 000 1 11111 111000111 11 1000110 111000111 011111000 011111000 101101000 graph 4 000111111 000111111 000111111 111000111 111000111 111000111 111111000 111111000 111111000 graph 5 000101001000 000001 100000 000100101110 101000101101 000000110110 110000110011 0111 11000100 000011000101 101100000111 0011 10111000 001011001000 000101011000 graph 6 000101001100 000111100111 000100111011 111000111000 010000110110 110000110101 011111000110 001111000111 101100000011 110011110000 011010111000 011001011000 graph 7 000111001110 0001 11110101 000111111111 111000111110 111000110111 111000111111 011111000011 011111000011 101101000111 111111001000 101111111000 011011111000 graph 8 000111111111 000111111111 000111 111111 111000111111 111000111111 111000111111 111111000111 111111000111 111111000111 111111111000 111111111000 111111111000 graph 9 graph 10 0000000001 11100111 000001011010111101 000000001111000001 000000010000010101 000000001110111001 010000000110100011 000000000101000001 010100000011100111 011010000000001C00 101011100000111111 11 1011010000010100 101000110000001000 110011010100000010 010110000110000010 010010001101000001 110100010110000000 100001010100110000 111111110100001000 000001111100110110 000001000111001101 000001001101100100 00000011100010011 1 000000010010101110 111000000011100011 100100000001111011 100110000011011110 101100000000110000 111000000000111011 010011010000010110 011001110000000001 101111101100000110 100000111110000001 010010110100000000 111110010010100000 100111110110100000 010101100101010000 graph 11 000101101111001110 000011111111111111 000001110111111111 100000111110100111 010000011111101111 111000001111111011 111100000111011100 011110000001101110 110111000001001011 111111100000111111 111111100000000110 111011111000001111 011111010100000111 011001100100000011 111011111101000000 111110110111100000 111111011111110000 011111001101110000 graph 12 000111111111111111 000111111111111111 000111111111111111 111000111111111111 111000111111111111 111000111111111111 111111000111111111 111111000111111111 111111000111111111 111111111000111111 111111111000111111 111111111000111111 111111111111000111 111111111111000111 111111111111000111 111111111111111000 111111111111111000 111111111111111000 graph 13 graph 14 000000000111100111010 000011010111101110001 000001000001010100000 000000101100111011100 010000000110100011001 011000000000010011100 000100000100000100000 010000000011110010100 000100000001000000101 1101 10100000001 101010 110010010000001110000 11 100001 1000001 101001 110110010000000011100 001101010000000C00011 010100000111000001001 11 1000100111000000000 110111010010100000001 1001 11000101101000000 000101011000100000000 100000000100010000000 010010001001011010000 0001111110010 10 111000 000011100011110011110 000001011011110110110 100000011111011011011 110000010001001101001 111000000011001011001 11000000000011001 1110 101110000011110111000 101100000000111111110 000100000000101100100 011101010000011001011 111111010000001010100 011000111100000111101 111100111010000011110 00011 1001 11 1000000001 101010011100100000110 11 1 101 11 1001 1 10000010 110111111010110000001 01100010110111010CCOO 011100101010010110000 000111000010101001000 graph 15 00011111111111001101 00001111111101101010 00000111111111011110 10000011110100111101 11000001011110111011 111000001111111111110 111 10000001011 11101 1 11111000001 1111111110 11110100000011011111 11111100000011111011 11 101111000001111011 11111101000000010111 10101111110000011111 11100111111000001111 0101 1111011000000111 00111111111110000011 11 1 11111111011000001 10110101100111100000 011011111111111100000 100111111111111110000 111110101111111111000 graph 16 000111111 000011111 000001111 00000 111 0000011 000001 ooooo ooooo ooooo ooooo ooooo ooooo ooooo ooooo 100000 1100000 11100000 111100000 1111100000 1111110000 1111111000
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- On the efficiency of clique detection in graphs
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
On the efficiency of clique detection in graphs Dixon, Anthony Hunter 1973
pdf
Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
Page Metadata
Item Metadata
Title | On the efficiency of clique detection in graphs |
Creator |
Dixon, Anthony Hunter |
Publisher | University of British Columbia |
Date Issued | 1973 |
Description | This thesis examines the devices employed by various algorithms to search for maximal complete subgraphs in graphs. Also known as cliques, in Chapter 1 these subgraphs are seen to play an important role in graph theory, information retrieval, Sociometry, logic design, and computational complexity. The enumeration of cliques using the Harary-Rcss, Bonner, Peay, and Bron-Kerbosch algorithms is discussed in Chapter 2. The Reduced Redundancy algorithm is introduced, and the performance of the five procedures is assessed using an alternative approach to empirical testing. Each of the algorithms is shown to generate a "derivation tree" for a given graph whose size can be used as a measure of its efficiency. In Chapter 3, the possibility of exploiting vertex similarity is examined with a view to reducing the size of the derivation tree. As a consequence, algorithms are proposed for finding non-similar cliques. The concept of "complete subgraph equivalence" of vertices is introduced to develop a means for expressing the cliques of a graph as the Cartesian product of vertex subsets. An algorithm for detecting the existence of a clique of order k in a certain class of k-partite graphs in polynomial time is proposed in Chapter 4. This class consists of all graphs reducible by the algorithm to k-partite graphs having at most two vertices per block of degree greater than 0. This algorithm is shown to provide an efficient heuristic which can be used in a procedure for determining whether a well-formed formula is a tautology. The thesis is concluded with an empirical analysis of the techniques employed by the enumeration algorithms of Chapter 2. In addition, the efficiency of the Clique Detection algorithm is compared with that of the Reduced Redundancy algorithm. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2011-03-03 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0052075 |
URI | http://hdl.handle.net/2429/31952 |
Degree |
Doctor of Philosophy - PhD |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-UBC_1973_A1 D59_6.pdf [ 7.25MB ]
- Metadata
- JSON: 831-1.0052075.json
- JSON-LD: 831-1.0052075-ld.json
- RDF/XML (Pretty): 831-1.0052075-rdf.xml
- RDF/JSON: 831-1.0052075-rdf.json
- Turtle: 831-1.0052075-turtle.txt
- N-Triples: 831-1.0052075-rdf-ntriples.txt
- Original Record: 831-1.0052075-source.json
- Full Text
- 831-1.0052075-fulltext.txt
- Citation
- 831-1.0052075.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
data-media="{[{embed.selectedMedia}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0052075/manifest