Utilizing Graph Classes forCommunity Detection in Social andComplex NetworksbyJames NastosB.Math., University of Waterloo, 2000B.Ed., University of British Columbia, 2002M.Sc., University of Alberta, 2006A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFDOCTOR OF PHILOSOPHYinTHE COLLEGE OF GRADUATE STUDIES(Interdisciplinary Studies - Optimization)THE UNIVERSITY OF BRITISH COLUMBIA(Okanagan)April 2015c© James Nastos, 2015AbstractSocial network analysis is a cross-disciplinary study of interest to math-ematicians, physicists, computer scientists and sociologists. It deals withlooking at large networks of interactions and extracting useful or meaning-ful information from them. One attribute of interest is that of identifyingsocial communities within a network: how such a substructure should be de-fined is a widely-studied problem in itself. With each new definition, thereis a need to study in what applications or context such a definition is appro-priate, and develop algorithms and complexity results for the computationof these clusterings.This thesis studies problems related to graph clustering, motivated by thesocial networking problem of community detection. One main contributionof this thesis is a new definition of a specific kind of family-like community,accompanied by theoretical and computational justifications. Additionalresults in this thesis include proofs of hardness for the quasi-threshold editingproblem and the diameter augmentation problem, as well as improved exactalgorithms for cograph and quasi-threshold edge deletion and vertex deletionproblems.iiPrefaceMany of the results contained in this thesis have already appeared inpublished print.− The results in Section 2.3 are joint work with my supervisors Y. Gaoand D. R. Hare and appear in the Discrete Mathematics paper: “Thecluster deletion problem for cographs” [56].− Sections 2.4, 2.5 and 2.6 appear in the Social Networks paper: “Fa-milial groups in social networks” [110].− The diameter-related results in Section 4.3 have been published inthe Discrete Applied Mathematics paper [57], although we presentan alternate proof of the diameter-2 theorem in this thesis, originallyprinted in an earlier arxiv manuscript [107].− The work shown in sections 4.4.4 and 4.4.5 is my work done as part of aproject originating from Ajay Sridharan’s MSc thesis (UVictoria) [132]and appear in the publication [133].− The results in Chapter 5 were preliminarily published in the LNCSconference proceedings of the 2010 Conference on Combinatorial Op-timization and Algorithms (COCOA) [108] and later in journal formin [109].iiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . xDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiChapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . 11.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.1.1 Graphs and Networks . . . . . . . . . . . . . . . . . . 21.1.2 Complexity Theory . . . . . . . . . . . . . . . . . . . . 41.2 Computational Problems on Graphs . . . . . . . . . . . . . . 71.2.1 Max Clique . . . . . . . . . . . . . . . . . . . . . . . . 71.2.2 Dominating Set . . . . . . . . . . . . . . . . . . . . . . 71.2.3 Diameter Augmentation . . . . . . . . . . . . . . . . . 71.2.4 Graph Modification Problems . . . . . . . . . . . . . . 81.3 Graph Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . 91.3.1 Cluster Graphs . . . . . . . . . . . . . . . . . . . . . . 101.3.2 Quasi-Threshold Graphs . . . . . . . . . . . . . . . . . 111.3.3 Cographs . . . . . . . . . . . . . . . . . . . . . . . . . 121.3.4 P4-sparse Graphs . . . . . . . . . . . . . . . . . . . . . 131.3.5 Chordal Graphs . . . . . . . . . . . . . . . . . . . . . 151.3.6 Bipartite and Split Graphs . . . . . . . . . . . . . . . 16ivTABLE OF CONTENTSChapter 2: Social Communities . . . . . . . . . . . . . . . . . . 182.1 Existing Methods for Cluster Partitioning . . . . . . . . . . . 182.1.1 An Induced Subgraph Variation . . . . . . . . . . . . 192.2 Cliques and Beyond . . . . . . . . . . . . . . . . . . . . . . . 202.3 Cluster Deletion . . . . . . . . . . . . . . . . . . . . . . . . . 222.3.1 On the Hardness of Cluster Deletion . . . . . . . . . . 232.3.2 Cluster Deletion on Cographs . . . . . . . . . . . . . . 252.3.3 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 272.4 Quasi-Threshold Graphs as Communities . . . . . . . . . . . 302.4.1 Properties of Familial Groups . . . . . . . . . . . . . . 312.5 Hardness of Finding Familial Groups . . . . . . . . . . . . . . 342.5.1 Algorithms for Familial Groups . . . . . . . . . . . . . 392.5.2 Intra-communal Ranking . . . . . . . . . . . . . . . . 412.6 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . 422.6.1 Zachary’s Karate Club . . . . . . . . . . . . . . . . . . 422.6.2 Communities in the Les Mise´rables Network and Char-acter Importance . . . . . . . . . . . . . . . . . . . . . 462.6.3 Lusseau’s Dolphin Network . . . . . . . . . . . . . . . 472.6.4 Grassland Species . . . . . . . . . . . . . . . . . . . . 472.6.5 College Football Network . . . . . . . . . . . . . . . . 492.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52Chapter 3: Familial Groups for Hierarchical Organization . . 543.1 Historical Perspective . . . . . . . . . . . . . . . . . . . . . . 543.2 Graph-theoretic Framework for Hierarchical Organization . . 553.3 Hierarchical Organization of Individuals in a Network . . . . 563.4 Familial Groups in Directed Networks . . . . . . . . . . . . . 583.4.1 Directed Networks with a Simple Underlying Graph . 593.4.2 Transitive out-tree editing without reversal operations 623.4.3 Weighted Directed Framework . . . . . . . . . . . . . 64Chapter 4: Network Measures: Diameter and Distribution . 674.1 Degree Distribution and Power Law . . . . . . . . . . . . . . 704.2 The Small-World Phenomenon . . . . . . . . . . . . . . . . . 704.3 Graph Diameter . . . . . . . . . . . . . . . . . . . . . . . . . 714.3.1 Diameter Augmentation is W [2]-hard . . . . . . . . . 734.3.2 Generalization . . . . . . . . . . . . . . . . . . . . . . 774.3.3 Additional Observations . . . . . . . . . . . . . . . . . 774.3.4 Diameter Augmentation for P4-sparse Graphs . . . . . 784.4 Network Models . . . . . . . . . . . . . . . . . . . . . . . . . 79vTABLE OF CONTENTS4.4.1 The Erdo˝s-Re´nyi Model . . . . . . . . . . . . . . . . . 804.4.2 The Watts-Strogatz Model . . . . . . . . . . . . . . . 804.4.3 The Baraba´si-Albert Preferential Attachment Model . 814.4.4 The Random k-tree Model . . . . . . . . . . . . . . . 814.4.5 Cliques and Higher-Order Structures . . . . . . . . . . 824.5 A Graph Classes Perspective on Graph Generation . . . . . . 87Chapter 5: Bounded Search Tree Methods . . . . . . . . . . . 905.1 Edge-Deletion Algorithms . . . . . . . . . . . . . . . . . . . . 925.1.1 Computing Cograph Edge-Deletion Sets on P4-sparseGraphs in Linear Time . . . . . . . . . . . . . . . . . . 935.1.2 A Bounded Search Tree Algorithm for Cograph Edge-Deletion . . . . . . . . . . . . . . . . . . . . . . . . . . 955.1.3 A Bounded Search Tree Algorithm for Edge-Deletionto Trivially Perfect Graphs . . . . . . . . . . . . . . . 985.2 Vertex-Deletion Algorithms . . . . . . . . . . . . . . . . . . . 1015.2.1 Vertex-Deletion to Cographs . . . . . . . . . . . . . . 1025.2.2 Improvement using Hitting-Set . . . . . . . . . . . . . 1045.2.3 Vertex-Deletion for Trivially Perfect Graphs . . . . . . 1085.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111Chapter 6: Concluding Remarks . . . . . . . . . . . . . . . . . . 1146.1 Summary of Thesis . . . . . . . . . . . . . . . . . . . . . . . . 1146.2 The Key of Contributions of this Thesis . . . . . . . . . . . . 1156.2.1 Future Considerations . . . . . . . . . . . . . . . . . . 116Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119viList of TablesTable 2.1 Football conferences. . . . . . . . . . . . . . . . . . . . 51Table 3.1 Branching rules for transitive out-tree editing with theaddition, deletion, and reversal operations. . . . . . . . 62Table 3.2 Branching rules for transitive out-tree editing usingonly the addition and deletion operations. . . . . . . . 63Table 3.3 Branching rules for transitive out-tree edge-deletion. . 64viiList of FiguresFigure 1.1 Wolk’s characterization for quasi-threshold graphs. . . 11Figure 1.2 The forbidden induced subgraphs for P4-sparse graphs. 14Figure 1.3 Spiders. . . . . . . . . . . . . . . . . . . . . . . . . . . 15Figure 2.1 The one-vertex extensions of a P3. . . . . . . . . . . . 21Figure 2.2 Small graphs . . . . . . . . . . . . . . . . . . . . . . . 25Figure 2.3 Max clique is not always kept intact. . . . . . . . . . 26Figure 2.4 A cotree (left) and its corresponding cograph (right). 28Figure 2.5 A cotree. . . . . . . . . . . . . . . . . . . . . . . . . . 29Figure 2.6 Two equally-weighted outcomes of modifying a graphto a closest (P4, C4)-free graph. . . . . . . . . . . . . . 30Figure 2.7 Freeman’s communities. . . . . . . . . . . . . . . . . . 30Figure 2.8 Quasi-threshold graph. . . . . . . . . . . . . . . . . . 31Figure 2.9 Reduction construction. . . . . . . . . . . . . . . . . . 36Figure 2.10 Obstacle to greedy method. . . . . . . . . . . . . . . . 40Figure 2.11 The degree of an actor does not determine its socialrank. . . . . . . . . . . . . . . . . . . . . . . . . . . . 41Figure 2.12 Zachary’s karate club. . . . . . . . . . . . . . . . . . . 43Figure 2.13 Les Mise´rables. . . . . . . . . . . . . . . . . . . . . . . 44Figure 2.14 Dolphin network. . . . . . . . . . . . . . . . . . . . . 45Figure 2.15 Grassland species. . . . . . . . . . . . . . . . . . . . . 48Figure 2.16 Football network. . . . . . . . . . . . . . . . . . . . . 50Figure 2.17 Intracommunal ranking. . . . . . . . . . . . . . . . . . 52Figure 3.1 Krackhardt’s out-tree. . . . . . . . . . . . . . . . . . . 59Figure 3.2 Obstructions for out-forests. . . . . . . . . . . . . . . 60Figure 3.3 Hockey team rankings. . . . . . . . . . . . . . . . . . 65Figure 3.4 Edited results of the hockey network. . . . . . . . . . 66Figure 4.1 Centrality. . . . . . . . . . . . . . . . . . . . . . . . . 68Figure 4.2 Clustering coefficient. . . . . . . . . . . . . . . . . . . 68viiiLIST OF FIGURESFigure 4.3 Network of hijacking terrorists from the 9/11 attacks. 72Figure 4.4 G2 constructed from G = P4. . . . . . . . . . . . . . . 75Figure 4.5 Example Networks from Palla et al. with power-lawcommunity sizes. . . . . . . . . . . . . . . . . . . . . . 83Figure 4.6 K-Clique Communities in a Random Partial 4-Tree. . 84Figure 4.7 Power Law Community Size in Partial k-trees. . . . . 85Figure 4.8 K-Clique Communities in BA-model Graphs. . . . . . 86Figure 4.9 K-clique communities in a BA-odel graph with m = 20. 87Figure 5.1 An impossible configuration for a C5 in an extendedP4-sparse graph. . . . . . . . . . . . . . . . . . . . . . 107ixAcknowledgementsFirstly, I would like to thank Unit 5 of UBCO for their years of hospi-tality, support, and opportunity. An interdisciplinary PhD has proven tobe a fruitful endeavor and I appreciate the effort put towards creating andevolving such a graduate program. I also greatly appreciate the opportuni-ties I had to teach undergraduates, especially the lecturing appointments Ihad in 2011 and 2012.I also want to thank everyone in the department I have had dealings with,from fellow graduate student officemates to professors who I marked examswith to the departmental support staff. I feel the relationships I made herewere collegial yet friendly and personal, and have contributed to an amazingexperience for me.In addition to departmental financial support, I would like to especiallythank Dr. Y. Gao and for his financial support throughout my PhD.I would also like to thank Dr. Y. Gao for his mentorship and encourage-ment over the duration of this PhD. His ability to identify the strengths ofhis students and how they can contribute to a research project has been thedriving force behind all the work in this thesis.And I would like to thank both my supervisors, Dr. Y. Gao and Dr. D.R. Hare for their collaboration with and supervision of all the work in thisthesis. They each brought a style of problem solving to our collaborationwhich resulted in successes I could not have achieved on my own. I alsomust thank them for their infinite patience.And finally, I would like to thank my family for their love, support and hugs.xDedicationDedicated to my father, Athanasios (Tommy) Nastos, who offered me(among so many other things) every academic opportunity I ever wanted.xiChapter 1IntroductionThe field of graph theory has generally been application-driven, evenfrom its earliest days. Indeed, the first theorem on graph theory from 1736was motivated by a path-finding problem over bridges [44] and its associ-ated algorithms have recently been used in DNA fragment assembly [119].While graph theory has found its place in pure mathematics, the branchof algorithmic graph theory has flourished alongside the development andimprovement of computer technology.Social networks are graph structures that had been studied by sociolo-gists long before the existence of online social networks. The popularity andaccessibility of online social networks in recent years has brought the notionof graph networks to the forefront of much research in sociology, physics,marketing, computer science, mathematics, epidemiology of infectious dis-eases, and more. The analysis of the structure of social networks has spreadbeyond that of an academic interest as companies (such as Klout) that spe-cialize in social media analytics are noticeably gaining popularity.Much of this thesis is inspired by the phenomenon of clustering in net-works. In social networks, clustering can take the form of communities,hierarchical organization or structural equivalence, as examples.The rest of Chapter 1 reviews background material on algorithms, com-plexity theory, graph classes and graph modification problems.Chapter 2 introduces and explores a new definition for modeling so-cial communities as familial groups through the graph-theoretic concept ofquasi-threshold graphs. Computational problems for finding familial groupsare given, including a problem reduction and exact algorithms. Some com-putational results are also included to strengthen the case for a social inter-pretation of familial groups.Chapter 3 offers a formal framework for potentially extending the ideaof familial groups to directed networks. Chapter 4 discusses the well-studiedphenomenon of power-law distributions observed in social network measuressuch as vertex degree and community size. An equally popular phenomenon,the small-word property, is discussed and a study on the computationalcomplexity of the relevant measure of graph diameter is given. Chapter 511.1. Definitionspresents the main algorithms of the thesis for various graph modificationproblems. We summarize the results and mention some open problems andfuture directions in Chapter 6.1.1 DefinitionsWe begin with a set of necessary definitions of terms that will be usedthroughout the thesis.1.1.1 Graphs and NetworksOur graph-theoretical definitions follow the reference book Graph Classes:A Survey [16], and we refer the reader to that book for any definitions orbasic concepts not included in this thesis.A graph is a structure composed of a set of objects or vertices and aset of edges, each edge joining two vertices. More formally, a graph is apair (V,E) where V is a set and E is a set of unordered pairs of V . Unlessotherwise noted in the discussion, a graph will be understood to be simple(no duplicate edges, no edge joining a vertex to itself). Vertices are alsosometimes referred to as nodes.An edge e that joins two vertices u and v will be written as e = uv orequivalently, e = vu, e = {u, v}. When e = uv is an edge of a graph, we willsay that e is uv, or that e is incident on u and v or that u is adjacent to v.We also say that the endpoints of e are u and v. The degree of a vertex v isthe number of distinct edges having v as an endpoint.The vertex set V of a graph G = (V,E) can be written as V (G) or VG,and similarly the edge set E of G will be written as E(G) or EG. Unlessotherwise defined, a convention we adopt here is that when there is only onegraph in consideration, the symbols n and m are reserved for n = |V | andm = |E|. Additional notation will be defined in cases where multiple graphsare involved.When edges are ordered pairs, E can be described as a subset of V ×V .Such edges are called arcs or directed edges, and a graph involving directededges will be called a digraph.A subgraph H = (VH , EH) of a graph G = (VG, EG) is another graphwhere VH ⊆ VG and EH ⊆ EG. An important type of subgraph that willbe used throughout this thesis is an induced subgraph, where if u and v arevertices in H, there is an edge uv ∈ EH if and only if uv ∈ EG. Given agraph G, we may specify a subset W of the vertex set V (G) and speak ofthe induced subgraph on W , which is the unique graph having vertex set21.1. DefinitionsW and every edge of E(G) which joins two vertices of W . The size of aninduced subgraph H is the number of vertices in H.The complement G = (V,E) of a graph G = (V,E) is another graphwith the same set of vertices, but where uv ∈ E if and only if uv /∈ E. Aset of vertices which are pairwise adjacent is called a clique and a clique iscalled a maximal clique if it is not contained as a subgraph in any otherclique. A maximum clique of a graph is a clique with the largest numberof vertices. A maximum clique is a maximal clique, but not every maximalclique is a maximum clique. A clique of size k will be called a k-clique. Thecomplement of a clique is a set of vertices with no edges, and it is called astable set or an independent set.A colouring of a graph is an assignment of a label to each vertex of thegraph such that no two adjacent vertices are given the same label. Theselabels will be called colours. When the colouring uses k colours to colour then vertices in this manner, it is called a k-colouring. The chromatic numberχ(G) of a graph is the smallest number k for which G has a k-colouring.Another way to describe this is that a graph is k-coloured if we partitionits vertices into k disjoint sets, each set inducing an independent set. Notethat if a graph contains a k-clique, then χ(G) ≥ k.A sequence v1, v2, . . . , vk of pairwise distinct vertices is a path of a graphG if vivi+1 ∈ E(G). The length of such a path is k− 1. A path is also calleda cycle if v1vk ∈ E(G). The length of such a cycle is k.If v1, v2, . . . , vk is a path (respectively, cycle) of a graph G, we say thatan edge e ∈ E(G) that joins two vertices of the path (cycle) is a chord ofthe path (cycle) if e is not an edge of the path (cycle). A path (resp. cycle)is chordless if it contains no chords. The endpoints of a path v1, v2, . . . , vkare v1 and vk.In this thesis, we denote the chordless paths and chordless cycles on kvertices as Pk and Ck, respectively. A P2 is simply an edge and a C3 is aclique on 3 vertices, also called a triangle.Two graphs G and H are called isomorphic if there is a bijective functionf from V (G) to V (H) such that uv ∈ E(G) if and only if f(u)f(v) ∈ E(H).When a subgraph H on k vertices is isomorphic to a path (resp. cycle),we will say that H is a Pk (Ck). When the induced subgraph on a set ofvertices is a Pk (Ck), we may refer to that vertex set itself as being a Pk(Ck).Two vertices are called connected if there is some path having those twovertices as endpoints. A component (or, equivalently, a connected compo-nent), C of a graph G is a maximal subgraph such that every two verticesin C are connected. We write C ⊆ G to denote that C is a subgraph of G.31.1. DefinitionsA connected graph which has no cycle is called a tree. A graph in whichevery connected component is a tree is called a forest.1.1.2 Complexity TheoryAlgorithm AnalysisA complete treatment of the analysis of algorithms can be found in astandard textbook such as [30], and we refer the reader to that text for anyadditional definitions and basic concepts not mentioned here. Our defaultassumption in analysing algorithms in this thesis is that an item of the inputdata can be accessed in constant time. This is realistic in practice providedthe data in use is completely stored in RAM (random access memory). Anyexceptions to this assumption will be explicitly stated.The following notation is standard in the analysis of function growthrates:function f(n) is O(g(n)) if there exists a positive constant c such thatf(n) < c ∗ g(n) for all sufficiently large n.function f(n) is Ω(g(n)) if there exists a positive constant c such thatf(n) > c ∗ g(n) for all sufficiently large n.function f(n) is Θ(g(n)) if f(n) is both O(g(n)) and Ω(g(n)).As is standard, algorithm runtime or space-complexity will be anal-ysed by worst-case analysis throughout this thesis unless otherwise stated.Growth rates for runtime or space will be expressed in terms of input sizeand/or input parameters. For a graph with n vertices and m edges, an al-gorithm is considered to run in linear time if it runs in O(m + n)-time. Apolynomial time algorithm is an algorithm whose runtime can be boundedby some polynomial function of the input size.An exponential time algorithm is an algorithm whose (worst case) run-time is Ω(cn) where n is an input parameter and c > 1 is a constant. Whenmultiple input parameters such as n and m are given and an algorithmruns in O(2mp(n)) where p(n) is some polynomial function, we may say thisalgorithm is exponential in m and also polynomial in n).ReductionsA problem P is called polytime solvable or simply polytime if there is apolynomial-time algorithm solving every instance of P . Unless P is stated41.1. Definitionsto be an optimization problem, P is understood to be a decision problemwhere every instance I of P is either a yes- or no-instance. The task of adecision problem is to determine whether a given instance is a yes-instanceor no-instance.A decision problem P1 can be shown to be at least as hard as a problemP2 if one can show that every instance I2 of P2 reduces to a problem instanceI1 of P1 in such a way that I2 is a yes-instance of P2 if and only if I1 is a yes-instance of P1. With such a reduction, any algorithm that solves problemP1 can be used to solve any instance of P2 by reducing the instance of P2 toan instance of P1 and using the algorithm that solves P1 to find the correctanswer.Let R be a reduction that transforms instances of problem P2 to aninstance of problem P1. If P1 can be solved with a polynomial time algorithmand the reduction can be computed in polynomial time, then composingthe reduction with the algorithm as described earlier will yield a polytimesolution for P2. If, on the other hand, there is no polytime solution to P2, yetP2 polynomially reduces to P1, then there cannot be a polytime algorithmsolving P1. This concepts allows for a definition of computational hardness:that if P2 is sufficiently hard, then P1 is at least as hard.When a proposed solution to an instance of a decision problem is givento affirm that this instance is a yes-instance (such as a set of vertices forClique or a truth assignment for 3-Sat) and this solution can be verifiedwithin polynomial time of the problem size, then the problem is said to bein NP.A problem P1 is NP-hard if every problem in NP polynomially reduces toP1. If an NP-hard problem is also in NP itself, then it is called NP-complete.The Cook-Levin Theorem [28] showed that the Boolean Satisfiabil-ity Problem is NP-complete, and Karp’s consequences [82] used polytimereductions to show that many other problems such as Clique, Indepen-dent Set, 3-Sat are also NP-complete.To date, there is no proof that an NP-complete problem cannot be solvedin polynomial time. Since a polynomial time solution to any one of thethousands of known NP-complete problems would imply a polytime solutionto every NP-complete problem, and that no polytime algorithm has beenfound to solve an NP-complete problem, it is generally accepted that theclass of NP-complete problems form a set of computationally-hard problems.This notion of computational hardness has allowed researchers to identify aproblem as NP-hard and then attempt to deal with solving these problemswith specific algorithm-design strategies, such as those from parameterizedcomplexity (next subsection), approximation algorithms, heuristics or other51.1. Definitionssearch strategies.Parameterized ComplexityGiven a problem P and an algorithm solving it, the algorithm is typicallymeasured in terms of the input size n of P . In the case that the input toa problem is a graph, the runtime is usually written in terms of n = |V |and m = |E|. Sometimes, in addition to a problem instance, the inputmay also accept a parameter k, in which case we say that the problem isparameterized.A problem which accepts a parameter k is called fixed-parameter tractable(or FPT) if there exists an algorithm solving the problem with runtimeO(f(k)nc) where n is the input size, f is a function of k which does not de-pend on n and c is a constant. When the value k is fixed, this is essentiallya polynomial runtime, and in particular for any fixed k it is the same poly-nomial (up to coefficients). FPT algorithms have received much attentionlately as many NP-hard problems have been shown to be fixed-parametertractable. For instance, the Vertex Cover(G, k) problem is a well-knownNP-complete problem but can be solved in O(2kn)-time by a simple searchtree method, selecting one of two possible endvertices of an uncovered edge.Note that this runtime is linear in n for any fixed k. Analogous to the ideaof NP-hardness, there is a measure of hardness for parameterized problemswhich depends on parameterized reductions.Parameterized problems are classified into a hierarchy of problem classes:FPT, W [1]1,W [2], . . . ,W [t], . . .. The weighted 3Sat(k) parameterized prob-lem asks whether an instance of 3Sat has a satisfying assignment withHamming weight equal to k. Weighted 3Sat(k) and Clique(G, k) are rep-resentative problems in W [1]. A parameterized reduction is a Turing re-duction taking time O(f(k)p(n)) where p(n) is a polynomial in the inputsize and f(k) is an arbitrary function which does not depend on n. Addi-tionally, an instance (x, k) reducing to an instance (x′, k′) must produce aparameter k′ dependent only on k and not on n = |x|. The class of all fixed-parameter tractable problems, FPT, is contained in W [1]. The completeproblems for W [1] are not expected to be in FPT, as it has been shown that1The class W [i] is defined as the class of parameterized problems that can reduce toa Weighted Circuit Satisfiability problem for weft-i circuits. Elaborating on thisdefinition diverts us away from our discussion of graph problems, and the reader is onlyrequired to know that there are representative graph problems for W [1] and W [2] for thepurposes of this thesis. Some representative problems for these classes are given in thefollowing paragraph.61.2. Computational Problems on Graphsif FPT = W [1] then all problems in NP can be solved in O(2o(n))-time [41].Weighted 3Sat(k) and Clique(G, k) are W [1]-complete. Dominat-ing Set(G, k) is W [2]-complete. Discussion of these results and a thor-ough introduction to parameterized problems can be found in [115]. Beingparameterized-hard also has implications for the approximatibility of theproblem. For instance, a problem which is W [1]-hard does not have anefficient polynomial-time approximation scheme (EPTAS) unless W [1] =FPT [103].1.2 Computational Problems on Graphs1.2.1 Max CliqueFinding a maximum clique is fundamental to many network clusteringmethods. The corresponding decision problem is formulated as follows:Problem 1. Clique(G, k)Input: A graph G = (V,E) and a positive integer k.Task: To determine if there exists a set S ⊆ V of size at least k such thatfor every u, v ∈ S, uv is an edge.This problem is NP-complete [82] and W [1]-complete [115].Since an independent set is the complement of a clique, the problem offinding a maximum independent set is computationally equivalent to themaximum clique problem on the graph complement.1.2.2 Dominating SetA dominating set in a graph G = (V,E) is a set of vertices S ⊆ Vsuch that every vertex of G is either in S or adjacent to some vertex of S.Dominating sets and their variants have appeared in the context of manyapplications, see for example [138] or [137].Problem 2. Dominating Set(G, k)Input: A graph G = (V,E) and a positive integer k.Task: To determine if there exists a set S ⊆ V of size at most k such thatfor every v ∈ V \ S there is some s ∈ S where sv is an edge.This problem is NP-complete and W [2]-complete [115].1.2.3 Diameter AugmentationFor any two vertices x, y in a graph, a (x, y)-path is a path having end-points x and y. Let dist(x, y) be the distance between x and y, defined71.2. Computational Problems on Graphsas the number of edges in a shortest path joining them, if one exists. Ifsuch a path does not exist, we may define dist(x, y) to be ∞ where everyreal number r has the property that r < ∞, but we do not explicitly needthis special case in this thesis. When intermediate vertices and edges ofa (x, y)-path are restricted to a particular subgraph H of G, the distancebetween x and y will be denoted distH(x, y). A graph G has diameter t ifmaxx,ydist(x, y) = t, and we write diam(G) = t.Graph diameter has been studied for its combinatorial structure as wellas for the wide-ranging applications. A decision problem formulation is:Problem 3. Diameter-t Augmentation(G, k)Input: A graph G = (V,E) and a positive integer k.Task: To determine if there exists a set S of at most k edges such thatresulting graph (V,E ∪ S) has diameter t.This problem was first shown NP-hard for t = 3 in [128] and was latershown to remain hard for the t = 2 case [95]. Note that the case of t = 1 istrivially polynomial-time solvable as adding an edge between every pair ofnonadjacent vertices is necessary.We prove in this thesis that Diameter-2 Augmentation is W [2]-hardin Section 4.3. We show that Diameter-t Augmentation is W [2]-hardfor t > 2 in [57].1.2.4 Graph Modification ProblemsAn important class of problems studied in this thesis can be describedas graph modification problems. The general form of a graph modificationproblem takes a graph and asks if it can be altered using at most k operationsso that the resulting graph has a desired property.An example of a graph modification problem is the Feedback VertexSet problem, which is one of the earliest-known NP-complete problems [82].A feedback vertex set is a set of vertices whose removal from a graph resultsin an acyclic graph. The decision problem asks if there is a feedback vertexset of size at most k.Many of the above problems can also be stated as graph modificationproblems: for example, finding a maximum clique is equivalent to findingthe minimum number of vertices that can be removed to leave behind aclique.An edge edit is the operation of either adding or deleting an edge. Alarge class of graph modification problems revolve around edge edits, askingif there is a set of at most k edge edits in order to turn a given graph into81.3. Graph Classesone of a type-C, where C is some set of graphs having a desired property.This edge editing problem is usually called the C-editing problem. Whenrestricting the edge edits to edge deletions, the corresponding problem isusually referred to as a C-deletion problem. Similarly, the C-addition orC-completion problem only adds edges in order to obtain a graph of type C.A conceptually simple way to extract the dense clusters of a networkis to modify a given graph with the fewest number of edge edits in orderto leave behind a collection of disjoint cliques. This is precisely the clusterediting problem (definitions and discussion of cluster graphs appear in Sec-tion 1.3.1). Cluster editing and deletion has been studied extensively in thescope of FPT algorithms, and discussion of this will be revisited a numberof times in this thesis.Yannakakis shows that vertex-deletion problems to many types of targetstructures are NP-hard [148]. Elmallah and Colbourn give hardness resultsfor many edge-deletion problems [43].When the graph property of belonging to a class C can be characterizedby a finite list of forbidden induced subgraphs, it was shown by Cai [19] thatthe graph modification problem allowing up to k edge and/or vertex editsis fixed-parameter tractable.As we define several graph classes in the next section, we will not onlydiscuss their defining properties and characteristics but we will also discussthe current state of their associated graph modification problems.1.3 Graph ClassesA central concept that is used throughout this thesis involves specialgraph classes which are sets of graphs defined with a particular structure.The study of graph classes comes from the fact that modeling certain appli-cations on graphs often implies an inherent structure in the resulting graphwhich can be exploited.For example, if we are given the task of assigning rooms to n seminars,each of which has a specified start time and end time, then we can modelthis as a graph colouring problem by the set of all seminars being the vertexset and two seminars are adjacent if they have an overlap in their timeinterval (a time conflict). Then a colouring of these vertices corresponds toa room assignment of the seminars, each colour class having no edges (notime conflicts) within it. A minimum colouring corresponds to a solutionusing the fewest number of rooms.Karp’s consequences [82] showed that finding the chromatic number is91.3. Graph ClassesNP-complete (in fact, it showed that simply determining whether a givengraph can be coloured with 3 colours is itself an NP-complete problem).But our room-assignment problem can be solved in polytime due to the factthat the resulting graph obtained from intersections of intervals on a timeline has a special structure. Note, for example, that it is impossible forgraph obtained in this way to contain a chordless cycle of size 4 or more.The resulting graphs are called interval graphs and it can be shown that agreedy method can optimally colour an interval graph in linear time.Let H be some fixed graph. If a graph G does not contain an inducedsubgraph which is isomorphic to H, then G is called H-free. The discussionabove states that if a graph G is an interval graph, then G is Ck-free foreach k ≥ 4. There are many other configurations that interval graphs donot contain as well.A graph class C is called hereditary if it has the property that wheneverG ∈ C, every induced subgraph H of G is also in C. The property of beingH-free is a hereditary property. Every hereditary graph class has a forbiddeninduced subgraph characterization, although the list of forbidden inducedsubgraphs may not always be finite.1.3.1 Cluster GraphsA cluster graph is a disjoint collection of cliques. That is, every connectedcomponent is a maximal clique.If an edge xy of a graph denotes a symmetric relation between x and y,then the graph on a set of vertices is a cluster graph if and only if the relationis transitive. Cluster graphs have been used in a variety of applicationswhenever clustering of objects is studied or when consistent data is soughtamong noisy or error-prone data.A graph G is a cluster graph if and only if G is P3-free. Note that thecluster-completion problem in which one is allowed to only add edges to agiven graph is polytime solvable since the only solution would be to add allpossible edges within each connected component.The cluster-deletion problem is NP-complete [130] [111] and hard toapproximate [130].The cluster-editing problem has been proven hard several times indepen-dently and in different contexts [91] [130] [5]. Both these problems were againproven to be NP-hard in [86] where they were further showed to remain hardfor bounded-degree graphs and studied under alternate parameterizations.Indeed, as cluster graphs give a simple and fundamental way of partition-ing a network into clusters, the corresponding problems have been studied for101.3. Graph Classestheir algorithms and alternate parameterizations as well. The fastest-knownruntime for cluster-editing has been repeatedly improved [65], [121], [121], [71], [22],with the current best runtime standing at O(1.62k+m) [10]. Cluster-deletioncan be solved in O(1.415k + n3) [12].For some applications where enforcing cliques to be completely disjointfrom one another is too stringent, generalizations to clusters graphs havebeen studied. For instance, allowing two cliques to share at most t verticesor t edges was studied in [35]. Another is that of relaxing each component’sstructure from a clique to a k-plex [72]. The P3-free characterization ofcluster graphs allows for other natural generalizations, such as enforcing P4-freeness or more generally S-free graphs where S is any collection of graphswhich contain a P3.1.3.2 Quasi-Threshold GraphsA rooted forest is a disjoint union of trees, each tree having a designatedroot vertex. The root should be thought of as a “top level” node, its childrenas second-level nodes, and so on. Define reachability in a tree between twonodes to mean that u reaches v (or v is reachable from u) if and only ifthe unique path from the root vertex to v passes through u. That is, v isreachable from u if u is an ancestor of v (equivalently, v is a descendant ofu). The comparability graph of a rooted forest is a new graph with the samevertices as the rooted forest such that an edge is joined two vertices if oneis reachable from another.Figure 1.1: Wolk’s characterization for quasi-threshold graphs states that ifthere are four vertices having at least those edges shown in the graph on theleft, then it must be that one of the extra edges – shown in the situationson the right – must exist. The dashed line in the figure represents the factthat there may or may not be an edge there.Wolk [144] introduced the concept of comparability graphs of trees andgave conditions on a simple undirected graph G which imply that G can beoriented in the directed tree structure described. Figure 1.1 depicts Wolk’sformulation of the structural characterization for these graphs, and it is111.3. Graph Classesnot hard to confirm that this is essentially equivalent to stating that therecannot exist an induced subgraph isomorphic to a P4 or a C4 in the graph.The (P4, C4)-free graphs are now known as the class of quasi-thresholdgraphs2 [25], trivially perfect graphs3 [63], comparability graphs of trees [144],or arborescent comparability graphs [40].A vertex v is called universal on a set of vertices S if v is adjacent toevery vertex of S except possibly itself, in the case that v ∈ S.Every quasi-threshold graph can be generated through a compositionscheme that repeatedly performs any of two possible operations: (i) If Gand H are two quasi-threshold graphs, then the disjoint union G′ = (VG ∪VH , EG ∪ EH) is quasi-threshold; (ii) if G is quasi threshold, then G+ =(VG ∪ {v}, EG ∪ {vx | x ∈ VG}) is also quasi-threshold (that is, adding auniversal vertex to a quasi-threshold graph yields a quasi-threshold graph).Reversing this composition scheme says that every connected componentof a quasi-threshold graph has a universal vertex, and the removal of anyset of vertices - which is equivalent to taking an induced subgraph - also hasthe property that each of its connected components has a universal vertex.A generative process for quasi-threshold graphs which adds one vertexat a time also exists. This generation scheme is used by Chu [23] to decide,in linear time, whether a given graph is quasi-threshold and in the casethat it is not, the algorithm will produce a P4 or a C4 to certify this fact.The generative process uses one operation: add a new vertex v adjacent toany existing vertices u1, . . . , ut (possibly none) and further attach v to theconnected component of each ui. To use this as a recognition algorithm, wenote the a vertex of largest degree can always be assumed to have been thelast vertex added to a graph in this generation scheme, and a lexicographicBFS search can verify all the necessary adjacencies in linear time.1.3.3 CographsMany graph properties can be deduced from looking at individual con-nected components of a graph separately. In this sense, a graph decomposes2The term quasi-threshold comes from the fact that these graphs generalize thresholdgraphs, which are graphs created from weighted vertices, and two vertices u and v arejoined by an edge if and only if the sum of the weights of u and v is above a given fixedthreshold value. Threshold graphs are equivalently characterized as (P4, C4, C4)-free.3A graph was defined to be trivially perfect if every maximal clique intersects themaximum independent set of the graph. The name derives from the fact that these areeasily shown to be a subclass of perfect graphs which are an important class in algorithmicgraph theory. It turns out that a graph is trivially perfect if and only if it is (P4, C4)-free.121.3. Graph Classesinto its connected components. Some properties are further maintained un-der a decomposition into connected components of the complement of thegraph. When a graph G can decompose completely via connected compo-nents either in G or connected components in the complement G, the graphis said to be a complement-reducible graph or a cograph. Cographs havebeen another important class of graphs to the development of algorithmicgraph theory. The decomposition scheme defining the class of graphs hasbeen generalized to apply to general graphs in the form of the modular de-composition and further as the primeval decomposition and homogeneousdecomposition.While it is not obvious from the above definition, cographs can also becharacterized as P4-free graphs. These serve as a generalization to P3-freegraphs in that they allow P3s to exist in the graph provided the P3 does notextend further to a P4.Cographs can be recognized in linear time [31]. The vertex deletionproblem for cographs is NP-complete [94], and the edge deletion problemis also NP-complete [43]. Since cographs are self-complementary, the edgedeletion problem is equivalent to the edge addition problem for this class.The problem of edge editing (allowing for both deletions and additions) hasalso been determined to be NP-complete [96].The finite forbidden induced subgraph characterization for cographs im-plies that the above-mentioned NP-hard problems are fixed-parameter tractable.The problems of modifying a graph to a cograph has also been studied inthe context of kernelizations [69].Cographs have been generalized in a number of ways: P4-sparse graphswere defined in a way that generalizes the forbidden induces subgraphs [78]while still being perfect graphs and recognizable in linear time [79], but theyalso generalize cographs through a decomposition scheme [80]. These havefurther been generalized to (q, t) graphs, which are those graphs for whichevery set of q vertices induces at most t P4s. Cographs are (4, 0)-graphs andP4-sparse are (5, 1)-graphs. Cographs are also exactly the graphs of cliquewidth 2 [32], and bounded clique width graphs are known to have importantalgorithmic properties.1.3.4 P4-sparse GraphsOne generalization to the class of cographs is created by allowing P4s toexist in a graph but in restricted amounts. Hoa´ng [78] introduced P4-sparsegraphs to be those for which every induced subgraph on five vertices inducesat most one P4. This immediately implies a forbidden induced subgraph131.3. Graph Classes555 55 55 55 5 5 55 5 55 55 5555 5555 Figure 1.2: The forbidden induced subgraphs for P4-sparse graphs.characterization which restricts any subgraph of five vertices inducing twoor more P4s. We include these graphs in Figure 1.2.A special graph structure called a spider [79] commonly occurs in graphclasses of bounded cliquewidth. We define two types of spiders here:Definition 1.1. A graph G = (V,E) is a thin spider if V can be partitionedinto K, S and R such that:i) K is a clique, S is a stable set, and |K| = |S| ≥ 2.ii) every vertex in R is adjacent to every vertex of K and to no vertex inS.iii) each vertex in S has a unique neighbour in K, that is: there existsa bijection f : S → K such that every vertex k ∈ K is adjacent tof(k) ∈ S and to no other vertex in S.A graph G is called a thick spider if G is a thin spider. Note that thevertex setsK and S swap roles under graph complementation, that condition(i) and (ii) hold for thick spiders, and that statement (iii) changes to sayingthat every vertex in S has a unique non-neighbour in K. The sets K, S and141.3. Graph Classes5 5Figure 1.3: A thin (a) spider and a thick (b) spider with |K| = |S| = 5 and|R| = 2.R are called the body, feet and head of the spider, respectively. The edgeswith one endpoint in S are called thin legs or thick legs for thin spiders orthick spiders, respectively. Examples of spiders are given in Figure 1.3.Hoa`ng [78] defined a graph G to be P4-sparse if every induced subgraphwith exactly five vertices contains at most one P4. The following decompo-sition theorem for P4-sparse graphs was proven in [79]:Lemma 1.2. [79] Let G be a P4-sparse graph. Then at least one of thefollowing is true:i) G is disconnectedii) G is disconnectediii) G is a thin spideriv) G is a thick spider1.3.5 Chordal GraphsChordal graphs have been studied under the names rigid circuit graphs [39]or triangulated graphs. They are defined as graphs in which every cycle ofsize 4 or more contains a chord. Equivalently, chordal graphs are thosecontaining no induced cycles of size 4 or more. They have been instrumen-tal in the development of algorithmic graph theory, as lexicographic breadthfirst search (or LBFS) was first used to recognize chordal graphs in linear151.3. Graph Classestime, and LBFS has come to be a very important tool in algorithmic graphtheory. Chordal graphs also naturally give rise to the important decomposi-tion scheme known as the clique separator decomposition [134] [143], whichdecomposes graphs while maintaining certain structures.Every cluster graph (defined in 1.3.1) and split graph (defined in 1.3.6)is chordal, and in fact the class of split graphs are exactly the graphs whichare chordal and whose graph complement is also chordal. The importance ofgraph modification problems on chordal graphs has grown proportionatelyto the importance of treewidth which is a key graph invariant used heav-ily in algorithm design. Edge deleting [112], edge completing [149], edgeediting [111] and vertex deleting [94] a general graph to a chordal graph isNP-complete. These were also all shown to be fixed-parameter tractableproblems [104].For a set of sets S, an intersection graph of S is a graph where eachv ∈ V is an element of S and an edge uv is in the graph if and only if thesets u and v have nonempty intersection. An example of this is when S isa set of intervals on the real line. The resulting intersection graph of S iscalled an interval graph. Not every graph is an interval graph. One smallexample of a graph which is not an interval graph is a C4. For all possibleS, the resulting set of interval graphs is called the class of interval graphs.A chordal graph can be thought of as a generalization of interval graphs.An early characterization of chordal graphs shows that chordal graphs areexactly the graphs which are intersections of subtrees of a tree [59]. Whilethis easily shows that interval graphs are a subclass of chordal graphs, theforbidden induced subgraph characterization of interval graphs is not easyto describe [93].1.3.6 Bipartite and Split GraphsA graph is bipartite if it can be partitioned into two sets in such a waythat every edge of the graph has an endpoint in each partition. Bipartitegraphs arise from applications where vertices usually represent two differenttypes of objects. Some social networks are affiliation networks where a set ofactors are related through a set of affiliations. When an affiliation networkis created with affiliations and actors as vertices, and an edge representingwhen an actor is associated with a certain affiliation, the resulting networkhas a bipartite structure with one partition of vertices being the actors andthe other partition being the affiliations [139]. Deciding whether a graph isbipartite, and finding the vertex bi-partition, can be solved in linear time.A bipartite graph can also be described as graphs which can be parti-161.3. Graph Classestioned into two stable sets. A split graph is a graph which can be partitionedinto a clique and a stable set.Bipartite graphs do not have a finite forbidden induced subgraph char-acterization: they are the graphs with no induced odd cycles. Modifying agraph to a bipartite graph through graph edits have been studied under thename of odd cycle transversals and the first fixed-parameter tractable algo-rithm for this problem [124] is credited with the initiation of the iterativecompression method for algorithm design.Split graphs do have a finite forbidden induced subgraph characteriza-tion: a graph is split if and only if it is (2K2, C4, C5)-free, where a 2K2 isthe graph on four vertices and two disjoint edges. Interestingly, the problemof modifying a graph to a split graph is NP-complete when dealing withonly edge deletions or only edge additions [111], but it is polynomial-timesolvable when both of these operations are allowed [73]. Deciding whethera graph is split, and finding a corresponding clique and stable set partition,can be solved in linear time. This algorithm uses just the degree sequenceof the input graph to decide where to add and where to remove edges. Theedge-edit distance to a split graph is called the splittance of a graph (so splitgraphs have splittance 0).The edge editing problem to a bipartite graph is easily seen to be equiva-lent to the edge-deletion problem to bipartite graphs, since adding edges cannever help make a graph bipartite. The edge deletion problem to bipartitegraphs, known as edge bipartization, is NP-complete [147].17Chapter 2Social Communities2.1 Existing Methods for Cluster PartitioningSome community-finding methods exploit the fact that as much a com-munity should be densely connected within itself, it should not be as heavilyconnected to the rest of the network. There are several examples of defini-tions that require exterior sparsity in addition to interior density. In 1969,this idea was expressed in an LS-set which is a set S such that every vertexin S has more neighbours in S than in G− S [99].Many methods have been developed to identify dense clusters in net-works. The measure of betweenness centrality [53] has been used by [62]to identify cohesive groups. The strategy in the Girvan-Newman algorithmis to identify and remove edges of high betweenness centrality (see Defini-tion 4.3) since such edges are typically regarded as being edges that crossbetween separate communities. In the case of multiple choices of edges withequal and highest centrality, any arbitrary choice of these suffices. Upondeleting an edge, the algorithm recomputes the centrality of all edges inorder to find the next edge to be deleted. If an edge deletion results inpartitioning the graph into more connected components than there were be-fore the edge deletion, the modularity of the network is measured. Onceall edges have been removed in this manner, the step of this process whichresulted with the largest modularity score gives a natural partition of thenetwork into groups. The process runs in polynomial time and has beenshown to produce meaningful results on real networks; however, its focus onedges which are not in communities does not imply or suggest a structureof community that we are after.The Girvan-Newman algorithm is an example of a method that suc-cessively partitions a network via edge deletions. Other methods, such aspartitioning along a minimum cut of a network [150], also yield a hierar-chical decomposition of components of a network while removing multipleedges at once.182.1. Existing Methods for Cluster PartitioningAlgorithm 1: High-level description of the Girvan-Newman processfor cluster-finding.Algorithm GirvanNewman(G):Input: A Graph G = (V,E)Output: A hierarchical decomposition of Gwhile (G has at least one edge) doLet e = xy be an edge of largest betweenness centrality;G← G− e;if x and y are in different components thenRecord the new connected components;endend2.1.1 An Induced Subgraph VariationOther algorithms, similar to the Girvan-Newman algorithm, have beensuggested with betweenness centrality swapped out for another measure. Forexample, [122] observes that edges that cross dense groups are not typicallyin as many triangles as edges inside the dense clusters. Removing edges thatappear in the fewest triangles results in partitions similar to those found bythe Girvan-Newman method. These methods are also shown to generalize192.2. Cliques and BeyondLS-sets.Algorithm 2: High-level description of Radicchi et al.’s process forcluster-finding.Algorithm Radicchi(G):Input: A Graph G = (V,E)Output: A hierarchical decomposition of Gwhile (G has at least one edge) doLet e = xy be an edge involved in the fewest number of triangles(breaking ties arbitrarily, if needed);G← G− e;if x and y are in different components thenRecord the new connected components;endendMany of the methods which use a structural definition for community andseek out these structures in networks result in problem formulations whichare inherently NP-complete. Examples of this approach include clique, clan,club, and k-plex finding, and these are defined in the next section. Whilebeing NP-Complete is a computational obstruction, there are a variety ofalgorithmic techniques that can be used to extract these desirable structuresfrom networks. For instance, Integer LPs such as those in [3] offer an exactalgorithm for these problems, and also lend themselves readily to fasterapproximation algorithms. Fixed-parameter tractability (FPT) is anotheralgorithm-design technique that can sometimes be used to solve an NP-complete problem with reasonable time. Finding network clusters via clusterediting, for example, has been studied extensively in the FPT framework [11]with great success.The literature on finding cohesive subgroups of networks is vast, andmethods such as spectral methods [131] and probabilistic model-fitting [75]have been explored. Many such methods are summarized in the surveysby [50] and [127].2.2 Cliques and BeyondIn data where relationships (edges) between objects are expected to betransitive, the resulting graph that represents that relationship is expectedto arrange itself into a disjoint union of cliques. These graphs are known202.2. Cliques and BeyondFigure 2.1: The one-vertex extensions of a P3: (A) P4; (B) claw; (C) paw;(D) C4; (E) diamond.as cluster graphs and they form a hereditary class of graphs which canalternately be characterized as the P3-free graphs. We can thus say thatthis P3-free local structure on three vertices completely characterizes thenetwork structure.When data-gathering methods are incomplete, the resulting network willbe close to a cluster graph but with some missing edges. In other applicationswhere false-positives appear, the corresponding graph will contain extrane-ous edges between the implied components. The common approach takento identify the implied clusters of such networks is through graph editing:the addition or removal of as few edges as possible in order to obtain thedesired structure. The associated algorithmic problem is known as ClusterEditing.The idea of cluster editing is also known as correlation clustering [5] inmachine learning and other fields of computer science and approximationalgorithms have been designed for the problems of partitioning a networkto minimize the number of inter-cluster edges and/or maximize the intra-cluster edges.As clique components are a very stringent condition to impose on asocial collection, various generalizations to cliques have been explored in theliterature. Early attempts include that of Luce’s n-cliques [100] and Alba’sn-clans and n-clubs [2] [106]. An n-club of a network is an induced subgraphhaving diameter n. A clique is exactly a 1-club. The computational problemsof determining whether a graph contains an n-clique or n-club were shownto be NP-complete in [4]. In fact, the problem of finding a 2-club was shownto be NP-complete even on relatively simple input graphs, such as thosewhich are bipartite after deleting one vertex and graphs which are coveredby 3 cliques [74].Another generalization of cliques is the k-plex, which is a collection ofn vertices in which every vertex is adjacent to at least n− k other verticesin the collection [129]. Finding a k-plex of size n was shown to be NP-complete by [3]. Cluster graphs have also been generalized in such a way212.3. Cluster Deletionthat cliques - rather than being completely disconnected - may intersect in atmost s vertices or t edges [48]. Such graphs admit a finite forbidden inducedsubgraph characterization as well: for example, the graphs in which cliquesare allowed to intersect in at most one vertex are exactly the diamond-freegraphs, where the diamond is the one-vertex extension of a P3 depicted inFigure 2.1. Again, the structure of this class of graphs is characterized viaa characterization of its local structure.The class of cluster graphs can be generalized through its local structureby allowing P3s to exist in a graph, but forbidding some of the possibleextensions of a P3. A complete set of isomorphically-distinct one-vertexextensions of a P3 are given in Figure 2.1.2.3 Cluster DeletionCluster graphs have been used in a variety of applications wheneverclustering of objects is studied or when consistent data is sought amongnoisy or error-prone data [5].The parameterized Cluster Deletion problem takes a graph G andan integer k as input and asks whether it is possible to delete at most kedges F from G in order to make G \ F a cluster graph. This problem isknown to be NP-hard [130].Many NP-complete problems have been studied on restricted input inorder to find classes of instances for which efficient solutions to the problemcan be found. For example, the NP-complete problem Max Clique can besolved in polynomial time on perfect graphs [68], in O(n3)-time on weaklychordal graphs [77], and in O(m + n)-time on chordal graphs [59] and oncographs [31] and a host of other graph classes (for example, [26], [60], [16]).Clique remains hard on (C5, P5)-free graphs, and even on the relativelysmall class formed by the complements of square-and-triangle-free graphs [64].A general result of Cai [19] on graph modification problems implies thatCluster Deletion(G, k) is a fixed-parameter tractable problem, solvablein O∗(2k) time, where the notation O∗(f(n, k)) means O(p(n, k)f(n, k)) forsome polynomial function p (n is the number of vertices of G). This standardsearch method, finds a P3, and then branches on the two possible edge-deletions. An improved branching method was developed by Gramm etal. in [65], improving the runtime to O∗(1.77k). Later, the same authorsdeveloped an algorithm for Cluster Deletion running in O∗(1.53k) time.There have been further improvements on the above runtime using algo-rithms that rely on the property of being able to solve Cluster Deletion222.3. Cluster Deletionefficiently (i.e. in polynomial time) when exploiting certain structure. Dam-aschke [36] characterized the structure of graphs in which no edge is con-tained in three P3s and used this structure to optimally delete all remainingP3s in polynomial time. Using branching rules to reduce a general instanceto this special structure resulted in an algorithm running in O∗(1.47k) [36].This has been improved even further to O∗(1.415k) by Bo¨cker and Dam-aschke [12] using the fact that an optimal cluster deletion set can be foundin polynomial time on graphs which are formed as a clique and a constantnumber of other vertices attached to the clique.In light of these results, it is of interest to determine any significantsubclasses of input graphs on which Cluster Deletion can be solved inpolynomial time. For instance, the edge-deletion problem which asks how todelete the fewest number of edges from a graph in order to obtain a cographwas studied in [108] where improved algorithms with given by first deletingto a certain superclass of cographs, and then deleting remaining edges inpolynomial time.In this section, we present an algorithm that solves Cluster Deletionin polynomial time on cographs. This serves to fill Phase 2 of a general meta-algorithm (Algorithm 5) that is the focus of Chapter 5. We also observe thatthis result is close to the boundary of polytime solvability by noting thatCluster Deletion remains NP-hard on a slightly larger graph class. Asfar as we know, this is the first published algorithm that solves ClusterDeletion in polynomial time on a well-studied graph class.The literature on finding subclasses of graphs on which certain NP-Complete problems become polytime solvable is vast, particularly with re-spect to clique finding and vertex colouring. Graph modification problems,however, have not been extensively studied on graph classes, perhaps withthe exception of the chordal completion or treewidth problem. Chordal com-pletion is also known as the minimum fill-in problem, and examples of graphclasses on which minimum fill-in is solvable in polynomial time are chordalbipartite graphs [20], circle graphs and circular-arc graphs [84], and house-hole-domino-free (HHD-free) graphs [17].2.3.1 On the Hardness of Cluster DeletionThe proof of Natanzon [111] reduces from the NP-Complete Cliqueproblem to Cluster Deletion by first considering a general instance G ofthe Clique problem and completely joining a clique to it. We observe thatas long as the initial instance of G is NP-hard, the class of constructed graphsobtained will be hard instances for Cluster Deletion. Some example232.3. Cluster Deletionclasses of where Clique remains hard is on (C5, P5)-free graphs and on(2K2, 3K1)-free graphs.Call a vertex universal on a subgraph H if it is adjacent to every vertexin H (except itself).Lemma 2.1. Let graph G have no induced subgraph isomorphic to F , whereF is a subgraph without a universal vertex. Then the graph G+ obtained inNatanzon’s construction (G completely joined to a new clique) is also F -free.Proof. Assume on the contrary that G+ contains some induced subgraph Hisomorphic to F . Since G is F -free, H must contain at least one vertex fromthe clique joined to G. But every vertex in this clique is universal on G+and so must be universal on F , a contradiction. Corollary 2.2. Let (F1, F2, F3, . . .) be a sequence of graphs such that Fidoes not contain a universal vertex for each i. Then Cluster Deletionremains NP-hard on the class of (F1, F2, F3, . . .)-free graphs.Since each of C5, P5, 2K2, 3K1 does not have a universal vertex, it fol-lows that the constructed instances of Cluster Deletion from Natanzon’sproof are also free of these subgraphs when the original Clique instance is.Thus we have:Corollary 2.3. Cluster Deletion remains NP-hard on (C5, P5)-free graphsand on (2K2, 3K1)-free graphs.Shamir et al. [130] showed that Cluster Deletion is still hard whenenforcing that the input graph be deleted into exactly d ≥ 3 components.They also showed that when deleting to exactly d = 2 components, theproblem is polynomial time solvable.Although [87] [86] prove that Cluster Deletion is hard for graphs withmaximum degree 4, it also gives a O(n1.5 log2 n) polynomial time algorithmsolving Cluster Deletion on graphs with maximum degree 3.The above results explore the boundary of where the Cluster Dele-tion goes from being NP-hard to being polynomial time solvable. In lightof the results which use polynomial time solvability of certain subclassesin the design of general exponential-time algorithms, classifying the polyno-mial time instances of Cluster Deletion proves to be of great algorithmicimportance while additionally being an interesting investigation in its ownright.The Clique problem is known to be polynomial time solvable on perfectgraphs [68] and thus all of its subclasses. As mentioned earlier, Clique re-mains hard on (C5, P5)-free graphs and thus Cluster Deletion is hard on242.3. Cluster Deletion(a)(b)(c)(d)(e)Figure 2.2: (a) bull; (b) 4-pan; (c) fork; (d) gem; (e) co-gem; (f) co-4-pan.this graph class as well. Another class (call it Z-class) defined by forbiddingall of (C5,P5, bull, 4-pan, fork, co-gem, co-4-pan) is a superclass of cographs(see Figure 2.2). This class restricts 7 of the 10 possible ways a vertex canjoin a P4. Since every forbidden graph in the list defining the Z-class hasno universal vertex, the following lemma shows that Cluster Deletionremains hard on Z-graphs.Lemma 2.4. Clique remains NP-hard on class Z.Proof. Let G be a graph with m edges. Solving independent set on G isNP-complete. Following Poljak [120], construct graph F which replacesevery edge of G with a 3-edge path. This is called a 2-subdivision of G.Note that α(F ) = α(G) + m . So finding α(F ) will find α(G) and showsthat independent set is NP-complete on 2-subdivision graphs. Now a 2-subdivision cannot contain a C5, 4-pan, co-4-pan, bull, gem, co-fork, or aco-P5 as an induced subgraph. So 2-subdivisions are a subclass of co-Z, andso independent set is NP-complete on co-Z. Therefore clique is NP-completeon Z.4 2.3.2 Cluster Deletion on CographsA clique partition of a graph G is a partition CP = {V1, V2, . . . , Vt} ofthe vertex set such that each Vi ∈ CP induces a clique in G. Note thatCluster Deletion can be rephrased to ask for a clique partition suchthat the sum of the number of edges that join Vi to Vj over all i 6= j, is aminimum. For this reason, Cluster Deletion has also been studied underthe name of the minimum edge clique partition problem [38]. A greedy max4Using the same reasoning, the result remains true if the graphs co-C6, co-C7 and co-C8are added to Z.252.3. Cluster DeletionABCAFigure 2.3: A C4-free graph for which an optimal min edge cluster partitiondoes not contain the maximum clique in a single part of the partition.clique partition of a graph G is a clique partition {A1, A2, . . . , Ak} where A1is a maximum clique of G, and Ai is a maximum clique of G \⋃i−1j=1Aj .Note that greedily selecting maximum cliques does not necessarily yield aminimum edge clique partition in general. Figure 2.3 depicts a graph whichis a (K3, C4, C5)-free graph where choosing the unique maximum clique(which is induced by the set B ∪C in the diagram) and then the remainingtwo cliques (one induced by A and the other induced by D) yields a cliquepartition which realizes the cluster deletion solution of deleting the 6 edgesjoining A to B and the 6 edges joining C to D, for a total of 12 edgedeletions. However, A ∪ B is a clique and C ∪D is a clique, and choosingthose two cliques is equivalent to a solution to the cluster deletion problemwhich deletes the 9 edges joining B to C.The (K3, C4, C5)-free graphs is a fairly small subclass of P5-free graphs,so it is rather remarkable that we have the following theorem:Theorem 2.5. [56] If G is a P4-free graph, then the minimum edge cliquepartitions of G are precisely the greedy clique partitions of G.The proof of this theorem is not included here but can be found in [56].Although following the details of this proof is rather arduous, it results in atruly simple algorithm for solving cluster deletion on cographs.262.3. Cluster Deletion2.3.3 AlgorithmsWe present here the simple algorithm for solving cluster deletion oncographs in polynomial time, as implied by Theorem 2.5.Algorithm 3: Cluster deletion algorithm on cographs.Algorithm ClusterDeletion(G):Input: A cograph G = (V,E)Output: A set of edges S ⊆ E such that G− S is a cluster graph.S ← ∅;while |V (G)| > 0 doChoose a maximum clique C of G;For every edge cx with c ∈ C and x ∈ G− C, add cx to S;Remove C from G;endreturn S;This routine can be used to improve a bounded search tree method forsolving Cluster Deletion on general graphs. In particular, it can be usedas a realization of Phase 2 of Algorithm 5 (Chapter 5).We illustrate a trace of this algorithm performed on a cotree represen-tation of a cograph. For completion, we quickly describe cotrees here.Definition 2.6. A cotree of a cograph G is a rooted tree representation Tof G where every vertex of G appears as a leaf node of T and the internalnodes of T are labeled either 0 (for disjoint union) or 1 (for complete join).For every two vertices u, v of G, if uv is an edge then the lowest commonancestor node in T of the leaf nodes u and v is a 1 node, and is 0 in the casethat u is not adjacent to v.The 0-nodes can be thought of as a disjoint union of all the subgraphsrooted at the children of the 0-node. The 1-nodes can be thought of as acomplete-join between all the subgraphs rooted at the children of the 1-node.See [31] for a linear time algorithm for constructing a cotree of a cograph.To obtain a maximum clique of a cograph, we process the tree bottom-up,by first labeling each node with weight 1, and then every 0-node is labeledwith the maximum weight of its children and every 1-node is labeled withthe sum of the weights of its children. To make the cotree data structuremore conducive to our need of finding maximal cliques, let a 0-node pointtowards its children achieving the maximum weight among all its children.The size of a maximum clique in a cograph is then the label assigned to272.3. Cluster DeletionFigure 2.4: A cotree (left) and its corresponding cograph (right).the root. To find the nodes of the maximum clique, from the root one cantraverse down each branch from a 1-node and just a single branch that isoriented away from each 0-node, and all leaves reached are the nodes of amaximum clique.A cotree can be updated when vertices are deleted from a cograph simplybe deleting the corresponding leaf nodes of the cotree. When an internalnode N of the cotree has 0 children, N can also be deleted. If N has a singlechild which is a vertex (leaf) x, then x can be made a child of the parentnode of N and N can be deleted. If N has a single child which is anotherinternal node N ′, then the children of N ′ can be made children of the parentnode of N and both N and N ′ can be removed.Since the internal nodes will always have 2 or more children, the numberof internal nodes of a cotree will always be less than the number of leafvertices, and so computing the maximum clique size labels is a linear timeprocess.Figure 2.5 shows the process of removing the maximum clique and reduc-ing the cotree from a graph on 11 vertices and 45 edges. The first maximumclique found is of size 7, thus containing(72)= 21 edges in it. Once themaximum clique is extracted and the cotree is reduced, we find a clique ofsize 4 containing 6 edges. This implies the minimum size of a set of edgessolving Cluster Deletion on this graph is of size 45− 21− 6 = 18.When using a bounded search tree techniques (see Chapter 5), we boundand prune the search tree based on the numerical size of the solution, andso computing the number of edges not involved in the greedily-extractedmaximum cliques (by arithmetic) is all that is needed to use this algorithmas a subroutine for a general search. Constructing the actual edges in thesolution set would require multiple adjacency tests, each taking O(log (n))time.282.3. Cluster DeletionFigure 2.5: (a) The original cotree with labels on the internal nodes indicat-ing the max clique size below it. (b) Identifying the nodes of a maximumclique from the root by following oriented edges downward. (c) The cotreeafter the vertices of the maximum clique are removed. (d) The cotree af-ter removing interval nodes with 0 children. (e) The cotree after removinginternal nodes with 1 child.292.4. Quasi-Threshold Graphs as CommunitiesFigure 2.6: Two equally-weightedoutcomes of modifying a graph toa closest (P4, C4)-free graph.Figure 2.7: A community satis-fying Freeman’s transitively over-lapping clique condition (left) thatis not hereditary. Removing thetwo filled vertices yields a graph(right) which no longer satisfiesthe definition.2.4 Quasi-Threshold Graphs as CommunitiesWe define a new community structure by generalizing cluster graphsthrough local structure. Rather than each community being a clique as inthe case of cluster graphs, we relax that definition so that our components arefamily-like structures. As in the case of cluster editing where communitiesare found by finding a closest P3-free graph, we find familial groups of anetwork by finding a closest (P4, C4)-free graph.Definition 2.7. The familial groups of a network G are the connectedcomponents of a closest (P4, C4)-free graph.The measure of what a “closest” network can vary, but following theparadigm of the cluster editing problem, we shall use the measure of totaledge additions plus edge deletions. These are collectively referred to as edgeedits.Since there are several ways to “destroy” a P4 or C4 with edge edits, theresulting decomposition may not be unique (but this is a reality in P3-editingfor correlation clustering, and many other community-finding methods aswell). An example of two different outcomes of editing a given graph withan equal number of edge edits is shown in Figure 2.6. Under a framework ofweighted modifications, for instance, a cost of α for adding an edge and β fordeleting an edge, one could weigh one decomposition better than another.For instance, if we were interested more in seeing how a network decomposesinto groups, we would set α > β, and vice versa if we were more interested in302.4. Quasi-Threshold Graphs as CommunitiesFigure 2.8: (left) A quasi-threshold graph; (center) A tree arrangement ofthe graph on the left; (right) The comparability tree of the quasi-thresholdgraph on the leftseeing how individuals in a community are organized. Weights on individualedges with perturbations can ensure a unique optimal decomposition intofamilial groups if desired. Here, we only consider α = β = 1 and we areinterested in modifying networks using as few modifications as possible.2.4.1 Properties of Familial GroupsIn the following subsections, we give support to the idea that a quasi-threshold graph (a (P4, C4)-free graph) is an ideal structure for networkscomposed of family-like or hierarchically-organized communities.Hierarchical Representation of Familial GroupsQuasi-threshold graphs have many characterizations and properties (seeSection 1.3.2 for details) but we will be mainly interested in the forbid-den induced subgraph characterization and the rooted tree representationof quasi-threshold graphs, which we describe here.Every quasi-threshold graph G can be arranged into a forest-like struc-ture (a set of tree-like structures) in which every vertex is adjacent (in G) toevery descendant in the tree. In particular, the root of a tree T is adjacent(in G) to every vertex in T , and there does not exist an edge joining twovertices in separate trees. An example of a quasi-threshold graph and itsassociated comparability tree are given in Figure 2.8. Note that every leafin a tree is adjacent to all of its ancestors and that every set of vertices alonga root-to-leaf path forms a maximal clique of the graph.In the graph in Figure 2.8, vertex 5 is a universal vertex (a vertex adja-cent to every vertex). It is the root of the associated tree. The rest of thevertices form two connected components: {4} and {1, 2, 3, 6}. In the compo-nent {1, 2, 3, 6}, vertex 2 is universal and is the root of the subtree consisting312.4. Quasi-Threshold Graphs as Communitiesof vertices 1,2,3, and 6. The other subtree consists of a single vertex 4. Inthe subtree rooted at 2, the positions of 1 and 6 can be interchanged witheach other arbitrarily because they are structurally equivalent. In this ex-ample, the whole network is a familial group. If vertex 5 is removed from thenetwork, then we will have two separated smaller familial groups {1, 2, 3, 6}and {4}. In turn, after removing vertex 2 in the group {1, 2, 3, 6}, we willget two even smaller familial groups {1, 6} and {3}.Quasi-threshold graphs are natural structures that arise from modelingcertain applications. For instance, if a graph is created on a set of speciessuch that an edge is drawn between two species if and only if they have anancestor/descendant relationship, then the graph created will form a quasi-threshold graph if the information obtained was error-free. Another exampleis that of a corporate structure in which every employee (except one) has adirect supervisor, and that commands can be passed to an employee fromher supervisor or her supervisor’s supervisor, etc. When an edge is joinedbetween any two individuals on which a command can pass, the resultinggraph is a quasi-threshold graph.Familial Groups as Robust CommunitiesFreeman [54] gave a definition for social community which uses the ideaof clique overlaps. Two cliques overlap if they intersect in at least one vertex.The definition [54] can be summarized as follows: a set of maximal cliquesC1, C2, C3, . . . , Ck which induces a connected graph forms a community ifthe cliques Ci overlap transitively. That is, for any three cliques Ci, Cj , Ck,if Ci overlaps Cj and Cj overlaps Ck, then Ci and Ck must also overlap.Freeman rationalized his definition by stating that an individual should becontained in a single community (that is, a network should decompose intodisjoint communities), that it generalized cliques, and that it is applicable tonetworks in which only relationships (of unknown strength) between pairsof individuals was known. That is, his definition applies to undirected andunweighted graphs.We will enforce a level of robustness to this definition of community tocreate a tighter definition of community. The removal of any vertices froma graph G leaves behind a graph H which is an induced subgraph of G.The robustness we impose can be stated as follows: a set of vertices S willform a familial group if S and every connected induced subgraph of S sat-isfies the above transitively-overlapping clique property. Socially speaking,the community remains intact if the removal of any number of individualsleaves the group connected. Or, in the case that some “important” individ-322.4. Quasi-Threshold Graphs as Communitiesuals leave the community and disconnect it, then the remaining connectedcomponents will themselves form smaller communities. An example of acommunity which satisfies Freeman’s transitively-overlapping clique prop-erty, but not hereditarily, is depicted in Figure 2.7.We show that simply requiring Freeman’s transitively-overlapping cliquecondition to be hereditary yields a formulation of social community whichexactly corresponds to connected (P4, C4)-free graphs.Theorem 2.8. A connected set S of vertices satisfies Freeman’s transitively-overlapping clique condition in every connected induced subgraph if and onlyif S induces a connected (P4, C4)-free graph.Proof.If S satisfies the transitively-overlapping clique condition for every inducedsubgraph, then it cannot contain an induced path on 4 vertices abcd sinceeach edge is a maximal clique while ab overlaps with bc and bc overlaps withcd, but ab does not overlap with cd. Similarly, it cannot contain an inducedcycle on 4 vertices abcda for the same reason. So any graph satisfying thetransitively-overlapping clique condition must be (P4, C4)-free.Conversely, if a connected graph S is (P4, C4)-free, it must have a vertexu which is adjacent to all other vertices in S [146]. Since there is such auniversal vertex u in every connected component of a (P4, C4)-free graph,every maximal clique in a connected component must include u, and so allmaximal cliques in the connected component overlap, at least on vertex u.Consequently, the cliques overlap transitively. Familial Groups as an Extension of Triadic ClosureSome sociometric data not only measures when two objects are related,but also measures the strength of the tie between them. In 1973, Granovet-ter [67] formulated:The Weak-Tie Hypothesis: if a is strongly tied to b and a is stronglytied to c, then it is more likely than not that b and c are at least weakly tiedto each other.Granovetter observed that the weak-tie hypothesis can be used to assertthat the most unlikely triad to appear in a social group is when a is stronglytied to b and a is strongly tied to c, while b and c have no social relationbetween them. He goes on to propose a graph edit operation called triadicclosure which adds at least a weak tie between b and c. In the framework ofunweighted edges, this is exactly the condition that a social group is P3-freeas discussed previously.332.5. Hardness of Finding Familial GroupsWe generalize the forbidden restriction on triads to forbidding certainconfigurations on 4 nodes, the P4 (which contains two induced P3s) and theC4 (which contains four induced P3s).An intuitive argument further supports the restriction of long inducedpaths or cycles from social communities. A close-knit community shouldhave relatively low diameter, and the existence of two vertices of geodesicdistance d from each other would imply the existence of an induced path oflength d. Thus a social community should be Pd-free from some relativelysmall value of d. An argument against the existence of induced 4-cycles incommunities is that if a is tied to b, b to c, c to d, then d tied to a, it ishighly likely that a will get to know c or b to know d. That is, ac and bd arehighly-likely chords in the cycle abcda. As such, it is reasonable to expectsocial communities to be Pd-free and C4-free for relatively small values of d.While we will be concerned with (P4, C4)-free graphs here, larger andmore relaxed communities could be identified if the focus is changed to(P5, C5)-free graphs or (P5, C4, C5)-free graphs.Friedkin’s Horizon of ObservabilityIn 1983, Friedkin [55] studied the observability between two actors of anetwork, loosely defined as whether one actor knows what the other actoris doing. He writes that in a social control system, observability is a prereq-uisite for control and that for nonadjacent nodes in such a network, controlmust be indirect and can exist if and only if an observer can gain informa-tion from a node some distance away and that reactions can be somehowtransmitted through intermediate nodes.Friedkin states two hypotheses: (i) that observability declines with dis-tance between two nodes, and (ii) likelihood of observability between twoparticular nodes increases with the number of shortest paths between them.2.5 Hardness of Finding Familial GroupsIf G is a graph and S is a set for which S ⊆ E(G), we use G − S tomean the graph (V (G), E(G) \S). Similarly, when T is a set of vertex pairswhich are not in G, we use G+ T to mean the graph (V (G), E(G)∪ T (G)).When S and T are disjoint sets of pairs of vertices, note that G − S + Tis equivalent to G + T − S. The computational problem of finding familialgroups of a network is as follows:Problem 4. Quasi-Threshold Editing(G, k): Given a graph G and aninteger k, is there a set S of edge deletions and a set T of edge additions342.5. Hardness of Finding Familial Groupssuch that |S|+ |T | ≤ k and G− S + T is (P4, C4)-free?The quasi-threshold edge-addition problem has been studied in [70] andthe quasi-threshold edge-deletion problem in [108]. To our knowledge, thequasi-threshold editing problem has not yet been studied directly.Given any graph as input, the algorithm of [23] decides in linear time(O(m + n)) whether the input is quasi-threshold and in the case that it isnot, a P4 or a C4 will be produced. The computational status of the problemof finding the closest quasi-threshold graph (in terms of the number of edgemodifications) was stated as an open problem in [18], and then in [102], andagain in [97]. We resolve this open question by showing that this problemis NP-complete by observing some extensions to a theorem in [97].Theorem 2.9. Quasi-Threshold Editing is NP-complete.[43] proved that Cograph Deletion is NP-Complete by a reduction fromExact 3-Cover. [97] used the same construction to show that Cograph Edit-ing is NP-Complete by strengthening the proof used for Cograph Deletion.A quick description of the proof, without the details, is as follows: thereduction from Exact 3-Cover used by [97] to show that Cograph Editingis NP-complete constructs a graph G∗ which is also C4-free. The optimaledge-edit set for G∗ that destroys all P4s does not produce any C4.Since every quasi-threshold graph is a cograph, the number of edits re-quired to the closest quasi-threshold graph is at least the number of editsrequired to obtain the closest cograph.An algorithm solving quasi-threshold editing, applied to G∗, would de-stroy the P4s (and not have any C4s to worry about, as observed above) andwould thus provide a solution to the instance of Exact 3-Cover.The complete details of this proof are now provided.Proof of Theorem 2.9.We use the same construction and notation as those in [43], which wasalso used by [97].Let S = {s1, s2, . . . , sn} and C = {S1, S2, S3, . . . , Sm} be an instance ofExact 3-Cover. Since each set Si ∈ C contains three elements from S, anexact 3-cover of S would use exactly n3 sets from C, as the Exact 3-Coverproblem requires that each si is covered exactly once. We let n = 3t andr =(3t2). Construct an instance of quasi-threshold editing as follows:- Each si is a vertex, and the set S of these induces a clique.- For every Sj ∈ C, create two cliques Xj and Yj such that |Xj | = r and|Yj | = q, where q = 9(m− t)r + 3(r − 3t).352.5. Hardness of Finding Familial GroupsXXXX112233456mnsssssssXsYsY1Y2YY34Figure 2.9: An instance of Quasi-Threshold Editing when reduced from aninstance of Exact-3-Cover having S1 = {s1, s2, s4}, S2 = {s2, s3, s5} andS3 = {s4, s5, s6}.- Each of the three elements sa, sb, sc of Sj is adjacent to every x ∈ Xjand every x ∈ Xj is adjacent to every y ∈ Yj .- No other edges exist in this graph.The parameter to this instance of quasi-threshold editing is k = q3 =3(m− t)r + (r − 3t). This construction is depicted in Figure 2.9.We note that if the instance of Exact 3-Cover is nontrivial (if some siexists in at least two 3-sets) this constructed graph is not a quasi-thresholdgraph since there are many P4s, for instance, starting in some Yj , adjacentto a vertex in Xj , adjacent to si, and adjacent to another Xk. There are noinduced C4s in this graph, however.First, we prove that if we have a solution to the Exact 3-Cover instance,we can find at most k edge edits to turn this constructed graph into a quasi-threshold graph. Say that C′ is a collection of t subsets of C such that theunion of subsets in C′ is S. For every pair si and sj in S, delete the edgejoining si and sj if they do not coexist in a 3-set Sl in the solution C′. Theseamount to r− 3t edge deletions. Further, delete any edges from an Xi to Sif Si is not in C′. This adds another 3(m− t)r deletions. In total, this gives3(m− t)r + r − 3t = k edge edits, resulting in a (P4, C4)-free graph.Note that a quasi-threshold editing set of size at most k is also a cographediting set of size at most k. The same argument used in [97] can be used362.5. Hardness of Finding Familial Groupsto show that the editing set contains edge deletions only. For the sake ofcompleteness, we include the proof here.Assume we have a quasi-threshold editing set E′ of size at most k, whereE′ is a set of vertex pairs of G∗ (if uv ∈ E′ is an edge of G∗, this representsan edge deletion; if uv ∈ E′ is not an edge of G∗, this represents an edgeaddition). The modified graph G′ = (V (G∗), E(G∗)∆E′) is (P4, C4)-free,where ∆ denotes the symmetric difference of sets. Call a vertex affected if itis a vertex with at least one incident edge that was modified by the k-edgeedit set E′. Since each edge edit is incident on two vertices, there are atmost 2k affected vertices.Since |Yi| = 9(m − t)r + 3(r − 3t) = 3k > 2k, each Yi set contains anunaffected vertex. We show that the edge edit set E′ does not contain anyedge additions.Claim 1. E′ contains no edge from Xi ∪ Yi to Xj ∪ Yj .Proof.Assume there is an edge u = vivj from Xi ∪ Yi to Xj ∪ Yj , with i 6= j asXi ∪ Yi is already a clique. Then let yi ∈ Yi and yj ∈ Yj be unaffectedvertices. Since vi and vj are affected by u, they are distinct from yi andyj . It is readily seen that yivivjyj is a P4, contradicting the fact that G′ isquasi-threshold, so there can be no such edges. Claim 2. Every vertex si in G′ is adjacent to vertices of at most one Xj .Proof.Assume si is adjacent to xp ∈ Xp and xq ∈ Xq where p 6= q. From theprevious claim, xp is not adjacent in xq in G′. Let y ∈ Yp be an unaffectedvertex. Then ypxpsixq is a P4, contradicting the fact that G′ is quasi-threshold. Claim 3. If in G′ we have that si is adjacent to Xp and sj is adjacent to Xqwith p 6= q, then E′ must delete the edge sisj .Proof.Since the previous claim shows that each si is adjacent to at most one ofthe Xl, we have that si is adjacent to Xp and so cannot be adjacent to Xq.Similarly, sj cannot be adjacent to Xp. Since the first claim shows there isno edge from Xp to Xq, we have a P4 from Xp to si to sj to Xq, unless sisjis a deleted edge. Claim 4. Every set Xi has neighbour sj in the modified graph G′.Proof.The total number of edges in G∗ joining⋃Xi to S is 3rm, and Claim 2372.5. Hardness of Finding Familial Groupsimplies that the edited graph G′ has at most rn edges which join⋃Xi toS, which means that at least 3rm− rn = 3rm− 3rt = 3(m− t)r edges wereremoved by E′.Since the size of the edit set is |E′| ≤ k = 3(m− t)r + r − 3t edges andwe have described at least 3(m− t)r deletions in E′, we have at most r− 3tedits unaccounted for, which is less than the size of any Xi since |Xi| = r.Thus every Xi is adjacent to some sj ∈ S. Corollary 2.10. The number of edge edits which are not of the form sx forsome s ∈ S and x ∈⋃Xi is at most r − 3t. In particular, the number ofedge edits in⋃(Xi ∪ Yi) is less than r = |Xi|.Claim 5. If E′ is a minimum edge edit set, then E′ does not add any edgefrom Yi to sj .Proof.Assume there is some yi ∈ Yi that is adjacent to sj in the modified graphG′.Firstly, assume that there is set B ⊆ S of vertices sb such that sb isadjacent to sj in G′. Then since every Yi has an unaffected vertex y∗i , wehave a P4 = y∗i yisjsb for each sl ∈ B and so yisb would also have to existin E′. If this occurs, removing yisj and every yisb for each sb ∈ B from E′and adding the deletion sjsb would decrease |E′| by 1, contradicting the factthat |E′| is a minimum.Now we can assume sj is not adjacent to any other sb ∈ S in G′. If sj isadjacent to Xi, then the connected component containing sj in the graph G′must be the induced graph on vertex set Yi∪Xi∪{sj}. So if edge yisj ∈ E′,then this edge can be removed from E′, yielding a smaller edge-edit solution,since the graph induced by Yi ∪Xi ∪ {sj} in G is already (P4, C4)-free.On the other hand, if sj is adjacent to some Xp where p 6= i then consideran unaffected vertex yp ∈ Yp. Using a vertex xp ∈ Xp which is adjacent tosj as well as yp, we find the P4 ypxpsjyi.Finally, if sj is not adjacent to any vertex of any Xl in G′, then thereis a P4 sjyixisq in the case that Xi is adjacent to sq, or else {sj} ∪ Yi ∪Xiis a connected component in G′, and the added edge from sj to yi can beremoved from E′ yielding a better edit set. (Note that there is some xi forwhich yixi is not an edge deletion in E′ by Corollary 2.10.) Claim 6. If E′ is a minimum edge edit set of G such that the modified graphG′ is quasi-threshold with |E′| = k, then E′ has no edge additions.Proof.382.5. Hardness of Finding Familial GroupsThis follows from the previous set of claims, and we summarize thesehere.There is no edge added from any Xi ∪ Yi to any Xj ∪ Yj by Claim 1.There is no edge added from any Yi to any sj ∈ S by Claim 5. There is noedge added from any si ∈ S to an Xj since each Xi is adjacent to exactlyone sp, and these edges are in G∗ (before applying the edits E′) by Claims 2and 4. And finally, there cannot be any edges added from some si to somesj for i 6= j since all such edges exist in G∗. Claim 7. If E′ is an edge edit set of G∗ such that the modified graph G′ isquasi-threshold with |E′| = k, then we can find a collection C′ of t 3-setswhich is an exact cover of S.Proof.Recall that S = {s1, s2, . . . , sn} and C = {S1, S2, S3, . . . , Sm}. The proofof Claim 4 shows that the edited graph had at least 3(m− t)r deletions fromS to⋃Xi and so at most nr edges remain of the form sx with s ∈ S andx ∈⋃Xi.We must now describe the other (at least) k−3(m−t)r = r−3t deletions.Claim 3 deletes every sisj when si is adjacent to some xp ∈ Xp and sj isadjacent to some xq ∈ Xq with p 6= q. Since S induces r edges in theoriginal graph G∗, and we have (at least) r − 3t other deletions, we havethat S induces (at most) 3t edges in the edited graph G′, and this maximumis obtained only when S induces a disjoint union of t triangles in G′ suchthat each triangle is a triplet of vertices of S each belonging to one set inC. This partition provides a subset of Xi sets which cover each sj exactlyonce, giving our exact cover. This completes the proof of Theorem 2.9, as we have provided a polyno-mial time reduction from Exact 3-Cover to Quasi-Threshold Editing.2.5.1 Algorithms for Familial GroupsFrom the finite forbidden induced subgraph characterization of quasi-threshold graphs, the problem of modifying a graph to a closest quasi-threshold graph is fixed-parameter tractable when using either edge addi-tions or deletions or both [19]. The trivial algorithm for quasi-thresholdediting considers all possibilities of adding/deleting an edge between eachpair of vertices in a forbidden P4 or C4, and so finding a closest quasi thresh-old graph with k edits runs in O∗(6k)-time, where the notation O∗(f(n, k))means O(f(n, k)p(k, n)) for some polynomial p.392.5. Hardness of Finding Familial GroupsFigure 2.10: A graph which is 2 edge edits away from a quasi-thresholdgraph, with no single edge edit which reduces the total number of forbiddensubgraphs.The similar problem of modifying a graph to a quasi-threshold graphusing only edge deletions is throughly discussed in Chapter 5.For computational feasibility, we combined the above bounded searchtree method with greedy edge-edit choices according to the measure of count-ing the total number of induced P4s and C4s in the graph. By testing everypossible edge-addition and every possible edge-deletion, we (greedily) chosethe edge edit that resulted in the largest improvement (that is, the largestdecrease) in the total number of induced P4s plus the number of inducedC4s in the graph. Greedy choices were made until the brute-force exact al-gorithm was able to execute on the modified graph within reasonable time.We note here that only using greedy choices as described above mayresult in a process that cycles without reaching a quasi-threshold graph. InFigure 2.10, there are four P4s with an endpoint in {a, b} and the otherendpoint in {c, d}. This graph is not quasi-threshold, and can be turnedinto a quasi-threshold graph by deleting the two edges ax and bx. However,any possible single edge edit will increase the total number of obstructions,and so the above greedy method would choose an edge edit with 0 net-gainover one of the edges ax and bx as removing ax or bx would create more P4sthan it would destroy (deleting ax would destroy two P4s but introduce 4new ones: abx{u, v, w, y}, where abxS denotes the set abxs for each s ∈ S.For completion, we report the net gain of the remaining edge edits: bx, cy, dyall have net improvement of -2 for symmetric reasons to the ax removal. Adeletion of an edge joining {u, v, w} to {x, y} has score 0, as does addingan edge inside {u, v, w}. Adding an edge from any of {u, v, w} to any of402.5. Hardness of Finding Familial GroupsFigure 2.11: The degree of an actor does not determine its social rank.{a, b, c, d} will only create P4s without destroying any, so they also havenegative improvement scores. Deleting xy would destroy the 4 original P4sbut would introduce 12 new ones. Adding an edge of the form {a, b}y or{c, d}x causes a net gain of +3 obstructions. Adding an edge from {a, b}to {c, d} destroys just one P4 but adds 9 P4s, along with a C4. This showsthat an edit (such as removing ab) with net improvement of 0 would bechosen by the greedy process, and the edit can be toggled repeatedly. If werestrict ourselves from undoing an edge edit, the result can still be poor dueto deleting a large number of edges of 0 score without making progress (suchas deleting an edge of a trivial 2-vertex component).Observe also that the discussion in the previous paragraph shows thatFigure 2.10 also serves as an example that a similar cograph-editing greedyprocess may suffer from the same non-termination. We stress, though, thatour implementation utilized the greedy process only while significantly largeimprovements were being made, and that the brute-force (optimal) searchreplaced the greedy process once its computation was feasible.In the next section, we analyze a selection of social networks by comput-ing an approximate closest quasi-threshold graph with this combined searchand greedy heuristic method.2.5.2 Intra-communal RankingThe importance of individuals to a network or a subnetwork is oftenmeasured by means of various vertex centrality metrics. These range fromsimple local properties such as vertex degree to global properties such asbetweenness centrality.412.6. Case StudiesThe actors in a connected component of a quasi-threshold network nat-urally arrange themselves in a rooted tree representation. This correspon-dence can be used to extract an importance measure of each actor withinthe community. Intuitively, the root or top-most vertex of a familial groupis the most important node and the others are ranked by virtue of the factthat each node can be regarded as the root of a subtree. The size of a sub-tree under an individual will be the relevant measure of importance here,rather than a metric such as vertex degree.For instance, in the quasi-threshold community in Figure 2.11, vertex 6has degree 5 and “oversees” 2 others, while vertex 3 has a lower degree of 4but oversees 3 others. We perceive vertex 3 to have a more important rolethan vertex 6 in this community.Hence, we define the intra-communal rank of a vertex v in a quasi-threshold community to be the number of vertices beneath v in the corre-sponding comparability tree. In the case that M vertices are structurallyequivalent within the community in such a way that these M vertices alloversee d vertices beneath them in any associated comparability tree, thenthese M vertices can be given an intra-communal importance score of d +M−12 .This quantitative measure of intra-communal rank is merely one way toassign a value to a vertex that captures how important it is in its familialgroup. There are many possible ways such a measure could be defined, andan appropriate quantitative function is perhaps a topic for future research.2.6 Case StudiesWe present some example networks from the literature and the impliedcommunities from a close quasi-threshold graph we computed.2.6.1 Zachary’s Karate ClubZachary [150] studied the social relationships between individuals in auniversity karate club. The club suffered a division which split the clubinto two, and it was observed that the split very closely corresponded to amin-cut that separates the two opposing individuals of largest influence. Aminimum cut (G, s, t) is a smallest set of edges of G such that the removalof this edge set will leave nodes s and t in different components.The method of [62] for hierarchical clustering predicts roughly the samepartition that Zachary observed after the karate club experienced its so-cial fission, with the exception of vertex 3 being misclassified. The familial422.6. Case Studiespppp pppppppppppppppppppppppppppppppppp pppppppppppppppppppppppppp pppp pppppppppppppppppppppppppppppppppppppppp pppppppppppppppp pppppppppppppppppppppppppppppp11276511173310163115344202314138182225282629932272123192430Figure 2.12: (top) Zachary’s karate club network; (bottom) the quasi-threshold graph obtained after 21 edits.432.6. Case StudiesMmeThenardierThenardierMariusEponineMagnonGueulemerAnzelmaBrujonBrevetBamataboisCochepaille MotherInnocentFaucheleventChenildieuWoman1JudgeChampmathieuGribierBabetLtGillenormandMlleVaubois ClaquesousMmePontmercyMlleGillenormand CosetteGillenormandJavertPontmercyMontparnasseBlachevilleFameuilListolierFantineTholomyesPerpetueMargueriteDahliaFavouriteZephineSimpliceCountChamptercierMyrielMmeMagloireNapoleonMlleBaptistineGeborandCountessDeLoCravatteOldManEnjolrasBoulatruelleBossuetChild2Child1 GavrocheMmeBurgonBaronessTCourfeyracBahorelFeuillyCombeferreMmeHucheloupJondretteScaufflaireMmeDeRIsabeauGervais Woman2LabarreValjeanToussaintJolyProuvaire MotherPlutarchMabeufGrantaireFigure 2.13: (top) Characters in the novel Les Mise´rables. The networkis drawn in Cytoscape’s spring embedding, while the approximated familialgroups are distinguished by vertex shape and shading; (bottom) The familialgroups found from the top network. The bold dashed lines represent edgeadditions, and deleted connections are not shown.442.6. Case StudiesSN96Hook BumperSN4KringelSN9BeakTR99MN23FeatherRippleflukeDN21WaveMusJetQuasiZigFishToplessSN90DoubleUpbangMN83 ZapPLNotchKnitNumber1DN63OscarSN100BeescratchTR77ScabsGrinShmuddelTR120ThumperSN63WhitetipTSN83TR88ZipfelForkPatchbackMN105SMN5StripesTSN103DN16GallatinTR82SN89CCLWebFiveTriggerCrossMN60HaeckselJonahVauFigure2.14:(left)Networkofdolphinassociations.Femalesarecircle-shaped,malesaresquare-shapes,andthreeindividualsofunknowngenderaredepictedastriangles.Thethreemainfamilialgroupsfoundareshownwithlightshading,lightshadingwiththickoutline,anddarkshading;(right)Thecorrespondingcomparabilitytreesofthequasi-thresholdcommunitiesfoundinthedolphinnetwork.452.6. Case Studiesgroups of the karate network identified by our approach are depicted in Fig-ure 2.12 (top), where dashed lines represent edges from the network thatwere deleted and bold dashed lines represent new edges added in order tofind the closest quasi-threshold graph. The obtained quasi-threshold parti-tion groups the network into two groups, equivalent to the first two groupsproduced by the Girvan-Newman method. This network shows 215 edits,and this is optimal6.An interesting result of the tree structures revealed by our approach forthe two groups is that it predicts exactly two distinct components with rootsof vertices 1 and 34, while Zachary’s method had begun with knowing that 1and 34 are the conflicting leaders and found a minimum cut that separated 1and 34. Sub-communities of the two major communities can be identified assubtrees of the quasi-threshold tree. Consider the removal of vertex 1: thisleaves subtrees of {12}, {5, 6, 7, 11, 17}, {2, 3, 4, 8, 13, 14, 18, 20, 22}, whichimply overlapping subcommunities when vertex 1 is regarded as a memberof each of these subcommunities. We observe the similarity in the results im-plied by the dendrogram of Girvan and Newman, identifying a second-levelcommunity of {5, 6, 7, 11, 17} as well, and vertex 12 quickly being separatedfrom the remaining network. The larger of these three further decomposesinto overlapping subcommunities when looking under vertex 2, and thesecommunities are {1, 2, 18}, {1, 2, 22}, {1, 2, 20}, and {1, 2, 3, 4, 8, 13, 14}.2.6.2 Communities in the Les Mise´rables Network andCharacter ImportanceLes Mise´rables is a 19th-century novel by Victor Hugo containing 5 parts(or volumes) broken into 70 chapters. A network of 77 major and minorcharacters in the novel was constructed in [85] by joining two individualswith an edge if they exist in a chapter together.Figure 2.13 shows the Les Mise´rables network and the computed familialgroups. The familial groups found are distinguished by node shape andshading. The quasi-threshold graph obtained, after 64 edge edits, consistsof three large nontrivial components and several smaller ones. The predictedleaders (roots) of these three components Jean Valjean, Marius Pontmercyand Fantine with implied intra-communal scores 27, 19, 10 (respectively),are key characters in the novel as is witnessed by the fact that their names5We would like to thank Yarko Senyuta for noticing the error published in [110] whereonly 20 edits are reported.6We thank Yarko Senyuta for verifying the optimality of this solution by showing thatno solution exists in a bounded search tree of depth 20.462.6. Case Studiesare titles to 3 of the 5 volumes. This quasi-threshold graph correctly isolatesonly minor characters into trivial groups.2.6.3 Lusseau’s Dolphin NetworkLusseau [101] studied a population of dolphins over a period of 7 years,building the social network depicted in the left-side network of Figure 2.14by joining an edge between two dolphins if they were observed togethersignificantly more often than was statistically expected. The communitystructure of this network was studied in [113], where the main communitywas identified as predominantly female and the male community split intotwo upon a temporary disappearance of several individuals.Using 75 edge edits, our closest quasi-threshold graph found for the dol-phin network is shown in the right-side of Figure 2.14. Our familial groupingsupports the observed communities: it shows three main groupings, one ofwhich is almost entirely female while each of the other two are mostly male.The remaining 15 dolphins are shown in the network as white nodes.The number of small or trivial components in the edited dolphin networkseems to be uncharacteristic of these edited networks, which may be con-cerning at first glance. But according to Lusseau7, these bottlenose dolphinsdo not form hierarchies the same way as other social systems.2.6.4 Grassland SpeciesThe top network of Figure 2.15 is a network of grassland species interac-tions built in [37], and its hierarchical community structure was analyzed in[27]. The network contains 1007 induced obstructions (P4s or C4s) and weproduce a quasi-threshold graph that is 34 edge edits away from it, depictedon the bottom of Figure 2.15. Each node corresponds to a type of organismsuch as plants (circle-shaped nodes), plant-eating organisms (square-shapednodes) and parasitic organisms (the rest of the nodes.)Interestingly, the root node of every non-trivial familial group was foundto be a herbivore. It was found in [27] that several sets of parasites weregrouped together not because they fed on each other but instead becausethey all fed on the same herbivore. Our familial groups strongly show thatthe herbivores play central roles in the organization of these species.7Personal communication, May 30, 2013. “Socially though we do not have hierarchyformation like you might have in other types of social systems. The dispersal processes arealso a bit different to what you might expect in our (human) notions of familial groups...”472.6. Case Studies23391565985227831305419557557621634162424340458044751664244714 1334353638372736133 4818 1249 741152537242951 1 325082 6682565 672058568677816046173779Figure 2.15: (top) Network of grassland species. Node cate-gories are: plant(circle), herbivore(square), parasitoid(triangle), hyper-parasitoid(diamond), and hyper-hyper-parasitoid(hexagon). (bottom) Thecorresponding familial groups found after 34 edge edits. Bold dashed linesare edge additions and deleted edges are not shown.482.6. Case Studies2.6.5 College Football Network[62] gives a network joining two American college teams together if theyplayed against each other during the year 2000 football season. Evans writesthat the data is likely the 2001 season [45], and corrects some of the con-ference assignments in the data8. In that football season, the 115 teamsare grouped into 11 conferences, with a 12th group of independent teams.Teams are usually matched against their conference-mates, an average ofabout 7 games against teams within their own conference and 4 games out-side of conference. Girvan and Newman extracted the community structureof this network and found a near-match to the expected partitions definedby conferences.The network began with 613 edges, and our greedy method made 255edge edits on the network to arrive at a (P4, C4)-free graph.Quasi-threshold editing found exactly 12 connected components, almostperfectly matching the 12 conference groups as labeled by Evans. A tableof the familial groups found is given in Table 2.1. The numeric groupingsin the table correspond to the connected components found. The left andright icons in each row describe which conference that team is assigned toas given by the Newman dataset and the Evans dataset, respectively.Figure 2.17 illustrates the intra-communal ranking of the teams in group6, according to the discovered structure. The large score of Akron (the root)suggests that group 6 corresponds to the conference containing Akron, whichis the Mid American conference. The relatively-high score of Buffalo is astrong suggestion that Buffalo belongs to the same conference as Akron. Thevery low scores of 0 for Central Florida and Connecticut tell us that althoughthe structure of the scheduling that year seems to associate those two teamswith the Mid American conference, these associations are very weak, evenweaker than the ties for the other leaf-node teams of larger depth. Thefour teams ranked with 8.5 (Toledo, West Michigan, Miami Ohio, CentralMichigan) were found to be equivalently structured in the community and sothe placement order of those 4 teams in the comparability tree is arbitraryamongst each other. Similarly: Marshall, Ohio and Kent were found to bestructurally equivalent, as were Northern Illinois, East Michigan and BallState.To illustrate how the scores are determined, Buffalo scores 12 becausethere are exactly 12 nodes below it in this tree (and in every tree obtainedby permuting the order of any of the structurally-equivalent nodes). Toledo,8We thank one of the referees for bringing to our attention the corrected conferenceassignment made by Evans.492.6. Case StudiesFigure 2.16: (top) The football network drawn with yEd’s organic layout;(bottom) The corresponding familial groups found after 255 edge edits. In-terestingly, exactly 12 components were found, mostly corresponding to the12 conferences that partitioned the teams.502.6. Case StudiesFamilial Groups Found in the Football Network1◦ ◦ Clemson5 Alabama Birmingham9/ ⊕ North Texas◦ ◦ Wake Forest East Carolina ⊕ Utah State◦ ◦ Maryland Houston / ⊕ Arkansas State◦ ◦ North Carolina State Louisville J ⊕ Boise State◦ ◦ Florida State Memphis / ⊕ Idaho◦ ◦ Virginia Southern Mississippi / ⊕ New Mexico State◦ ◦ Georgia Tech Tulane10♥ ♥ Arkansas◦ ◦ Duke Army ♥ ♥ Auburn◦ ◦ North Carolina Cincinnati ♥ ♥ Alabama2• • Miami Florida6 Marshall ♥ ♥ Florida• • Virginia Tech Northern Illinois ♥ ♥ Kentucky• • Boston College Western Michigan ♥ ♥ Vanderbilt• • West Virginia Akron ♥ ♥ Mississippi State• • Syracuse Ball State ♥ ♥ South Carolina• • Pittsburgh Bowling Green State ♥ ♥ Tennessee• • Temple Buffalo ♥ ♥ Mississippi• • Rutgers Central Michigan ♥ ♥ Georgia Navy East Michigan ♥ ♥ Louisiana State Notre Dame Kent11J J Hawaii3. . Michigan State Miami Ohio J Texas Christian. . Indiana Ohio J J Fresno State. . Northwestern Toledo J J Rice. . Wisconsin Central Florida J J Southern Methodist. . Michigan Connecticut J J Nevada. . Iowa7? ? Oregon State J J San Jose State. . Purdue ? ? Arizona State J J Texas El Paso. . Ohio State ? ? California J J Tulsa. . Minnesota ? ? UCLA12J Louisiana Tech. . Illinois ? ? Arizona / Louisiana Monroe. . Penn State ? ? Washington / Middle Tennessee State4I I Missouri ? ? Oregon / Louisiana LafayetteI I Oklahoma State ? ? StanfordI I Baylor ? ? Washington StateI I Colorado ? ? Southern CaliforniaI I Kansas State8♣ ♣ Nevada Las VegasI I Kansas ♣ ♣ San Diego StateI I Texas Tech ♣ ♣ WyomingI I Iowa State ♣ ♣ Brigham YoungI I Nebraska ♣ ♣ UtahI I Texas A&M ♣ ♣ Colorado StateI I Oklahoma ♣ ♣ New MexicoI I Texas ♣ ♣ Air ForceTable 2.1: The 12 connected components found after greedily editing-out theP4s and C4s from the football network. The left symbol is the conferenceassignment as given in Girvan and Newman’s dataset, while the right symbolcorresponds to Evans’ corrected conference assignment. The grouping foundcorresponds almost exactly to the 12 conference groups described by Evans.The conference labels are depicted by the legend below.◦ Atlantic Coast Conference USA ? Pac 10 • Big East IA Independents ♥ SEC . Big 10 Mid American/ Sunbelt I Big 12 ♣ Mountain West J Western Athletic⊕ Big West512.7. SummaryFigure 2.17: The implied comparability tree corresponding to one particularfamilial group. The intra-communal ranking is also given for each team.This is group 6 in Table 2.1.however, is in an equivalence group {Toledo, West Michigan, Miami Ohio,Central Michigan}, and this group directly oversees 7 nodes below them.The intra-communal score of d+ M−12 with d = 7 and M = 4 gives each ofthese 4 teams a score of 8.5.2.7 SummaryThe main contribution of this chapter is a new definition for communitystructure (familial groups) based on the graph-theoretical concept of forbid-ding small induced subgraphs. We give evidence that editing-out P4s andC4s in order to obtain a quasi-threshold graph yields meaningful clusteringsin real networks.We show that the computational problem of edge-editing to a nearestquasi-threshold graph is NP-complete, resolving the open complexity statusof this problem as mentioned in [18], [102] and in [97].Familial groups are a natural and meaningful relaxation of many stan-dard structures widely-used in social community definitions such as cliquesand k-plexes and their generalizations. They are robust against the removalof network nodes, a characterization that is not guaranteed in most otherdefinitions of a social community. This robustness is in the sense that ifthe unique leader of a group is removed, the remaining individuals will ei-ther still form a familial group with a leader or will decompose into several522.7. Summaryfamilial groups, each with a respective leader.Familial groups automatically provide a ranking of the individuals withina group and a hierarchical arrangement of a group itself - a unique featurethat, to our best knowledge, no other existing model of community structureprovides. These communities also retain many of the properties that ahighly-connected group should have such as having a low diameter. In fact,the diameter of a familial group is at most 2, since the editing process ensuresa universal vertex in each connected connected.The notion of familial groups presented here can easily be modified tohandle more network information, including weighted nodes and edges, di-rected edges, and weighted edge-edit operations. The following chapter willexplore a possible way to incorporate directed and weighted edges into thetheory of familial groups.53Chapter 3Familial Groups forHierarchical Organization3.1 Historical PerspectiveIn 1994, Krackhardt [88] provided definitions for a purely hierarchicalorganization in which there is one “boss” who oversees any number of sub-ordinates, each of which possibly overseeing their own set of subordinates.Krackhardt represented these structures as directed rooted trees, where eachnode is directed toward each of its children. He observed that these “bare-bones” trees are perhaps too “fragile” and called extra edges “redundant”when they pointed from a boss to a subordinate’s subordinate. Krackhardtgave four defining properties of this structure which were then used to mea-sure how close a given directed network was to being perfectly hierarchicallyorganized.Recently, Everett and Krackhardt [46] noted that the four defining prop-erties were not completely independent from each other, and proposed mod-ifications to the scoring system to measure when a given network exhibitsthe desired or expected properties of a hierarchical organization.In 2002, independently from the works above, attempts to measure hi-erarchical organizations in networks came in from Ravasz and Baraba´si andothers [42] [1] [123] where the amount of organization in a network was hy-pothesized to be measurable through a density measure called the clusteringcoefficient.In 2007, Sales-Pardo et al. [126] comment that checking for desired (andseemingly irrelevant) properties like the clustering coefficient does not yieldan “unsupervised” method to extract these hierarchical organizations. Theyalso note that hierarchical clustering methods lack in any “sound generalcriterion to determine the relevant levels on the hierarchy.”We introduced familial groups [109] in Section 2.4 whose definition evolvedfrom imposing certain structural properties in social communities. An im-mediate benefit of the structure of familial groups was that the individuals543.2. Graph-theoretic Framework for Hierarchical Organizationin identified communities were found to naturally arrange themselves in ahierarchical organization.In this chapter, we show that familial groups can be used to define andcharacterize the structure of social hierarchy in networks. Furthermore, weextend the definitions of familial groups to handle directed and weightednetworks.3.2 Graph-theoretic Framework for HierarchicalOrganizationWe say that vertex u points to vertex v is there is a directed edge fromu to v, and we write (u, v) or uv when the context is clear. An out-tree [88]or an arborescence [136] [46] [40] is a directed rooted tree where every nodepoints to each of its children. It represents a hierarchical organization ofnodes where an individual node can - for example - pass orders or influencealong its directed edges (that is, to each of its children.) One can interpretthis hierarchical organization as a boss and his/her subordinates.It makes sense, then, that an edge can be directed from a boss to its sub-ordinate’s subordinate since demands can typically be passed down a chainof authority. Krackhardt [88] called these edges redundant9 and excludedthese edges from the study of hierarchical structures. Redundancy, however,serves as an integrity-check in that if an individual x is directed toward allof its subordinates and their descendants, it reinforces the hypothesis thatx is the root node of the hierarchical organization.Consider an arborescence T with root r. The comparability graph of anarborescence [40], G(T ), is a directed graph on the same node set as T inwhich (u, v) is an arc in G(T ) when v is a descendant of u. In particular,the root of such a tree is adjacent to and points toward every other node inthe tree. G(T ) is the transitive closure of the arborescence T , meaning thatwhenever there is a directed path from x to y in T , there is a directed edgefrom x to y in G(T ).The underlying graph of a directed graph D is the graph resulting fromD when the orientation of each edge is ignored. The class of graphs whichare a comparability graph of an arborescence is exactly the class of quasi-threshold graphs (defined in Section 1.3.2).9quoting: “More lines than [a tree] creates multiple paths and cycles between points.In a sense, these multiple paths are redundant in graph-theory terms, and they disruptthe stoic, bare-bones nature of the pure out-tree structure.553.3. Hierarchical Organization of Individuals in a NetworkTheorem 3.1. [145] A connected graph is a comparability graph of anarborescence if and only if it has no induced subgraph isomorphic to a pathon four vertices (a P4) or a cycle on four vertices (a C4).3.3 Hierarchical Organization of Individuals in aNetworkKrackhardt [88] gave four characterizing properties of out-trees:i) The digraph is weakly connected (that is, this condition only requiresconnectivity on the underlying graph).ii) The digraph is graph hierarchic, meaning that if u can reach v with adirected path, then v cannot reach u.iii) The digraph is graph efficient, meaning that it has no redundant edges.iv) Every pair of vertices has a least upper bound, that is, a closest bosswho has authority over both of them.For each of these four criteria, he gave a real-valued score function from0 to 1 for any given network, where 0 represents a high degree of violationof that criteria and 1 representing the expected structure of an out-tree. Weelaborate on the score function for each criteria below. The number of nodesin a given digraph is n.i) 1− t(n2), where t is the number of pairs of vertices which are unreachablefrom each other. This corresponds to a score of 0 if the digraph is anindependent set and 1 if it is weakly connected.ii) If the strongly-connected components of the digraph are C1, . . . , Ctthen this function is 1 −∑ti=1 (|Ci|2 )(n2). That is, every pair of verticeswhich is in a strongly connected component counts as a violation,summed over all possible pairs. This function is 0 if the entire digraphis strongly connected and 1 if it acyclic.iii) Say that a connected component Ci in the underlying graph has ninodes. Then a tree structure would expect ni − 1 edges. Let mi bethe number of edges of Ci. Then the ith component has mi − (ni − 1)violating edges. The function then is 1−∑i(mi−ni+1)∑i((ni2 )−(ni−1)). This function563.3. Hierarchical Organization of Individuals in a Networkevaluates to 0 if the underlying graph is a clique, and to 1 if theunderlying graph is a forest (i.e. a disjoint collection of trees.)iv) A violating pair is a pair of nodes, u and v, in the same weakly-connected component such that there is no x for which there is adirected path from x to u and x to v. The least upper bound x of uand v is allowed to be either u or v as well. The score function is 1− tzwhere t is the number of violating pairs and z is the total number ofnon-adjacent pairs in the same weakly connected component. That is,for each weakly connected component C with nC nodes and mC edges,z =(nC2)− mC . This function is 0 when every pair of nonadjacentnodes have no directed path joining them. For instance, if every nodeis a source or a sink, there are no directed paths of length 2 and thisfunction evaluates to 0. It evaluates to 1 when, for instance, there isa vertex which has a directed path to all other nodes in the weaklyconnected component, although this is not the only manner in whichit can score 1.Krackhardt measured these parameters in Erdo˝s-Re´nyi graphs (see Sec-tion 4.4) and observed that the four measures seemed to independently bevalued high (∼ 1) or low (∼ 0).Later, Everett and Krackhardt [46] showed that these four dimensionsof hierarchical organization are not independent of each other. Specifically,they showed that condition (ii) followed from the other three conditions.Weaker, but similar, statements of the four condition were proposed as re-placement. They also propose that in certain applications, for instance,where two-way arcs can exist, that only a certain triplet of the four condi-tions be used.The 1994 paper [88] concluded with a cautionary note that these mea-sures are very sensitive to density and that data-collecting methods may beprone to including many redundant edges, inadvertently reducing the mea-sures of hierarchical organization. This suggests that a method of analysiswhich filters-out, or at least embraces, redundancy could prove to be morefruitful in measuring organizational structure.For any directed graph G = (V,E), the transitive closure of G is thegraph H = (V,E ∪E+) where a directed edge (u, v) exists in E+ if there isa directed path from u to v in G. Adding all redundant edges, then, is akinto obtaining a transitive closure. The following observation is immediate:Observation 3.2. The transitive closure of an out-tree is a comparabilitygraph of an arborescence.573.4. Familial Groups in Directed NetworksIn general, the advantage of adding redundancy to a model is to rein-force the initial data and to further support any conclusions or predictionsmade by the model. We believe the same holds in this case of hierarchicalorganization in networks.3.4 Familial Groups in Directed NetworksUsing the method of familial groups in Section 2.4, we could attemptto extract hierarchical organization in a directed network by ignoring edgedirections, editing to a closest quasi-threshold graph, and then arranging theindividuals in the implicated tree structure and directing edges downward.But when ignoring initial edge directions, this could result in a hierarchicalstructure that is invalid or very inaccurate.Since a comparability graph of an arborescence is a directed structure(having a quasi-threshold underlying graph), instead of restricting the localstructure by P4s and C4s, we can impose restrictions on the local directedstructure and define an edit distance with respect to this.There are some situations in which it makes sense that between any twonodes u and v, only one directed edge can exist between them. That is, if upoints to v, then v cannot possibly point to u. For example, if u and v existin an extended family, u might be a descendant of v or v a descendant of u(or neither), but it is impossible for both to occur. A single directed edgecan be drawn between them (ancestor pointing to descendant) and it wouldbe meaningless to have two individuals pointing to each other.On the other hand, a trust network in which a directed edge (u, v) existsif v follows advice from u may very well contain edges (u, v) and (v, u)simultaneously. This can be made into a simple underlying graph by insteadjoining directed edge (u, v) if v follows advice from u more often than ufollowing v. In this case of following each others’ advice equally, from theirperspective, one is no higher than the other in the social hierarchy and sothe edge between them can be removed or ignored and their placing in thehierarchy will de determined by the other actors in the network.Again, depending on what desired property is being captured in thenetwork, ignoring the edge might not be an ideal approach, since u and vfollowing each other’s advice ten times per week may be more socially sig-nificant than u following the advice of w once. But in these cases wherecooperation or communication channels are of primary interest, the under-lying graph structure may be enough to maintain and analyze.With all these possibilities and considerations, we emphasize that the583.4. Familial Groups in Directed Networks(a) (b)Figure 3.1: (left) An out-tree as used by Krackhardt [88]; (right) the tran-sitive closure of the left out-tree.network model is dependent on the application it is modeling and the prop-erty that the network analyst is interested in. We elaborate on some of thesenetwork models in the following sections.3.4.1 Directed Networks with a Simple Underlying GraphIn an application where at most one directed edge can exist between apair of nodes, the familial group can be defined exactly as a comparabilitytree of an arborescence, with edges pointing away from the root vertex andtowards to leaves. Given a directed network, an edge-edit could be:i) the deletion of a directed edge of cost αii) the creation of a directed edge of cost βiii) the reversal of a directed edge of cost γWhile a reversal of an edge can be regarded as a deletion followed by acreation, it is not always appropriate to weigh a reversal edit as α+ β. Wepropose a default weighting of α = β = γ = 1 unless the application in themodel suggests otherwise. For instance, adding a redundant edge in order tocomplete the transitive graph structure may not necessarily be penalized asmuch as deleting a known connection, and so perhaps for those applications,adding an edge could have a very small (but positive) value > 0 assignedto β.Figure 3.1(a) depicts the ideal hierarchical structure as described in [88],and Figure 3.1(b) is the same out-tree shown in (a) but with all additional“redundant” edges included. Let us assign a name to this structure:593.4. Familial Groups in Directed Networks(a) (b) (c)Figure 3.2: The forbidden triadic configurations for transitive out-forestwhen the underlying network must be simple.Definition 3.3. A transitive out-tree is the transitive closure of an out-tree. A disjoint collection of transitive out-trees will be called a transitiveout-forest.Thus Figure 3.1(b) depicts a transitive out-tree. When data-collectingmethods include measurements for almost all pairs of individuals, we expectmany of these redundant edges to appear in the resulting network model.Thus we are interested in a closeness score to one of these transitive out-trees. Note that these directed structures are not exactly characterized astransitively-oriented quasi-threshold graphs, since reversing all the edges ofFigure 3.1(b) would still be a transitively-oriented quasi-threshold graphwhile not having the rooted structure. We are particularly interested in anorientation that specifically directs edges away from the root and towardsthe tree leaves.Analogous to how familial groups are defined for undirected graphs, wedefine a familial group of a directed simple graph G as a (weakly) connectedcomponent of a closest graph G′ which is a disjoint collection of transitiveout-trees.Definition 3.4. Given a directed network G with simple underlying struc-ture, the simple directed familial groups (or simply, the familial groups) of Gis the set of transitive out-trees obtained from a closest transitive out-forest.In order to edit a directed simple graph to a closest transitive out-forest,we can take a similar approach as in that for familial groups on undirectedgraphs and characterize the desired structure in terms of local structure.The transitive out-tree structure of a familial group, for instance, requiresthat if two vertices u1 and u2 point to v, then either u1 points to u2 or viceversa. The forbidden configurations of a transitive out-forest are depictedin Figure 3.2.In the framework where an edge deletion costs α, an edge addition costsβ, and an edge reversal costs γ, we can formulate the parameterized problemof finding simple directed familial groups of a directed simple graph:603.4. Familial Groups in Directed NetworksProblem 5. Transitive Out-tree Editing(G, k):Given a directed graph G = (V,E), is there a set of x edge deletions, y edgeadditions and z edge reversals such that the resulting graph is a transitiveout-forest and αx+ βy + γz ≤ k?As mentioned earlier, a natural definition of this problem would setα = β = γ = 1 unless another interpretation is suggested by the problemapplication.Since the forbidden configurations are finite in number, we can immedi-ately describe a fixed-parameter tractable algorithm to solve this.As every pair of vertices has three possible states of having an arc be-tween them (two directional arcs or no arc at all) there are 27 configurationsof a triplet in total. Although three forbidden configurations are given inFigure 3.2, when symmetries are counted, there are three forbidden config-urations of type (a), six forbidden configurations of type (b), two forbiddenconfigurations of type (c), for a total of 11 forbidden triadic configurations.A brute-force algorithm would find a forbidden triplet and branch on replacetheir adjacencies with one of the 27-11=16 valid configurations and repeat.Algorithm 4: Transitive out-forest editing algorithm.Algorithm TransOutEdit(G, k):Input: A Graph G = (V,E)Output: Yes if G can be edited to a transitive out-forest with atmost k edits, and No otherwise.while (there exists a forbidden triple of nodes) && (k > 0) doBranch on the possible ways of editing the triple into a validtriple as per Table 3.1, and reduce the parameter k accordingly;endif k ≥ 0 thenreturn Yes;endreturn No;The minimal branching rules are summarized in Table 3.1. Under theassumption that α = β = γ = 1, Algorithm 4 is fixed parameter tractableand the branching factor for each of the three branching rules are found tobe:i) T (k) = 4T (k − 1)→ T (k) = O∗(4k)ii) T (k) = 3T (k − 1) + 4T (k − 2)→ T (k) = O∗(4k)613.4. Familial Groups in Directed Networksiii) T (k) = 3T (k − 1) + 3T (k − 2)→ T (k) = O∗(3.792k)Theorem 3.5. Transitive out-tree editing using edge additions, deletionsand reversals can be solved in O∗(4k).Configuration Minimal Edge Edits Parameteru→ v and w → vdel (u, v) k k − αdel (w, v) k k − αadd (u,w) k k − βadd (w, u) k k − βu→ v and v → wdel (u, v) k k − αdel (v, w) k k − αadd (u,w) k k − βrev (u, v), add (w, u) k ← k − β − γrev (v, w), add (w, u) k ← k − β − γrev (u, v), add (u,w) k ← k − β − γrev (v, w), add (u,w) k ← k − β − γu→ v and v → w and w → udel (u, v) and (v, w) k ← k − 2αdel (u, v), (w, u) k ← k − 2αdel (v, w), (w, u) k ← k − 2αrev (u, v) k ← k − γrev (v, w) k ← k − γrev (w, u) k ← k − γTable 3.1: Branching rules for transitive out-tree editing with the addition,deletion, and reversal operations.3.4.2 Transitive out-tree editing without reversaloperationsFollowing the editing framework of [141] and [142] where the problem ofediting a directed graphs to a transitive directed graph, we consider here themodified problem of editing with only the addition and deletion operations.Note that arc reversals can be simulated with an arc deletion followed byan arc addition in the opposite direction, and so this framework does notloose expressibility power in terms of the problems it can tackle. Editingdirected graphs to make them transitive is equivalent to forbidding the con-figuration (b) in Figure 3.2. Weller et al. gave a O∗(2.57k)-time algorithmfor Transitivity Editing and O∗(2k)-time algorithm for TransitivityDeletion.623.4. Familial Groups in Directed NetworksNote that Algorithm 4 still solves the modified version of TransitiveOut-tree Editing with only edge additions and deletions allowed, butwe only have to specify a new set of minimal edits to branch on. We useα = β = 1 in Table 3.2 to describe these edits.Configuration Minimal Edge Edits Parameteru→ v and w → vdel (u, v) k k − 1del (w, v) k k − 1add (u,w) k k − 1add (w, u) k k − 1u→ v and v → wdel (u, v) k k − 1del (v, w) k k − 1add (u,w) k k − 1u→ v and v → w and w → udel (u, v), (v, w) k ← k − 2del (u, v), (w, u) k ← k − 2del (v, w), (w, u) k ← k − 2del (u, v), add (v, u) k ← k − 2del (v, w), add (w, v) k ← k − 2del (w, u), add (u,w) k ← k − 2Table 3.2: Branching rules for transitive out-tree editing using only theaddition and deletion operations.The number of nodes explored in a bounded search tree that uses theseediting rules results in a number of nodes satisfying these recurrences:i) T (k) = 4T (k − 1)→ T (k) = O∗(4k)ii) T (k) = 3T (k − 1)→ T (k) = O∗(3k)iii) T (k) = 6T (k − 2)→ T (k) = O∗(2.450k)We can also consider the related deletion-only problem:Problem 6. Transitive Out-tree Deletion(G, k):Given a directed graph G = (V,E), is there a set of k edge deletions suchthat the resulting graph is a transitive out-forest?The minimal deletion sets are summarized in Table 3.3.633.4. Familial Groups in Directed NetworksConfiguration Minimal Edge Edits Parameteru→ v and w → vdel (u, v) k k − 1del (w, v) k k − 1u→ v and v → wdel (u, v) k k − 1del (v, w) k k − 1u→ v and v → w and w → udel (u, v), (v, w) k ← k − 2del (u, v), (w, u) k ← k − 2del (v, w), (w, u) k ← k − 2Table 3.3: Branching rules for transitive out-tree edge-deletion.Clearly, the first two branching rules create O(2k) nodes. The thirdbranching rule results in a recurrence of T (k) = 3T (k− 2), with an effectivebranching factor of 1.733. Thus we have:Theorem 3.6. Transitive Out-tree Deletion(G, k) can be solved inO∗(2k) time.3.4.3 Weighted Directed FrameworkIn the case of weighted networks in which each edge or arc is associatedwith a score, we can still naturally define familial groups. Since edge weightsare usually a measure of the strength of the edge (except, perhaps, in net-works in which the weight is a distance measure) it should make sense thenthat deleting an edge should cost the weight of the edge10.In an undirected setting, familial groups would still be defined in termsof the forbidden P4s and C4s, and the weights would only be used in deter-mining the costs of editing.In the directed network setting in which data is expected to satisfy tran-sitivity, we propose that arcs can be added with a minimal cost if it satisfiestransitivity. As mentioned in an earlier section, these edges that observetransitivity are a form of data redundancy and if u→ v and v → w, then itmay make sense to allow u→ w for free, or perhaps a very small cost.An example to motivate this definitions is depicted in Figures 3.3 and 3.4.We consider 5 NHL hockey teams of one division after having played anentire 48-game season. The league consists of 30 teams divided into 2 con-ferences and each conference divided into 3 divisions. The five teams played10There are always exceptions to this model, of course. In a network in which edgesare roads or pathways and weighed with the amount of traffic that passes it, deletion orblocking a road of low traffic or of high traffic could feasibly require the same amount ofwork.643.4. Familial Groups in Directed NetworksFigure 3.3: The teams from the North-East Division of the Eastern Con-ference of the 2012-2013 NHL season. Team x points to team y if x beaty throughout the regular season more often than y beat x. Each edge isweighted by the number of more wins x had over y when playing againsteach other.all of their games within their conference that season. In Figure 3.3, wepoint u to v if u beat v more often than v beat u. The arcs are weighed bythe number of wins u had over v minus the number of wins v had over u. Iftwo teams, such as MTL and OTT, beat each other equally often, then anarc joining them would have score 0 and so it is not drawn.In editing the network into a closest directed out-tree, we find that threedeletions are required to break the directed cycles. Each of those deletionsare of weight 1. There is a set of five arcs that must be added to satisfy tran-sitivity to complete the out-tree structure, as shown in Figure 3.4. When arcadditions are allowed to be added with small cost to complete transitivityrequirements, this 3 + 5 edit set is the minimum possible solution. It turnsout that this arrangement correctly predicts the final arrangement of thesefive teams after their 48 game season: MTL won the division with 63 points,BOS had 62 points, and TOR, OTT and BUF earned 57, 56 and 48 points,respectively.There is still much work to be done in formalizing when arc additionscan be added with minimal cost. Perhaps a framework could be definedin which deletions are made to only destroy the configurations (a) and (c)from Figure 3.2, and then a graph square or even transitive closure could beapplied to add redundant arcs.We believe the discussion in this thesis shows that there is definite meritand opportunity to apply the concept of quasi-threshold graphs and tran-653.4. Familial Groups in Directed NetworksFigure 3.4: (Left) The three deletions (dotted lines) with total cost 3 todestroy transitivity obstructions. (Right) The addition of 5 edges (dashed)of consistent implication, each costing . The total weight of deletions andadditions is 3 + 5.sitive out-trees to weighted directed networks. A complete formalizationwould require an implementation and several test cases to justify the defi-nitions and we leave this as a future work.66Chapter 4Network Measures:Diameter and DistributionAs social networks are created to analyze a collection of relationships,a number of attributes of networks (or vertices or edges of a network) havebeen defined quantitatively for more meaningful and comparative analysis.We provide some of the fundamental definitions of social network attributeshere.The centrality of a vertex or an edge is a measure of how importantthat vertex or edge is to the network. This is, of course, vague and can befathomed in many different ways, and so there are many types of centralitymeasures in regular use. Perhaps the most simplest notion of vertex cen-trality is the degree centrality, which simply assigns a centrality score to avertex equal to its degree. The simplicity of this definition makes its userather limited: in Figure 4.1, the two labeled vertices have the highest de-gree centrality measure but are structurally very different. If, for instance,these nodes are individuals and an edge between two nodes represents anemail communication between them, then A could be regarded as some sortof spam machine which only communicates to seemingly unrelated nodes.On the other hand, B (with the same degree as A) communicates to a groupof individuals who also communicate among each other. This sort of qualitycan be captured by counting the number of edges in the neighbourhood ofa vertex, called the clustering coefficient.Definition 4.1. Clustering Coefficient. In a graph G = (V,E), theclustering coefficient Cv of a vertex v with open neighbourhood N(v) of sizeat least 2 is the ratio of the number of edges in N(v) and the number ofvertex pairs in N(v). That is,Cv =|{xy : x, y ∈ N(v)}|(|N(v)|2) .There is no standard definition for the clustering coefficient of a vertexof degree 1.67Chapter 4. Network Measures: Diameter and DistributionFigure 4.1: Nodes A and B are vertices of highest degree (degree = 7).Figure 4.2: Two networks, each with 6 nodes and 9 edges. The network onthe left has 0 average clustering coefficient while the network on the righthas a high average clustering coefficient.68Chapter 4. Network Measures: Diameter and DistributionIt has been observed that in real-world networks, nodes tend to clusterthemselves more than one would expect from a random collection of edges.That is, the average clustering coefficient in real-world networks is signif-icantly higher than a random graph on the same vertex set and with thesame number of edges [140].Note that the degree and clustering coefficient are strictly local propertiesin the sense that if Figure 4.1 only represents the part of a large networkto which A or B are adjacent to and that there are thousands of othernodes attached to the network depicted, the degree-values and clusteringcoefficients of A and B are unchanged. If Figure 4.1 represents the entirenetwork, then A is, in fact, an important actor since four other verticesrely on A as the only connection into the network. A centrality measurethat captures this importance of a node with respect to the entire networktopology is the betweenness centrality defined by sociologist Linton Freemanin 1977 [53].Definition 4.2. Betweenness Centrality. The betweenness centrality ofa node v is g(v) defined as:g(v) =∑s 6=v 6=tσst(v)σstwhere σst is the number of shortest paths from s to t and σst(v) is thenumber of shortest paths from s to t that pass through v.Often, betweenness centrality measures are normalized by dividing allbetweenness centrality values by the largest betweenness centrality valuesince a score of, say, 5 means very different things if 5 is the largest value inthe network or if it is overshadowed by a score of 20.In Figure 4.1 where vertices A and B have equal degree centrality, ifthe depicted graph is the entire network, then the normalized betweennesscentralities of the two labeled vertices are g(A) = 1.0 and g(B) = 0.36,indicating that vertex A is a central communication hub for the verticesaround it.A similar definition of betweenness for edges has been defined and appliedto a variety of network analysis applications, such as community-finding [62].Its definition is analogous to that of betweenness for vertices.Definition 4.3. Edge Betweenness Centrality. The edge betweennesscentrality of an edge uv in a connected graph is g(uv) defined as:g(uv) =∑s 6=tσst(uv)σst694.1. Degree Distribution and Power Lawwhere σst is the number of shortest paths from s to t and σst(uv) is thenumber of shortest paths from s to t that pass through the edge uv.Many other definitions of centrality exist to distinguish types of node oredge importance, for example page rank [51] or closeness centrality [125].4.1 Degree Distribution and Power LawVertex degrees are usually easy to calculate and visualize. The implica-tion of a node having many connections is meaningful for many aspects ofnetwork analysis, and so it may be premature to quickly dismiss the use ofdegree centrality simply because it is a local property that does not considerthe global network topology. Instead, we can gain information about thegeneral structure of a network by looking at the distribution of degrees inthe network. For a particular network, let P (k) denote the proportion ofnodes having degree k.Definition 4.4. Power Law. A graph has a power law degree distributionifP (k) ∝ k−γ for some constant γ > 0.(Typically, 2 < γ < 3). A network having a power law degree distributionis sometimes called a scale-free network.A power law degree distribution occurs when there are relatively manyhigh-degree nodes. A large variety of networks evolving from natural pro-cesses have been found to exhibit a scale-free structure, notably whenever apreferential attachment mechanism for network growth is in play (see Sec-tion 4.4).4.2 The Small-World PhenomenonIn addition to a power law degree distribution, real world networks areoften observed to exhibit a small world property. That is, two seeminglyunrelated individuals in a network are supposedly connected through a pathof relatively short length.As an example of this, the network of movie actors has roughly 1.5million nodes, where two nodes are joined by an edge if they appear ina movie together. The mathematician Paul Erdo˝s and the famous actorKevin Bacon have very little in common and are fairly unrelated. But whenincluding documentaries, Paul Erdo˝s was is the biographical film N is a704.3. Graph DiameterNumber (1993) with Tomasz Luczak; Luczak was in The Mill and the Cross(2011) with Michael York; York was in Transformers: Revenge of the Fallen(2009) along with Rainn Wilson, who was also in Super (2010) with KevinBacon. Hence Paul Erdo˝s is a distance of at most 4 from Kevin Bacon inthe network of movie actor collaboration.The small world phenomenon is often also referred to as six degrees ofseparation, hypothesizing that almost everyone coexisting in a network areseparated by a path of at most 6.As another amusing example, in the network of refereed published paperswhere individuals are joined by co-authorship, the author of this thesis isjoined to physicist Albert Einstein by a path of length 5.With the understanding that naturally-forming networks tend to have ascale-free topology and thus a significant number of hubs (vertices of largedegree), the small world phenomenon is entirely plausible, much in the sameway that two small airports s and t are likely connected by a length-3 orlength-4 path of direct flights using the hub airports closest to each of s andt for layovers.Using the definition of graph diameter from Section 1.2.3, we can rephrasethe small world phenomenon by simply saying that real-world networks tendto exhibit a diameter very small in comparison to the number of nodes (usu-ally logarithmic). An example of a graph with large diameter would be asquare grid graph (like a Cartesian grid) on n vertices. Its diameter is2√n = Ω(n0.5), which is polynomial in n. More generally, it has been shownthat a random planar graph on n vertices almost surely has a diameter ofn14+O(1) [21]. The small diameter observed in social networks is witnessedby some random models, such as the large component of an Erdo˝s-Re´nyigraph (see Section 4.4) in which the diameter is bounded by a logarithmicfunction of the number of vertices [13].4.3 Graph DiameterA small graph diameter is a fundamental trait of a close-knit community,as this allows for rapid communication or sharing of resources. Interest-ingly, terrorist organizations have used this property to conceal their plansby putting together individuals in their networks who are of relatively-fardistances away from each other. Krebs writes [90] that even Bin Laden de-scribed his strategy in organizing the World Trade Center attacks by usingindividuals in his network who did not know each other. Furthermore, thegroups to work together did not know the other groups. Krebs observes714.3. Graph DiameterFigure 4.3: A network of 19 individuals, linked by known trusted prior com-munication, who hijacked and crashed 4 planes in the 9/11 attacks. Diamondnodes crashed into the Pentagon, square nodes crashed in Pennsylvania, cir-cle nodes crashed into WTC South, and triangle nodes crashed into WTCNorth.that if a chosen individual is captured or compromised, his social distanceaway from the others minimizes the damage to the network. We also notethat if the terrorist network was known beforehand, observing a potentialgathering of those individuals who are mutually-distant may not trigger anywarnings.In order to coordinate their large project, it is known that brief meetingsbetween certain long distance pairs took place, creating temporary short-cuts in the network. Once the coordination was completed, those secretcross-ties were left to go dormant in effort to be unnoticeable.Krebs writes that 6 additional “shortcut” edges can be added to thisnetwork through several known documented meetings, and when these ex-tra edges are added to the network, the diameter of the network changedfrom 9 to 6. The original mean path length of 4.75 reduced to 2.79, withinsignificant changes to the overall edge density (from 16% to 19%) and clus-tering coefficient (from 0.41 to 0.42). Thus information flow in the networktemporarily gained a significant boost while remaining relatively covert.The selection of these shortcut edges is an interesting problem to lookat, namely, which edges can be added to a network in order to reduce thegraph diameter the most? A similar, but different, problem would ask howmany shortcut edges would have to be added in order to bring the networkdiameter down to a certain value. The latter problem is exactly Problem 3in Section 1.2.3. We show here that this computational problem is fixed-724.3. Graph Diameterparameter hard.4.3.1 Diameter Augmentation is W [2]-hardFor any two vertices x, y in a connected graph, the distance betweenthem dist(x, y) is the number of edges in a shortest path joining them. Agraph G has diameter D if dist(x, y) ≤ D for every pair of vertices x, y. TheDiameter-D Augmentation problem takes as input a graph G = (V,E)and a value k and asks whether there exists a set E2 of new edges so thatthe graph G2 = (V,E ∪ E2) has diameter D. This problem was first shownNP-hard for D = 3 [128] and was later shown to remain hard for the D = 2case [95]. Note that the case of D = 1 is trivially polynomial-time solvableas adding an edge between every pair of nonadjacent vertices is necessary.The proof in [95] reduced a restricted (but still NP-hard [58]) 3-Satproblem to a relaxed dominating set problem (which they called Semi-Dominating Set) which was then reduced to Diameter-2 Augmenta-tion. We provide a reduction to Diameter-2 Augmentation directlyfrom Dominating Set, which not only provides an alternate proof of NP-hardness but also establishes that Diameter-2 Augmentation is W [2]-hard.The Dominating Set problem is:Problem 7. Dominating SetInput: A graph G = (V,E) and a positive integer k.Task: To determine if there exists a set S ⊆ V of size at most k such thatfor every v ∈ V \ S there is some s ∈ S where {s, v} is an edge.Reduction from Dominating SetWe reduce Dominating Set to Diameter-2 Augmentation via a pa-rameterized reduction. That is, we give a mapping that sends a yes-instance(G, k) of Dominating Set to a yes-instance (G2, k2) of Diameter-2 Aug-mentation where k2 depends on k alone. Our mapping corresponds tok2 = k.Problem 8. Diameter-2 AugmentationInput: A graph G = (V,E) and a positive integer k.Task: To determine if there exists a set of at most k edges that can beadded to G so that the resulting graph has diameter 2.Let (G, k) be an instance of Dominating Set, where G = (V,E),|V | = n. Let V = {v1, v2, . . . , vn}. We construct a graph G2 with two734.3. Graph Diameterisomorphic copies of G called G = (V,E) and G′ = (V ′, E′), and letV ′ = {vn+1, vn+2, . . . , v2n}, such that for every pair i and j:i, j ≤ n, {vi, vj} ∈ E ⇐⇒ {vn+i, vn+j} ∈ E′.That is, the mapping φ(vi) = vn+i is an isomorphism from G to G′. Fori = 1, . . . n, we will refer to the pair vi and vn+i as twins and denote thisrelationship by vTi = vn+i and vTn+i = vi.The open neighbourhood of a vertex v is the set of vertices adjacent tov, denoted N(v). The closed neighbourhood of v is N [v] = {v} ∪ N(v). Inour constructed graph G2, on each vertex v in V , add the edge {v, φ(u)} forevery u in N [v]. Also add to G2 the vertex set Y where there is a vertexyij ∈ Y for every pair of indices 1 ≤ i, j ≤ 2n, and join yij to each of vi andvj . Add to G2 an edge between each pair of vertices in Y (so Y induces aclique with 2n vertices and(2n2)edges.)Finally, we create in G2 a vertex z adjacent to every vertex of Y andadjacent to no vertex in G1 ∪G2, and create a vertex x adjacent to z alone.(See Fig. 1.)In summary, given a graph G = (V,E) with V = {v1, . . . , vn}, we con-struct G2 = (V2, E2) such that:− V2 = {x, z} ∪ Y ∪ V ∪ V ′, where− Y = {yij : 1 ≤ i, j ≤ 2n}− V ′ = {vn+1, . . . , v2n}, and− E2 = E ∪ E′ ∪ {viφ(u) : 1 ≤ i ≤ n, u ∈ N [v]} ∪ {yy′ : ∀y, y′ ∈ Y } ∪{viyij , vjyij : 1 ≤ i, j ≤ 2n, i 6= j} ∪ {zy : ∀y ∈ Y } ∪ {xz}.When G has n vertices, G2 has 2n +(2n2)+ 2 ∈ O(n2) vertices, so thisreduction is polynomially-sized.Note that G2 has diameter at most 3 and also that every pair of verticesin G2 of distance 3 must be x with some vi ∈ V ∪ V ′. It is easy to seethat if a dominating set D of G contained k vertices, then the set of edges{{x, d}, d ∈ D} forms a diameter-2 augmenting set (also of size k) for G2.We must prove the converse.Theorem 4.5. G has a dominating set of size k if and only if G2 = (V2, E2)has an augmenting set of edges S such that H = (V2, E2 ∪ S) has diameter2.744.3. Graph DiameterG G'Y 12y vv vvx zvvvv34567816Figure 4.4: A small example of G2 constructed from G = P4. Only some ofset Y is shown.Before proving this theorem, we will first show that any augmentingedge set A for G2 with |A| = k that reduces the diameter of G2 to 2 canbe replaced with another augmenting set A′, also of size k, such that everyedge in A′ has the form {x, vi} for some vi ∈ V .If an augmenting set of G2 only contains edges from x to vertices in Vwe will call it proper. We can extract a dominating set of V (and thus ofG) from a proper diameter-2 augmenting set A′ of G2 simply by taking allthe vertices of V that are adjacent to x in A′.Say that A is a solution set of edges for Diameter-2 Augmentationon input G2. We will construct a proper augmenting set of at most the samesize as S. After adding the edges in A to G2, for any vertex v ∈ V ∪ V ′,there must be a 2-path (or less) joining x to v. If such a 2-path ever passesthrough the vertex z, we can remove edge {z, v} from A and add {x, v} toA instead. Note that such an edge-swap can never increase the diameter ofthe graph. We will provide a sequence of edge-swapping rules to the set Auntil we arrive at a proper augmenting set A′.Rule 1. If A has an edge {z, v} for any v ∈ G2 then remove {z, v} and add{x, v}.Rule 2. If S has any edge {x, vn+i} where vn+i ∈ V ′, remove {x, vn+i} andadd the edge {x, vi}.Rule 3. If A has an edge {x, yij} with vi adjacent to vj then remove {x, yij}and add the edge {x, vi}.To describe the rest of the rules, we partition V ∪ V ′ into the following754.3. Graph Diameterdynamic sets. Note that the vertex yij should be understood to be equivalentto the node yji.i) Vx = {v : v ∈ V ∪ V ′, xv ∈ A}ii) V − = {vi : vi ∈ V ∪ V ′ ∧ vi /∈ Vx ∧ xyij ∈ A}iii) V + = {v : v ∈ V ∪ V ′, v /∈ Vx ∪ V −}Clearly, these three sets are disjoint from each other and their union isexactly V ∪V ′. To arrive at a proper augmenting set, the edges of A joiningvertex x to the set Y will have to be swapped out. The sets Vx, V −, V + areupdated after every swap rule is applied. It should be easy to verify that allof the swapping rules will not increase the diameter of the resulting graphH = (V2, E2 ∪A).After applying Rules 1-3, all edges in A are of the form xy for somey ∈ Y or xv for some v ∈ V . We will use additional rules to remove theedges of the form xy, y ∈ Y .After applying any of the Rules 4 to 7 (given below), it should be un-derstood that Rule 3, and then Rule 2, may have to be re-applied in orderto maintain the invariant that all our edges in A join x to something inY ∪ V . Rule 1 will not have to be reapplied after applying it initially. Eachrule reduces the number of edges from x to Y , so this process must indeedterminate.Rule 4. If A has an edge {x, yij} with vi ∈ Vx then remove {x, yij} and addthe edge {x, vj}.Rule 5. If A has edge {x, yij} and vi is adjacent to some vertex in Vx thenremove {x, yij} and add the edge {x, vj}.Rule 6. If A has two edges {x, yab} and {x, ycd} such that va is adjacent tovc then remove {x, yab} and {x, ycd} and add {x, va} and {x, ybd}.Proposition 4.6. If no swap rule can be applied, the set V − is empty.Proof. If there are vi,vj in V − such that vivj ∈ E(G2) then Rule 6 could beapplied, so we have that V − is a stable set. If any edge vivj exists in G2 forvi ∈ V − and vj ∈ Vx, this would imply Rule 5 can be applied. Thus thereare no edges in G2 joining a vertex in V − to a vertex in Vx.Now consider any vertex v in V −: it must have an adjacent twin vertex(either φ(v) or φ−1(v) depending on whether v is in V or V ′, respectively)and call it vT .If vT is in Vx, then Rule 4 has not been exhausted.If vT is in V − as then either Rule 3 or Rule 7 can be applied, dependingon whether the yij vertex in NH(x) ∩NH(v) is or is not yi,n+i.764.3. Graph DiameterSo vT must be in V +. Every vertex in V + must have a 2-path to x inH, but the vertices in V + are not adjacent to any vertex in NH(x) ∩ Y , soevery vertex in V + must be adjacent to a neighbour of x in Vx. Now if vTis adjacent to some u ∈ Vx then so is v, which violates the fact that Rule 3has been exhausted.Hence no such v can exist, so V − is empty once these rules can no longerbe applied. Once the swap rules can no longer be applied, Proposition 4.6 tells usthat all edges in the augmenting set A must be from x to Vx ⊆ V meaningwe have arrived at a proper augmenting set.Now we complete the proof of Theorem 4.5:Proof. Given any augmenting set for the constructed graph G2, we applythe swap rules exhaustively until our augmenting set is proper. We canextract a dominating set of size at most k in V1. In the above notation, thisis exactly the set Vx when there are no more edge-swap rules that can beapplied. This provides a solution to the dominating set problem on G1, andso Diameter-2 Augmentation is W [2]-hard. 4.3.2 GeneralizationThe following theorem appears in [57], along with an alternate proof ofTheorem 4.5.Problem 9. Diameter-t Augmentation(G, k)Input: A graph G = (V,E) and a positive integer k.Task: To determine if there exists a set of at most k edges that can beadded to G so that the resulting graph has diameter at most t.Theorem 4.7. [57] Diameter-t Augmentation(G, k) is W [2]-hard forevery integer t ≥ 2.4.3.3 Additional ObservationsConsider the following problem, which asks if the diameter of a graphcan be improved (i.e. lowered) by adding at most k edges:Problem 10. Diameter ImprovementInput: A graph G = (V,E) and a positive integer k.Task: To determine if there exists a set of at most k edges that can beadded to G so that the resulting graph has a smaller diameter than G.774.3. Graph DiameterThe graph G2 resulting from the reduction from Dominating Set toDiameter-2 Augmentation has diameter 3. Finding an augmenting edgeset that improves this graph to diameter 2 will in fact solve the dominatingset problem on the original (pre-reduction) graph. This provides a proofthat Diameter Improvement is itself W [2]-hard (and NP-complete) evenwhen restricted to input graphs of diameter 3.Theorem 4.8. The parameterized problem DiameterImprovement(G, k) is W [2]-hard.Another observation about the structure of the constructed graph inthe reduction is that it will always contain a dominating clique. Clique-dominated graphs are general enough to include all split graphs. Bertossi [8]showed that Dominating Set is NP-hard for split graphs. Our reductionimplies W [2]-hardness for diameter augmentation for all clique-dominatedinput graphs.Theorem 4.9. The parameterized problem DiameterAugmentation(G, k) is W [2]-hard even when G has a dominating clique.Recently, independently from our results, the W [2]-hardness result ofDiameter Augmentation and of Diameter Improvement were provenwith a reduction from Set Cover [52].The next section gives a polynomial time algorithm to optimally solveDiameter Augmentation for a well-studied class of clique-dominatedgraphs.4.3.4 Diameter Augmentation for P4-sparse GraphsA P4-sparse graph is a graph G for which every set of 5 vertices of Gcontains at most one P4 as an induced subgraph. See Section 1.3.4 for moreinformation on these, and for their characterization in terms of thin andthick spider graphs (Lemma 1.2).If the complement G of a graph G is disconnected, then G is necessarilyof diameter at most 2.When a spider has |K| = |S| = 2, it is both a thin and thick spider.Consider a thick spider with |K| = |S| > 2: any two vertices in S have acommon neighbour in K, and so it is easy to verify that the diameter ofsuch thick spiders is 2. Every thin spider has diameter 3.Solving the diameter augmentation problem for P4-sparse graphs is thusreduced to the task of finding an augmenting edge set for thin spiders.784.4. Network ModelsTheorem 4.10. Let G = (V,E) be a thin spider with |K| = |S| = n. Thenthe minimum augmenting edge set A such that G2 = (V,E∪A) has diameter2 is of size n− 1.Proof. We construct the edge set of size n − 1 by choosing k1 ∈ K andcreating an edge from k1 to each si for i = 2 . . . n. Adding these edges willmake k1 adjacent to every vertex in G, and so the augmented graph hasdiameter 2.To see that there can not be an augmenting set of smaller size, we firstshow that any augmenting set solution on the thin spider can be replacedwith another augmenting set solution that contains only edges from S toK. If any augmenting set A contains an edge {si, sj}, fix i and consider allthe edges in A of the form {si, st} for any t. We can replace each {si, st}with {ki, st} without increasing the diameter. This process can be continuedwhile there is an edge in A joining two vertices in S.Next, if the augmenting set joins vertices si and sj by a 2-path througha vertex r ∈ R, then the edges {si, r} and {sj , r} can be replaced withthe two edges {si, k} and {sj , k} for any k ∈ K. This establishes that anyaugmenting set A that turns a thin spider into a graph of diameter 2 canbe replaced by another augmenting set A′ of the same size of A where everyedge of A′ joins a vertex from S to a vertex in K.Now consider any augmenting set A′ of size n−2 where every edge of A′has one endpoint in S and the other endpoint in K. Since there are n − 2edges, there must be two vertices sp and sq in S which are not incident withany edge in A′. Their distance must be 3, and so n−2 edges are insufficientto improve the diameter. Given a P4-sparse graph, one can identify the sets S, K, R in linear timesimply from the degree sequence [80]. This augmenting set of size n− 1 cantherefore easily be constructed in O(m+ n) time.4.4 Network ModelsWhen testing new ideas on networks it is often desirable to generate alarge number of network samples. As mentioned earlier in this chapter, anexpected topological property of social networks is the existence of hubs, andso randomly-generated networks should include a heavy-tailed distributionof its vertex degrees.We note that for applications in which network objects have a location insome space (such as positions on the earth), random geometric models [14]794.4. Network Modelshave been effective in modeling systems such as wireless networks [83] ordisease spreading [9].4.4.1 The Erdo˝s-Re´nyi ModelA simple way to randomly generate a network on n nodes is to specifya probability p and for every two nodes in this network, add an edge be-tween them with probability p. Each edge is added independently, so theprobability that a particular node v has degree k isP (deg(v) = k) =(n− 1k)pk(1− p)n−1−k.When np is a fixed constant c, this probability approaches ckeck! as n → ∞,so in this model, the chance that a node has large degree k is quite small.The clustering coefficient of a vertex in an Erdo˝s-Re´nyi random graphis easily calculated: recall Definition 4.1 which states that the (local) clus-tering coefficient of a vertex v is the number of edges in N(v) divided byto total number of possible vertex pairs in N(v). Since in this model, everyedge appears independently, the proportion of edges in N(v) that exist hasexpected value of exactly p. Typically, for large graph generation, p is chosento be O(1/n), so the amount of local clustering in an Erdo˝s-Re´nyi randomgraph is considered too low to model a realistic small-world network.4.4.2 The Watts-Strogatz ModelIn 1998, Watts and Strogatz proposed a model of random graphs tocapture the property of being small-world, as well as to exhibit the kind oflocal clustering observed in real networks. The construction is parameterizedby the number of nodesN and the average degree d such that ln(N) < d < Nand a rewiring parameter β ∈ [0, 1]. For simplicity, d is assumed to be even.The model begins with n vertices placed in a cyclic order. Each vertexv is made adjacent to the next d2 vertices in the cyclic order on each side ofv. Then, for each vertex u, for each edge uxi incident with u, rewire it withprobability β. To rewire uxi, choose some vertex xj uniformly at randomsuch that xj 6= u and uxj is not an edge. As β approaches 1, the resultingnetwork looks more like an Erdo˝s-Re´nyi random graph.As the initial structure (before rewiring) of the graph model createsmany triangles, the clustering coefficient is expected to be high if β is nottoo close to 1. It has been empirically shown that the small-world behaviour804.4. Network Modelsof Watts-Strogatz networks reveals itself for very small values of β (i.e.β = o(1)), [76] [114].While this model exhibits the small-world diameter and local clusteringexpected from real networks, it falls short in capturing the existence of hubs,that is, the heavy-tailed distribution in vertex degrees.4.4.3 The Baraba´si-Albert Preferential Attachment ModelBaraba´si and Albert found, in 1999, that the degree distribution of theWorld Wide Web (WWW) follows a power law distribution [6], as does anetwork of movie actors (with two actors being related by being cast in thesame movie). They also show a power-law behaviour of vertex degrees forthe network built on urban centres where two centres are joined by an edgeif there are high-voltage power lines connecting them, and also the networkof publications where one publication is (one-way) related to another by wayof citation.Baraba´si and Albert identify that two of the important properties thesenetworks have are growth and preferential attachment. That is, if one wereto model such networks with a randomly generated graph, the model shouldallow for a mechanism to add new vertices and for the new edges to begoverned by some rule of preference. For example, an Erdo˝s-Re´nyi randomgraph in which the number n vertices is specified and the probability p thanan edge exists is not sufficient to capture the structural properties of realnetworks.This model is sometimes referred to as the preferential attachment modelor the BA-model. Given a network and integer k, we add a new vertex v tothe network by attaching v to k existing vertices, randomly selected in such away that the probability of v being adjacent to an existing u is proportionalto the degree of u.Baraba´si and Albert showed that this preferential attachment modelexhibits a small diameter and a power-law distribution in its vertex de-grees. Empirically, the clustering coefficient has been shown to approximatea power law of the network size, which is uncharacteristic of real networks,but this still provides a level of clustering that is above that of Erdo˝s-Re´nyigraphs.4.4.4 The Random k-tree ModelWhile the BA-model has small diameter, a heavy-tailed degree distribu-tion and some clustering, we show in the following that it does not exhibit814.4. Network Modelsquite enough clustering when measured in terms of the number of cliques,from triangles to larger ones.One way to look past vertex degree as a measure of structure is to lookat the degree of a pair of vertices. We define a second-order structure calleda d-triangle which counts the degree of a pair of vertices which are joinedby an edge.Definition 4.11. The embeddedness of an edge e = uv in a graph G(V,E),denoted by degG(e), is defined to be the number of common neighbors of uand v. For an edge e = uv with degG(e) = d, the subgraph consisting of thevertices u, v, and their d common neighbors is called a d-triangle.We also say the edge embeddedness of e is d.We describe one last model here called the k-tree model and compare itto the BA-model in a number of ways.Starting with an initial k-clique Gk(k), a sequence of graphs {Gk(n),n > k} is constructed by adding vertices to the graph, one at a time. Toconstruct Gk(n+1) from Gk(n), add a new vertex vn+1 and connect it to thek vertices of a k-clique selected uniformly at random from all the k-cliquesin Gk(n). A graph obtained in this manner is called a random k-tree.One of the purposes of this section is to illustrate how the degree se-quence of a graph (a “first-order” property) is insufficient to capture manyof the internal structures of real social networks and to suggest the mixedk-tree and deleted k-tree models as a way to randomly generate the desirabledistributions of first- and higher-order structures.4.4.5 Cliques and Higher-Order StructuresIn our paper [133] the edge-embeddedness and degree distributions werecompared for two social networks. One network was created from a sam-pling of Facebook communication data and a second network was a randomsampling of the Orkut friendship network.For the generated samples, it was found that the degree distribution ofFacebook did not have a clear power-law shape while the Orkut networkdid exhibit such a trend. The edge embeddedness distribution was foundto behave similarly to the degree distribution on the respective networks.That is, the edge embeddedness distribution of Facebook did not indicate apower law while the Orkut network did.This suggests that when designing a random generation process for socialnetworks, it is insufficient to generate networks with a power-law distributionin the degree sequence when the edge embeddedness distribution is ignored.824.4. Network ModelsIndeed, the degree distribution tells very little about graphs in general:the class of graphs which are uniquely defined by degree sequence are the un-igraphs and these contain threshold ((P4, C4, 2K2)-free) graphs. The generalgraph property of whether a graph is split (partitions into a clique and sta-ble set) is also determinable by degree sequence, but this is a very stringentstructure. We argue below that going beyond degree distribution and evenbeyond edge embeddedness and looking into the distribution of higher-orderstructures (communities) is necessary to capture the topology of real-worldnetworks.Since there is no standard definition for a social community, we will followthe definition of Palla et al. [118], as their paper reports on the distributionof community size in real-world networks.Definition 4.12. A k-clique community in a network is defined as a unionof all k-cliques that can be reached from each other through a series ofadjacent k-cliques, where adjacency means sharing k − 1 nodes.One of the intriguing findings in the study of Palla et al. [118] is that thesize distribution of the k-clique communities follows a power law in severalreal-world networks such as the co-authorship networks, word-associationnetworks, and the protein interaction networks. In Figure 4.5, we reproduceplots on the size distribution of k-clique communities obtained [118].1 10 100 1000 10000110100100010000Co-Authorship NetworkCommunity SizeFrequency1 10 100 10001101001000Word Association NetworkCommunity SizeFrequencyFigure 4.5: Example Networks from Palla et al. with power-law communitysizes.In this section, we show that a simple variant of the random k-tree modelis able to capture the characteristic of the community structure much betterthan other existing models such as the BA model.834.4. Network Models1 10 100 1000 10000110100100010000Co-Authrs-ipph uNuresu swskomnrrCySkC1s izreFsnq10S0001 10 100 1000 10000 100000110100ko-Authrs-ipph uNuresu swskomnrrCySkC1s izreFsnq10S000Figure 4.6: K-Clique Communities in a Random Partial 4-Tree.Partial random k-tree model: A partial k-tree is a subgraph of ak-tree. A random partial k-tree Gk(n, r) is a graph obtained by removinguniformly at random r edges in a random k-tree Gk(n).When a graph is sufficiently dense, it is expected that all the nodeswould exist in one giant K-clique community for relatively small values ofK (such as K = 4). For relatively large values of K such as K equaling themaximum clique size of a network for instance, the K-clique communitiesare simply the max cliques. These two extremes create a trivial communitystructure which reveals nothing about the internal clustering of a network.We are interested in values of K for which there are many distinct K-cliquecommunities, and not just interested in when many vertices are in a K-clique community. It is the distribution of the sizes (in terms of the numberof vertices) of the resulting communities that we are interested here, but inorder to have a meaningful distribution, there must also be a large numberof communities.The study of Palla et al. [118] found meaningful k-clique communities inreal-world networks for k = 4.Using the clique percolation method of Palla et al. [118] to find the K-clique community sizes, we analyze the K-clique community sizes of Gk(n, r)for K ≤ k + 1 with various values for r. When comparing k-trees to BA-844.4. Network Models1 10 100 1000 100001101001000Co-Authrs-ipph uNeswukrsu smsnrArNrysSozFrrsqFcC00Wwukrsid s-ipph uNeaFrthr ?e1 10 100 1000 100001101001000Co-Authrs-ipph uNeswukrsu smsnrArNrysSozFrrsqFc?000Wwukrsids-ipph uNeaFrthr ?eFigure 4.7: Power Law Community Size in Partial k-trees.model graphs of similar density, the clique structure in the BA-model is fartoo sparse to compare the community size distributions. We show how easilypartial k-trees can reveal a power law distribution of 4-clique communitieswhile showing how difficult it is to produce any 4-clique distribution in theBA-model.Aside from directly comparing the partial k-tree model to the BA-model,we look at each model individually as well. For the partial k-tree, we lookfor the existence of K-clique communities (for K ≤ k) as r increases. Denoteby m the degree of a new vertex added to a network in the BA-model. Weset m large in the BA-model to see how far m must be taken in order tofind a meaningful K-clique community distribution.Power-law distributions reveal themselves as straight lines when data isplotted on log-log axes. All the plots in this section will be on log-log scales.The power law distribution of community size in random partial k-trees isimmediately visible even for small values of r and becomes more pronouncedas more communities are present. For a k-tree of 215.5 = 46341 nodes,removing only several hundred edges reveals a power-law distribution incommunity size and this remains even for r values past 15,000. We show aclear power-law distribution of community size in a 3-tree on 215.5 nodes infigure 4.6. The 4-clique communities become apparent with r values near500, and the figure shows that with a large number of deletions, r = 10000,3-clique communities can simultaneously be found in the same network.Figure 4.7 shows the distribution of 5-clique community size in a partial854.4. Network Models4-tree on 20000 vertices with varying values of r. The plots show r = 500and r = 2000 deletions from the same initial k-tree. While both reveal apower law, we also observe a giant community in the r = 500 case (theisolated point in the bottom right of the r = 500 plot.)1 10 100 1000 10000110100100010000Co-Authrs-ipph uNure-ipph uNwskumrnyrthr Sw1 10 100 1000110100zo-Authrs-ipph uNre-ipph uNwskumrnyrthr Sw1 10 100110Fo-Authrs-ipph uNure-ipph uNwskumrnyrthr SwFigure 4.8: K-Clique Communities in BA-model Graphs.In comparison to the above k-tree models generated with k = 3 or k =4, an equally-dense network generated with the BA-model cannot producesimilar clustering. Indeed, when a vertex is generated in the BA-model withm = 4 is very unlikely that the 4 vertices it attaches to will induce a clique.In order to see any 4-clique community existence, we had to increase m to9, and even there the structure was very sparse. We produce plots withm = 10 in figure 4.8 for networks on 12000 nodes. The BA-model withm = 10 could not reveal any significant K-clique community structure forK > 4. In order to observe a significant number of 4-clique communities ina BA model network, we had to adjust the m parameter to 13 or higher,but this produces networks of unrelated degree (first-order) measures whencompared to k-trees with k = 4.As mentioned above, the BA-model network for m=10 revealed littlecommunity structure even for 5-cliques. We generated a BA model networkwith m = 20 and since this creates many maximal cliques, the computationinvolved for community finding became computationally infeasible for largenumbers of nodes. Using a m = 20 network on 2000 nodes revealed only asingle 10-clique community. The only K-clique community structure foundfor K ≥ 5 in this network is shown in figure 4.9.Following the observations of Palla et al. [118] that real-world networkshave a power law distribution of K-clique community size, we give much864.5. A Graph Classes Perspective on Graph Generation1 10 100 1000110100Co-Authrs-ipph uNurepwk0msk000s inre1 10 100 1000110yo-Authrs-ipph uNurepwk0msk000s inre1 10 100110So-Authrs-ipph uNurepwk0msk000s inreFigure 4.9: K-clique communities in a BA-odel graph with m = 20.evidence here that the partial k-tree model can capture the desired distri-bution of higher-order structure, and that the strength of the distributionis easily controlled with adjusting the parameter r. We also found thatthe BA-model does not enjoy the luxury of easily adjusting its structure toproperly model a desired community structure.4.5 A Graph Classes Perspective on GraphGenerationThe process of growing a random graph has shown much importance inmodeling dynamic networks as many networks allow for growth mechanisms,like people joining a social network or a new paper being added to a library.The BA-model, using preferential attachment, gives rules for how a newvertex can attach to a existing network.We describe a list of graph operations in common use (in the study ofgraph classes) for the purpose of growing graphs and reveal implications ofcertain models, including the k-tree.i) Add a new vertex adjacent to no existing vertexii) Add a new vertex adjacent to only one existing vertexiii) Add a new vertex adjacent to all existing verticesiv) Add a new vertex adjacent to a clique874.5. A Graph Classes Perspective on Graph Generationv) Add a new vertex that is adjacent only to all the neighbours of anexisting vertexvi) Add a new vertex that is adjacent only to an existing vertex and allof its neighboursStarting with a single vertex and repeatedly applying rule (ii) will alwaysgenerate a tree, and every tree can be generated in that way as well. So rule(ii) completely characterizes a generative model for trees. If we allow anycombination of rules (i) and (ii), then this completely characterizes forests(a disjoint union of trees.)A graph is a threshold graph if the set of vertices can be assigned a realvalue where, for some fixed threshold value T , two vertices are adjacent inthe graph if and only if the sum of their real values exceeds T . Thresholdgraphs are completely characterized by a generation scheme consisting ofrules (i) and (iii).One of the early graph-generation methods to model the WWW networkwas proposed in [92], known as a copy model. This model is a randomprocess that repeatedly uses rules (v) and (vi). This rigid copy model wasused in a study of biological networks [24], while a more common use of themodel is to randomly select a subset of the neighbourhood of an existingvertex chosen uniformly at random. Various forms of the copy model havebeen shown to have power-law degree distributions [24] [92].A graph is chordal if every cycle of size 4 or more has a chord. It isknown that chordal graphs are completely characterized by the generativescheme defined by rule (iv). We observe that k-trees and mixed k-treesfall under this category: a new vertex added is adjacent only to a clique.When the parameter k (or the bounds of k as in the mixed case) are fixed,these random models do not generate all chordal graphs and so existingresults on randomly generated chordal graphs do not necessarily apply. Forinstance, the paper of Bender et al. [7] shows that under a similar chordalgraph generation scheme almost all chordal graphs are split. The authorsdo this by showing that, as the number of vertices n grows unboundedly, theremoval of a largest clique in a random chordal graph almost surely leavesbehind isolated vertices (no edges among them.) However, in the k-tree andmixed k-tree model, the maximum clique is easily seen to be bounded byk + 1 and the removal of this clique removes(k+12)edges, while an n-vertexk-tree has a quantity of edges on the order of nk. As n grows unboundedly,nk −(k+12)is much larger than 0 and so we contrast the result of [7] byobserving that while almost all chordal graphs are split, almost no k-tree issplit.884.5. A Graph Classes Perspective on Graph GenerationAs a future consideration, it would be interesting to investigate the ran-dom models defined by using various subsets of the vertex-addition rulesgiven above. Rather than just adding a vertex to a graph, many graphclasses are characterized by combining two smaller graphs (such as cographs,which are built by disjoint unions and complete joins). Perhaps the notionof adding a collection of nodes at one time can be used in future graphmodels, perhaps as a way of enforcing certain modularity expectations.89Chapter 5Bounded Search TreeMethodsA lot of research has been devoted to finding fixed-parameter tractablealgorithms for graph modification problems: Guo [70] studied edge dele-tion to split graphs, chain graphs, threshold graphs and co-trivially perfectgraphs; Kaplan et al. [81] studied edge-addition problems to chordal graphs,strongly chordal graphs and proper interval graphs.Cographs (Section 1.3.3) are an important class of graphs whose studyhas lead to a general theory of graph decomposition and modularity.As cographs are exactly the P4-free graphs, they also provide a general-ization to P3-free graphs (a.k.a. cluster graphs, see Section 1.3.1) which arefundamental to network cluster partitioning methods.While cographs can be recognized in linear time [31], it is also knownthat it is NP-complete to decide whether a graph is a cograph with k extraedges [43].Cai [19] showed fixed-parameter tractability for the edge deletion, edgeaddition, and edge editing problem to any class of graphs defined by a finiteset of forbidden induced subgraphs. The constructive proof implies thatk-edge-deletion problems to a class of graphs defined by a finite number offorbidden subgraphs is O(Mkp(m + n)) where p is some polynomial andM is the maximum over the number of edges in each of the forbidden in-duced subgraphs defining the graph class in question. For k-edge-deletionsto P4-free graphs in particular, Cai’s result implies an algorithm running inO(3k(m + n)) time. This algorithm would work by finding a P4: abcd in agraph and branching on the 3 possible ways of removing an edge in order todestroy the P4 (that is, removing either the edge {a, b}, {b, c} or {c, d}).Nikolopoulos and Palios studied the edge-deletion to cograph problemfor a graph G − xy where G is a cograph and xy is some edge of G [117].Lokshtanov et al. study cograph edge-deletion sets to determine whetherthey are minimal, but not a minimum edge-deletion set [98]. To the best ofour knowledge, ours is the first study that specifically addresses the edge-deletion problem to cographs. We present a bounded search tree algorithm90Chapter 5. Bounded Search Tree Methodsthat solves k-edge-deletion to cographs in O(2.562k(m+n)) time by perform-ing a search until we arrive at a P4-sparse graph and then optimally solvingthe remainder of the search space by exploiting the structure of P4-sparsegraphs (see Section 1.3.4).This chapter presents the first non-trivial algorithm for the cograph edge-deletion problem (running in O(2.562k)) and trivially-perfect edge-deletionproblem (running in O(2.450k) time.) We will give simple algorithms tofind minimum vertex-deletion sets to cographs and trivially perfect graphswhose runtime of O(3.303k) match the existing best methods. We will alsoillustrate how a careful branching strategy and refined analysis technique canimprove the runtime to O(3.115k) for the cograph vertex deletion problem.The problems studied in this chapter can be unified in the following way:Given a graph G, we want to delete vertices or delete/add/edit edges in Guntil the edited graph is in a class C. Consider some larger superclass Lof C. Modifying a graph to the less-restricted class of L has two benefits:the branching rules on the forbidden subgraphs of L in order to destroythe forbidden induced subgraphs of C often yields an improved (smaller)branching factor. Secondly, it may be the case that solving the C-editingproblem on a graph of type L is polytime solvable.The first step (phase 1) should be performed by making modificationsrequired to transform G into class C, but in such a way that will bring themodified graph into L first, creating a relaxed stopping condition. If L issomewhat close to C, it is conceivable that modifying an L-type graph toa C-type graph (phase 2) could be optimally solved in polynomial time, orsolved within the same time bound as that required for phase 1.In this Chapter, we use P4-sparse graphs for the superclass L to solvethe edge deletion and vertex deletion problems for cographs as C and quasi-threshold graphs as C.These results rely on the structure of P4-sparse graphs, as quasi-thresholdand cographs are proper subclasses of P4-sparse graphs, while the structureof P4-sparse graphs is simple enough to exploit.The definition of a spider appears in Section 1.3.4.Lemma 5.1. Let G be a spider with body K and feet S. Then every edge{k1, k2} with k1, k2 ∈ K is in exactly one P4 in K ∪ S.Proof.A P4 cannot contain 3 vertices of K. If G is a thin spider, let each ki beadjacent to each si. The edge {k1, k2} is only in the P4 {s1, k1, k2, s2}. If Gis a thick spider, let each ki be adjacent to every foot sj where i 6= j. Theedge {k1, k2} is only in the P4 {s1, k2, k1, s2}. 915.1. Edge-Deletion AlgorithmsAlgorithm 5: A meta-algorithm paradigm for graph modificationproblems.Algorithm MetaAlgorithm(G):Input: A Graph G = (V,E) and target class COutput: A minimally-modifed graph H from G such that H ∈ CLet L be an appropriately-chosen superclass of C;G′ ← G;(Phase 1)while G′ is not in L doFind and edit-out a forbidden substructure which defines class C,ideally one that is contained in a forbidden substructure of L;end(Phase 2)Polynomially and optimally solve the rest of the modificationproblem of G′ of class L to C;5.1 Edge-Deletion AlgorithmsIn this section, we give algorithms for two edge-deletion problems.Problem 11. Cograph Deletion (G, k):Given graph G = (V,E), does there exist a set S of at most k edges suchthat (V,E \ S) is a cograph?Problem 12. Trivially Perfect Deletion (G, k):Given graph G = (V,E), does there exist a set S of at most k edges suchthat (V,E \ S) is a trivially perfect graph?The idea of the algorithms in this section is to focus on the forbidden sub-graphs of P4-sparse graphs so that efficient branching rules can be designedsystematically. The usefulness of this idea depends critically on whetherthese problems can be solved polynomially on P4-sparse graphs. We firstshow how to solve the cograph deletion problem on P4-sparse graphs inlinear time.925.1. Edge-Deletion Algorithms5.1.1 Computing Cograph Edge-Deletion Sets on P4-sparseGraphs in Linear TimeWe show that a linear time divide-and-conquer algorithm can be designedto find the minimum cograph deletion set for P4-sparse graphs.Definition 5.2. Let G be a graph and G be the complement of G. Thesubgraphs induced by the vertex sets corresponding to the connected com-ponents of G are called the co-components of G. If G is connected, then wesay that G is co-connected.For a vertex set Vi, write G[Vi] for the induced subgraph from G onvertices Vi.Proposition 5.3. Let G = (V,E) be a P4-sparse graph and M(V ) be thesize of a minimum edge-deletion set required to turn G[V ] into a P4-freegraph.i) If G is disconnected with components V1, . . . , Vt, thenM(V ) =t∑i=1M(Vi)ii) If G is disconnected with co-components V1, . . . , Vt, thenM(V ) =t∑i=1M(Vi)iii) If G is a spider with head R, body K and feet S, thenM(R ∪K ∪ S) = M(R) +M(K ∪ S).Proof.(i) This follows from the fact that a P4 is connected and so any P4 is in onlyone connected component, even after some edge deletions.(ii) It is easy to verify that an edge joining two vertices in separate co-components can not be in a P4 (or else in the graph complement this wouldimply a P4 contains vertices in separate connected components as a P4 is self-complementary.) After any edge-deletions within a co-component are made,the vertex sets of separate co-components are still completely joined, and soany new P4s will not include any two vertices in separate co-components.935.1. Edge-Deletion Algorithms(iii) Call a leg edge any edge joining a vertex s ∈ S with a vertex k ∈ K,a head edge any edge joining some r1 ∈ R with some r2 ∈ R, a body edgeany edge joining two vertices in K, and call a neck edge any edge joiningsome r ∈ R with some k ∈ K.The structural definition of a spider implies that every vertex in K isadjacent to every vertex in K ∪ R, even after the removal of any leg edgesand head edges. Thus a P4 can never contain an edge {r, k} with r ∈ R andk ∈ K even after leg and head edge removals. We will show that there is anoptimal solution without body edges.Consider an edge-deletion set E′ such that G − E′ is a cograph, andlet E′′ ⊂ E′ be the set of body edges and neck edges in E′. Consider theP4s in G − E′ + E′′ (the P4s created when adding E′′ back to G − E′.) InG − E′ + E′′, K and R are completely joined and K is a clique and so noP4 uses a neck edge. So any P4s in G − E′ + E′′ are strictly in K ∪ S orstrictly in R. Since E′ is a cograph deletion set, the induced graph on R inG−E′+E′′ is P4-free. In K ∪S, the body edges added back may be in a P4with two leg edges, and if so, this P4 will be unique by Lemma 5.1. Addingthe body edges from E′′ cannot create a P4 involving a body edge not inE′′, so we just concentrate on the unique P4 that each of these added bodyedges may have created. By deleting one of these leg edges for each bodyedge that creates a P4, we create a new deletion set E′ − E′′ + E′′′ whereE′′′ is a set of leg edges and |E′′′| ≤ |E′′|, so this new edge deletion set is asolution no larger than E′ which does not contain body or neck edges. We note that parts (i) and (ii) of Proposition 5.3 apply to any graph G,and not just P4-sparse graphs.Lemma 5.4. Let G be a thin spider with body K = {k1, . . . , k|K|} and legsS = {s1, . . . , s|K|}, and {si, kj} is an edge if and only if i = j. Then aminimum cograph edge-deletion set for K ∪ S is {{si, ki}, i = 1 . . . |K| − 1}.Proof.It is obvious for |K| = 2, so assume that |K| > 2. Since K is a clique and Sis stable, every P4 in K ∪ S has its endpoints in S. Furthermore, every pairof vertices in S are in a unique P4. Deleting any |S|−1 thin legs will clearlydestroy all of the P4s, so this edge-deletion set is indeed a cograph edge-deletion set. To see that it is of minimum size, assume there is a deletionset of size |K| − 2 or less in which two legs are not part of the deletion set.Let these two legs be {s1, k1} and {s2, k2} and call them “permanent” inthis case. Since {s1, k1, k2, s2} is a P4 and the edges {s1, k1} and {s2, k2}are not in the deletion-set, it must be that {k1, k2} is in the deletion set.There at most |K| − 3 other edges in the deletion set. Now {s1, k1, kj , k2}945.1. Edge-Deletion Algorithmsinduces a P4 for every j = 3 . . . |K|. This means that the permanent edge{s1, k1} is still in |K|− 2 P4s and every pair of these P4s have distinct edgesaside from {s1, k1}. Thus it is impossible to destroy all of these remainingP4s with only |K| − 3 additional deletions or less. Lemma 5.5. Let G be a thick spider with body K = {k1, . . . , k|K|} and feetS = {s1, . . . , s|K|}, and {si, kj} is an edge if and only if i 6= j. Then aminimum cograph edge-deletion set for K ∪ S is {{ki, sj}, i < j}.Proof.Every edge in K ∪ S is in exactly one P4: an edge {ki, kj} is only in theP4 {sj , ki, kj , si} and any edge {si, kj} is only in the P4 {si, kj , ki, sj} so thenumber of P4s in K∪S is(|S|2), and since no two of these P4s share an edge, atleast(|S|2)deletions are required. Consider the edge set T = {{ki, sj}, i < j}.When deleting T from K ∪ S, K is still a clique and S is still a stableset, and so if there is any P4 in (K ∪ S) \ T , its endpoints must still bein S. But after deletion of T , we have that the neighbourhood of si isN(si) = {ki+1, . . . , k|K|} which means that N(si) ⊂ N(sj) for all i > j, andso no two vertices in S can be the endpoints of a P4. So T indeed destroysall the P4s in K ∪ S and since |T | =(|S|2), this is a minimum set. Theorem 5.6. Algorithm 6 correctly solves the cograph edge-deletion prob-lem for P4-sparse graphs and can be implemented in O(m+ n) time.Proof.The correctness of Algorithm 6 follows from Lemma 5.4, Lemma 5.5 andProposition 5.3.Algorithm 6 can be implemented in linear time, as the spider structureof P4-sparse graphs can be identified in linear time [79]. Identifying theconnected or co-components can also be done in linear time, as these typesof partitions are special cases of the more general notion of a homogeneousset or module, and there are a number of modular decomposition algorithmsrunning in linear time [105], [33]. Our algorithm to find cograph edge-deletion sets in P4-sparse graphs ispresented in Algorithm 6. This is an example of a realization of Phase 2 ofthe meta-algorithm, Algorithm 5.5.1.2 A Bounded Search Tree Algorithm for CographEdge-DeletionThe bounded search tree algorithm (Algorithm 7) finds 5-vertex subsetsthat induce at least 2 P4s, branches on the possible ways of destroying the955.1. Edge-Deletion AlgorithmsAlgorithm 6: Cograph edge-deletion algorithm for P4-sparse graphs.Algorithm Spider(G):Input: A P4-Sparse Graph G = (V,E)Output: A set T ⊂ E such that (V,E \ T ) is a P4-free graphif G (or G) is disconnected thenLet C1, . . . , Ct be the components or co-components of G;T ←⋃ti=1Spider(Ci);endG is a spider with K = {k1, . . . , k|K|} and S = {s1, . . . , s|K|};if G is a thin spider thenNotation: ki adjacent to sj if and only if i = j;Add edge {ki, si} to solution set T for every i = 1, . . . , |K| − 1;endif G is a thick spider thenNotation: ki adjacent to sj if and only if i 6= j;Add edge {ki, sj} to solution set T for every pair i < j;endReturn T ∪ Spider(R);P4s, and then finally arrives at a P4-sparse graph and calls Algorithm 6.This algorithm either terminates with a call to the subroutine (in the casethat a spider structure is encountered) or detects a cograph structure early,or else its integer parameter k has been reduced to 0 or less in which casethe number of allowed edge-deletions has been exhausted without reachinga cograph. This is an example realization of Phase 1 of the meta-algorithm,Algorithm 5.Refer to Figure 1.2 for the possible subgraphs the general search al-gorithm may encounter. We refer to specific edges as they are labeled inFigure 1.2 for each subgraph. The pseudocode description of the generalsearch algorithm branches on one of the deletion sets given in the tablebelow.Let H be one of the forbidden subgraphs from Figure 1.2. The possible965.1. Edge-Deletion Algorithmsedge-deletion sets to destroy the P4s in H are:H =Subgraph Minimal Edge Deletion SetsC5 {a,c}, {a,d}, {b,d}, {b,e}, {c,e}P5 {a,d}, {b}, {c}P 5 {a,b}, {e,c}, {d,e}, {c,d},{a,d,f}, {a,c,f}, {b,d,f}, {b,e,f}4-pan {a,d}, {a,c}, {b,c}, {b,d}, {e}co-4-pan {b,c}, {d}, {e}fork {a,b}, {c}, {d}kite {a,d}, {a,c,f}, {b,d,f}, {b,c}, {e}It is routine to verify that any edge-deletion set from each of the 7induced subgraph cases must contain one of the deletion set cases given inthe table. Since every P4 in the graph must be destroyed with an edgedeletion, encountering any of these 7 configurations necessitates the need toapply one of the corresponding deletions.The runtime of the algorithm is dominated by the size of the searchtree. The spider structure can be identified in linear time. When k is theparameter measuring the number of edge deletions left to make, the sizeT (k) of the search tree produced by this process is found from each branchrule separately:1. C5: five branches, each reducing the parameter by 2 gives T (k) =5T (k − 2) and so T (k) ≤ 2.237k2. P5: T (k) = 2T (k − 1) + T (k − 2) giving T (k) ≤ 2.415k3. P 5: T (k) = 4T (k − 2) + 4T (k − 3) giving T (k) ≤ 2.383k4. 4-pan: T (k) = T (k − 1) + 4T (k − 2) giving T (k) ≤ 2.562k5. co-4-pan: T (k) = 2T (k − 1) + T (k − 2) giving T (k) ≤ 2.415k6. fork: T (k) = 2T (k − 1) + T (k − 2) giving T (k) ≤ 2.415k7. kite: T (k) = T (k − 1) + 2T (k − 2) + 2T (k − 3) giving T (k) ≤ 2.270kThe size of the search tree is thus upper-bounded by the worst case ofdeleting P4s in a 4-pan: T (k) ≤ 2.562k.Theorem 5.7. Algorithm 7 correctly solves the cograph k-edge-deletion prob-lem in O(2.562k(m+ n)) time.975.1. Edge-Deletion AlgorithmsProof.Jamison and Olariu [79] give a linear time recognition algorithm for P4-sparse graphs. In the case that the graph being tested is not P4-sparse, thealgorithm terminates upon finding a 5-set of vertices isomorphic to one ofthe forbidden subgraphs shown in Figure 1.2. In O(m+n) time on a generalgraph, we can find one of the subgraphs in Figure 1.2 or else assert that ourgraph is P4-sparse.5.1.3 A Bounded Search Tree Algorithm for Edge-Deletionto Trivially Perfect GraphsIn [70], Guo studied the edge-deletion problem for complements of triv-ially perfect graphs. We know of no prior study of the specific problem ofdeleting edges to a trivially perfect graph. A na¨ıve solution would find asubgraph isomorphic to either a P4 or a C4 and then branch on the possibleways of deleting an edge from that subgraph, resulting in a worst-case searchtree of size O(4k). A minor observation that deleting any one edge from aC4 always results in the other forbidden subgraph, P4, allows us to branchon the 6 possible ways of deleting any 2 edges from a C4. This results in aworst-case search tree of size O(3k) due to the 3 edges in a P4.We use our strategy of branching towards a relaxation class of triviallyperfect graphs. The 6 possible ways of deleting 2 edges from C4 yield abranching factor of√6k≤ 2.45k, and since removing two edges from any C4is necessary to arrive at a (P4,C4)-free graph, our algorithm will begin byperforming this branching step before running a P4-sparse recognition algo-rithm. Then we proceed as in the previous section, finding any P4-sparseforbidden subgraph and branching on the ways of deleting P4s and C4s init. Once no P4-sparse obstruction exists, we solve the problem optimally onthe resulting specialized structure (a C4-free P4-sparse graph.) The branch-ing rules become simpler in that only 5 of the 7 graphs in Figure 1.2 needconsideration. In particular, the 4-pan that caused the bottleneck of Algo-rithm 7, is no longer considered and this changes the runtime of the processfrom O(2.562k) to O(2.450k).One main difference in this algorithm from Algorithm 7 is that C4s arefound and destroyed first, and after any of the P4-sparse deletions are made,the process restarts with looking for C4s to destroy again. Once the C4s aredestroyed and the resulting graph is P4-sparse, we proceed with removing985.1. Edge-Deletion Algorithmsedges with edge-deletion algorithm for thin or thick spiders (Algorithm 6).Algorithm 8: Bounded search tree algorithm finding a trivially perfectedge-deletion set.Algorithm TriviallyPerfectEdgeDeletion(G, k)Input: A Graph G = (V,E) and a positive integer kOutput: A set S of edges of G with |S| ≤ k where (V,E \ S) istrivially perfect if it exists, otherwise NoInitialize T = ∅;if k < 0 thenReturn No;endif G is trivially perfect thenTerminate here and return S;endif There exists H isomorphic to C4 thenFor each of the 6 possible pairs of edges, T ;TriviallyPerfectEdgeDeletion(G− T, k − 2);G← G+ T ;endApply a P4-sparse recognition algorithm;if G is P4-sparse thenT ← Spider(G);if |S|+ |T | ≤ k thenTerminate here and return S ∪ T ;endelseReturn No;endendelseA forbidden graph H from Figure 1.2 exists;foreach minimal edge-deletion set E′ for H doS ← S ∪ E′;T ← CographDeletion(G− S, k − |S|);if T == No thenS ← S \ E′;endendReturn No;end995.1. Edge-Deletion AlgorithmsThe correctness of decomposing the edge-deletion problem into separateproblems on K ∪ S and R depends a proposition similar to Proposition 5.3.Proposition 5.8. Let G = (V,E) be a C4-free graph and M(V ) be the sizeof a minimum edge-deletion set required to turn G[V ] into a (P4, C4)-freegraph.i) If G is disconnected with components C1, . . . , Ct, thenM(V ) =t∑i=1M(Ci)ii) If G is disconnected, then G is a complete join between a clique and asmaller C4-free graph, H, and M(V ) = M(V (H)).iii) If G is a spider with head R, body K and feet S, thenM(R ∪K ∪ S) = M(R) +M(K ∪ S).Proof.Case i): If G has more than one connected component, any edge deletionsmade in one component cannot create a P4 or a C4 in a different connectedcomponent.Case ii): Suppose G is disconnected. Let H be a set of at least 2 verticesinducing a connected component in G. Then H induces a C4-free graphin G since any induced subgraph of a C4-free graph is C4-free. Since H isconnected in G, there must be two non-adjacent vertices u, v of H in G. Letx and y be any two vertices not in H. Since H is a connected componentin G, every vertex in H is adjacent to every vertex outside of H. If x andy are not adjacent, then uxvyu is a C4 in G, which is impossible. So anyvertices outside of H must induce a clique in G. It follows, then, that no P4in G includes a vertex of G \H, and after any edge deletions in H, no P4 orC4 can include a vertex of G \H. Hence M(G) = M(H).Case iii): Notice that no C4 can include a vertex s from S in a spidereven after removals of leg edges and head edges since the neighbourhoodof s induces a clique. Since K is a clique, and every k ∈ K is adjacent toevery r ∈ R, there can not exist a C4 in K ∪R unless the C4 is completelycontained in R. So no C4 contains an edge from R to K. Therefore, anyedge e = {r, k} with r ∈ R and k ∈ K is not in any C4 in G, and for anysubset of leg edges and head edges E′ the edge e = {r, k} is not in any1005.2. Vertex-Deletion AlgorithmsC4 in G − E′. Combining this with Proposition 5.3 for P4s establishes thedecomposition.Proposition 5.8 shows us that since all the C4s are destroyed in thebranching stage of TriviallyPerfectEdgeDeletion(G, k), once we ar-rive at a C4-free spider, we are free to delete leg edges without creating anew C4.The runtime of Algorithm 8 is dominated by the branching rules onceagain. Encountering a C4 results in 6 branches which delete 2 edges each.The resulting recurrence is T (k) = 6T (k − 2) and so T (k) ≤ 2.450k. Hav-ing deleted all the C4s, we no longer include the P 5 or the 4-pan cases inour analysis. The runtime analysis for the rest remain unchanged: C5 :2.237k, P5 : 2.415k, co-4-pan: 2.415k, fork: 2.415k, kite: 2.270k. The searchtree is thus bounded by the C4 case of size O(2.450k). Finding a C4 directlyis a problem that is currently best-achieved using matrix multiplication [89],so this entire process as described runs in O(2.450knα) where O(nα) is thetime required for matrix multiplication (α ≤ 2.376 [29]).We can, in fact, modify the algorithm to run linearly in n and m byobserving that a graph is P4-free and C4-free if and only if it is a chordalcograph. By first running a certifying chordal recognition algorithm [135],we can either deduce that there is no C4 or else find a C4 or a C5 or a largerinduced cycle (and thus a P5) and branch on these subgraphs according tothe rules we gave, and if the graph is chordal then we apply a P4-sparserecognition algorithm to find one of the other forbidden induced subgraph,branch on it, and then re-apply the chordal recognition process.Theorem 5.9. Finding a trivially perfect k-edge-deletion set can be solvedin O(2.450k(m+ n)) time.5.2 Vertex-Deletion AlgorithmsThis section shows how our general method can be used to solve vertex-deletion version of our prior two problems:Problem 13. Cograph Vertex-Deletion (G, k):Given graph G = (V,E), does there exist a set S of at most k vertices suchthat G− S is a cograph?Problem 14. Trivially Perfect Vertex-Deletion (G, k):Given graph G = (V,E), does there exist a set S of at most k vertices suchthat G− S is a trivially perfect graph?1015.2. Vertex-Deletion Algorithms5.2.1 Vertex-Deletion to CographsSince removing a vertex set S from a graph G = (V,E) is equivalentto taking the induced subgraph on the vertex set V \ S, these problemsare also often named maximum induced subgraph problems. In our case ofasking if there is a vertex set of size at most k that can be removed to leavebehind a cograph, this is equivalent to asking if there is an induced cographsubgraph of size at least |V |−k. Removing a vertex from G can never createa new induced subgraph in G, and so deleting vertices to destroy inducedsubgraphs is commonly modeled as a Hitting Set problem. In this casein which each P4 maps to a 4-set in a Hitting Set instance, we havethe restricted problem of a 4-Hitting Set. Algorithms for such vertex-deletion problems should always be compared against the state-of-the-artalgorithms of d−Hitting Set if not anything else. d-Hitting Set is awell-studied NP-complete problem which admits fixed-parameter tractablealgorithms. The first improved analysis of d-Hitting Set by Niedermeierand Rossmanith [116] give a search tree of size O(3.30k), and a more detailedand involved analysis by Fernau [49] improves the bound to O(3.148k). Thisis the best known bound for 4-Hitting Set to date.The simple spider structure of P4 sparse graphs allows us to describe asimple algorithm for the vertex-deletion problem to cographs. The runtimeof this simple algorithm matches that of [66] and of [116]. The algorithmin [66] used branching rules that were designed by breaking the P4s in everysubgraph of size t. Testing various values of t deduced that rules basedon subgraphs of size 7 yielded the optimal runtime of an algorithm of thissort, with runtime O(3.30k). Their algorithm builds branching rules from447 graphs of size 7, while our algorithm only involves seven graphs on 5vertices (Figure 1.2.)In the following subsection, we use the analysis technique of [116] toshow that the runtime of our bounded search tree algorithm is O(3.115k),hence improving on Fernau’s O(3.148k). Our runtime could be improvedfurther if we were to use the methods of Fernau [49], but such an analysisis extensive and would sidetrack from the focus of this section.We describe the subroutine Spider Vertex-Deletion here. The algo-rithm works in the same way as Algorithm 6, taking as input a P4-sparsegraph and returning the optimal number of vertices to remove in order tobreak all P4s in the graph. For thin spiders, every pair of feet is the end-pairof a P4, and removing any |S| − 1 vertices from S will destroy all the P4s inthe body and legs. Removing less than |S| − 1 will leave at least two thinlegs and hence a P4, so |S| − 1 is necessary.1025.2. Vertex-Deletion AlgorithmsSince a set of 4 vertices induces a P4 in a graph G if and only if theyinduce a P4 in G, deleting any |K| − 1 vertices from K in a thick spiderwill destroy all the P4s in K ∪ S. In either the thin or thick spider case,the subroutine is then applied to head R. This concludes the description ofSpider Vertex-Deletion. The correctness of the algorithm is assertedby the following proposition:Proposition 5.10. Let G = (V,E) be a spider with head R, body K andfeet S. Let M(V ′) be the minimum number of vertices required to removefrom G[V ′] in order to turn G[V ′] into a cograph. Then M(V ) = M(R) +M(S ∪K).Proof.Deleting vertices from a graph can never create a new P4. We know fromProposition 5.3 that no P4 includes vertices from both K and R. Deletingany vertices from K ∪S will not destroy P4s in R, and vertex deletions fromR will not destroy any P4s in K ∪S. Hence M(V ) = M(R) +M(S ∪K). Corollary 5.11. The algorithm Spider Vertex-Deletion described abovecorrectly solves the cograph vertex deletion problem for spiders in linear time.This implies that Spider Vertex-Deletion serves as an implementa-tion of Phase 2 of the meta-algorithm, Algorithm 5. Algorithm 9 serves tofill Phase 1 of the meta-algorithm.For a general graph, we proceed as in the cograph edge-deletion algo-rithm. We find P4-sparse obstructions and branch on the possible ways ofdeleting vertices to destroy all P4s, repeating until the remaining graph isP4-sparse. The pseudocode description is given in 9.The branching rules for the vertex deletions are given in a table as before:H =Subgraph Minimal Vertex Deletion SetsC5 {1,2}, {1,3}, {1,4}, {1,5}, {2,3},{2,4}, {2,5}, {3,4}, {3,5}, {4,5}P5 {1,5}, {2}, {3}, {4}P 5 {1}, {3}, {4}, {2,5}4-pan {2}, {4}, {5}, {1,3}co-4-pan {3}, {4}, {5}, {1,2}fork {3}, {4}, {5}, {1,2}kite {2}, {4}, {5}, {1,3}The runtime of the algorithm is dominated by the branching steps. Theruntime T (k) for the C5 case depends on 10 branches, while each of theother cases have equivalent runtime analysis.1035.2. Vertex-Deletion Algorithms1. C5: ten branches, each reducing the parameter by 2 gives T (k) =10T (k − 2) and so T (k) ≤ 3.163k2. All others: T (k) = 3T (k − 1) + T (k − 2) giving T (k) ≤ 3.303kThe runtime of this vertex-deletion algorithm is bounded byO(3.303k(m+n)), matching the runtime of the cograph vertex deletion algorithm gener-ated by automated branching rule design [66].Theorem 5.12. Algorithm 9 solves the vertex-deletion problem for cographsin O(3.303k(m+ n)) time.5.2.2 Improvement using Hitting-SetThe 4-Hitting Set algorithm of [116] involves an analysis which countswhen a branch choice can be made on a 3-set. Without counting thesecases, an algorithm for 4-Hitting Set which only makes choices on 4-setswill have a search tree size of 4k. By keeping track of when 3-sets are createdin the search process and by branching on 3-sets whenever they are available,the authors of [116] are able to improve the upper-bound to the size of thesearch tree to O(3.30k).By using a similar strategy of choosing specific P4s to branch on, wehave shown that Cograph Vertex Deletion can be solved in O(3.115k)time [109].Fernau [49] gave an improved analysis to the Hitting Set algorithm,and we believe that a similar analysis would improve our runtime as well.The details, however, are very involved and lengthy, and would sidetrack usfrom our focus on branching strategies here.Comment 1. We show here that using a similar technique to that in [116]can improve our search tree size. For an instance (G, k) of cograph vertex-deletion, we will use an implicit instance of 4-Hitting Set where each setof 4 vertices inducing a P4 in G corresponds to a 4-set. A cograph deletionset for G will correspond to the hitting set of the set of 4-sets.We adapt the notion of dominance from Hitting Set to that of P4s: avertex v P4-dominates u if v exists in every P4 that u is in.Following the Hitting Set method, we observe that if v P4-dominatesu, then any hitting set that contains u could be replaced with a hitting setcontaining v. Using this observation, we mark u in the graph to signify thatit will not be in our solution. When we encounter P4s involving u, the P4only needs to be considered as a 3-set to hit. Marking u in G is equivalentto removing u in the implicit 4-Hitting Set instance.1045.2. Vertex-Deletion AlgorithmsOur vertex-deletion algorithm given in the previous subsection applies aP4-sparse recognition algorithm to find one of the 7 forbidden configurationsfrom Figure 1. We illustrate how to proceed to the branching step whenencountering a P5: {v1, v2, v3, v4, v5} with v1 and v5 as the endpoints:If we put v2 in S, remove v2 from the graph G and reduce the parameterk by 1. Any set in the hitting set instance H containing v2 is removed.Otherwise (if v2 is not in S) we mark v2 in G and remove v2 from H,possibly creating some 3-sets. If v3 is put in S, reduce k by 1 and remove v3from the graph, as before. Otherwise (if v3 is also not in S) then mark v3in the graph and remove v3 from H. If v4 is in S, reduce k by 1 and removeany set containing v4. Otherwise (if none of v2, v3, v4 are in S) we add v1and v5 to S, remove them from G and reduce the parameter k by 2.If we first ensure that P4-dominated vertices have been removed fromconsideration, some vertices in the P5 (or analogous forbidden subgraph)may be marked. We do not need to build branches on cases asking if avertex v is in S if v is already marked. If we encounter a P4 in one of ourforbidden subgraphs consisting of four marked vertices, we can terminatethat branch of the search tree and backtrack.When encountering any of the P4-sparse forbidden subgraphs: P5, P 5,kite, fork, 4-pan, co-4-pan, we have in each case 3 vertices whose removalwill break both P4s in the obstruction, or else two vertices which must beremoved together. Call those first 3 vertices the breaking vertices. Our pro-cess is summarized in the following algorithm:1055.2. Vertex-Deletion AlgorithmsAlgorithm 10: Using hitting-set for cograph vertex-deletion.Algorithm CographVertexDeletionHittingSet(G, k)Input: A Graph G = (V,E) and a positive integer k1. If any 3-set has been created, branch on that 3-set. Repeat untilthere are no more 3-sets;2. If any vertex is P4-dominated, mark it in G and remove it from thehitting set. Go to step 1.;3. Find one of P5, P 5, kite, fork, 4-pan, co-4-pan.;4. If there is a P4 all of whose vertices are marked, Stop andbacktrack.;5. Branch on the (up to 3) unmarked breaking vertices using thecases as described above. Go to step 1.;6. If all three breaking vertices are marked, include the other twovertices in S and go to 1.;7. If no such subgraph can be found, our graph is an extendedP4-sparse graph (See below.) Solve the remainder optimally withoutsearch.Let v and v′ be breaking vertices encountered after steps 1 and 2 cannotbe applied any further. If v is not in any other P4 besides the two P4s in theobstruction graph found in step 3, then v is P4-dominated by v′, but thiscannot happen if steps 1 and 2 are done to exhaustion. So we have that vmust be in another P4 not involving v′. When branching on v, we considerv ∈ S, in which case v is removed from G, or v /∈ S in which case we removev from H, creating at least one 3-set since we established that v must be inanother P4 not containing v′.Step 3 can be performed with a linear-time algorithm recognizing (P5, P 5,kite, fork, 4-pan, co-4-pan)-free graphs. These are called extended P4-sparsegraphs by Giakoumakis and Vanherpe [61]. This ensures we do not encountera C5 at this stage of the process. They showed:Theorem 5.13. [61] If C is a C5 in an extended P4-sparse graph, then C isa prime module.Let C be a C5 in our graph after reaching step 7 of our hitting-set process.Observe that every 4-set of C induces a P4, so no vertex of C is containedin a nontrivial module or else we will have one of the forbidden subgraphs ofextended P4-sparse graphs which we have already destroyed. Further, C can1065.2. Vertex-Deletion AlgorithmsFigure 5.1: An impossible configuration for a C5 in an extended P4-sparsegraph.not be a module in some P4 or else that P4 extends to one of the forbiddengraphs already destroyed (see Figure 5.1.) It must be that C is a set of 5vertices inducing a 5-cycle and not overlapping with any other P4. Since Cdoes not intersect with any other existing P4s left in G, we are free to chooseany two vertices of C to add to S and delete from G.After deleting every C5 from the extended P4-sparse graph, we have aconventional P4-sparse graph and we proceed with vertex deletions for spi-ders using Spider Vertex-Deletion described in the previous subsection.Let T (k) be the number of leaves in a search tree of our vertex deletionproblem, and let B(k) be the number of leaves in a search tree for thisproblem whose root branched on a 3-set. Step 4 of Algorithm 10, can (atworst) branch on each of the breaking vertices. If the first vertex is putin S, we reduce k by 1 and so we have a T (k − 1) branch. If we assumethe first vertex is not in S and select the second vertex to be in S, theparameter decreases by 1. Since this first vertex is not P4-dominated (orelse it would have been marked), it must be in another P4 and so markingthe it will create at least one 3-set in H, giving a B(k − 1) branch. Alongthe same lines, if we choose the third breaking vertex, we arrive at anotherB(k− 1) branch. In the final case of deleting the two non-breaking vertices,we create a T (k− 2) branch. Together this puts an upper bound on T (k) ofT (k − 1) + 2B(k − 1) + T (k − 2).Similarly, when branching on a 3-set, B(k) ≤ T (k − 1) + 2B(k − 1).Together, these two recurrences give a simultaneous system from which onecan show B(k) ≤ 3.115k and T (k) ≤ 1.115B(k) with a straightforwardinduction proof.Theorem 5.14. Algorithm 10 solves the cograph vertex-deletion problem inO(3.115k) time.The method of analyzing the search tree size created upon the existenceof a 3-set shows that our search tree size is smaller than the O(3.30k) for1075.2. Vertex-Deletion Algorithms4-Hitting Set found by Niedermeier and Rossmanith [116]. Fernau [49]refines this analysis process by keeping track of the the number of (d − 1)-sets in d-Hitting Set, arriving at O(3.148k) for 4-hitting set. Specifically,Fernau’s analysis involves expressions T i(k) for i = 0, 1, 2, 3 where i is thenumber of 3-sets in an instance of 4-hitting set (in our case, B(k) is T 1(k).)We are confident that a similar refinement in the analysis of our vertex-deletion algorithm would reveal further gains, but our presented algorithmis already shown to have a smaller search space.5.2.3 Vertex-Deletion for Trivially Perfect GraphsGiven a graph G, our task now is to find the largest induced triviallyperfect subgraph in G. Equivalently, given a value k, we want know whetherwe can delete at most k vertices in order to turn the graph P4-free and C4-free.In the edge-deletion version of this problem from the previous section,we deleted at least 2 edges from all C4s in the branching process since2 edges is necessary, and this was algorithmically appealing as it decreasedthe parameter by 2. The vertex-deletion problem does not share this luxury:there are 4 vertices in a C4 and only a single vertex removal is required toturn it into a (P4, C4)-free graph. This will result in a more complicatedprocedure to delete all remaining C4s in the P4-sparse graph that remainsafter the search process.We will proceed directly to finding the P4-sparse obstructions and branch-ing on them to turn each one into a (P4, C4)-free graph. This yields a worst-case runtime of O(3.303k), as summarized by the following table for eachobstruction graph H:H =Subgraph Minimal Vertex Deletion SetsC5{1,2}, {1,3}, {1,4}, {1,5},{2,3}, {2,4}, {2,5}, {3,4}, {3,5}, {4,5}P5 {1,5}, {2}, {3}, {4}P 5{1,2}, {1,3}, {1,4}, {1,5}, {2,3},{2,4}, {2,5}, {3,4}, {3,5}, {4,5}4-pan {2}, {4}, {1,3}, {1,5}, {3,5}co-4-pan {3}, {4}, {5}, {1,2}fork {3}, {4}, {5}, {1,2}kite {2}, {4}, {5}, {1,3}The runtime for each of these cases is summarized below:1085.2. Vertex-Deletion Algorithms1. C5: five branches, each reducing the parameter by 2 gives T (k) =10T (k − 2) and so T (k) ≤ 3.163k2. P5: T (k) = 3T (k − 1) + T (k − 2) giving T (k) ≤ 3.303k3. P 5: T (k) = 10T (k − 2) giving T (k) ≤ 3.163k4. 4-pan: T (k) = 2T (k − 1) + 3T (k − 2) giving T (k) ≤ 3k5. co-4-pan: T (k) = 3T (k − 1) + T (k − 2) giving T (k) ≤ 3.303k6. fork: T (k) = 3T (k − 1) + T (k − 2) giving T (k) ≤ 3.303k7. kite: T (k) = 3T (k − 1) + T (k − 2) giving T (k) ≤ 3.303kAfter all the forbidden configurations of P4-sparse graphs have been de-stroyed, we are left with a P4-sparse graph from which we must delete ver-tices to destroy the remainder of the P4s and C4s. While C4s do not existin a thin or thick spider, C4s will exist across co-components. Namely, ifA1 and A2 are two non-clique co-components, then any two nonadjacentvertices x1 and y1 in A1 and any two nonadjacent vertices x2 and y2 in A2will induce a 4-cycle. Since each connected component and co-componentof a P4-sparse graph must be a spider, every induced 4-cycle must be thetype that crosses non-clique co-components.In order for this P4-sparse graph to be C4 free, all but one of the co-components must be a clique. The only P4s that will be left to delete willbe those strictly in the non-clique co-component. To determine the optimalway at arriving at this point, let us introduce some notation: for a P4-sparsegraph G, let A1, A2, . . . , At be the co-components of G. Let ωi = ω(Ai) bethe size of a maximum clique in Ai, and ηi be the size of a minimum cographvertex-deletion set, as found by the algorithm Spider Vertex-Deletion.We seek to find i such that deleting all co-components Aj , j 6= i intocliques, plus Spider Vertex-Deletion(Ai) is a minimum. That is, wewant to find i that minimizesηi +∑j 6=i|Aj | − ωj .For a particular G,∑|Ai| = n is fixed, as is∑ωi. We see that theexpression above is minimized when i is chosen such that |Ai| − ωi − ηi is amaximum.1095.2. Vertex-Deletion AlgorithmsOur algorithm is as follows:Algorithm 11: Trivially Perfect Vertex-Deletion Algorithm.Algorithm TriviallyPerfectVertexDeletion(G, k)Input: A Graph G = (V,E) and a positive integer kOutput: Yes if there exists a set S of at most k vertices so thatG− S is trivially perfect, No otherwise.while G is not P4-sparse doLet H be a P4-sparse obstruction subgraph;Branch on the possible ways of deleting the P4s and C4s from H;Let k1 be the number of vertex deletions made in this stage;endG is P4-sparse. Let A1, . . . , At be the co-components of G;Fix i to be the lowest index maximizing |Ai| − ωi − ηi;for j 6= i doFix a maximum clique of Aj ;Delete any vertex of Aj which is not in this maximum clique;endLet k2 be the number of vertex deletions made in the for-loop;Let k3 be the number of deletions performed in SpiderVertex-Deletion(Ai);if k1 + k2 + k3 ≤ k thenreturn Yes;endreturn No;Since maximum cliques can be found in linear time on P4-sparse graphs,it is clear that this algorithm runs in polynomial time for any fixed k. Theruntime is dominated by the exponential factor from the search tree, whichwas shown to be O(3.303k).Theorem 5.15. Algorithm 11 is a fixed-parameter tractable algorithm whichsolves the vertex-deletion problem for trivially perfect graphs in O(3.303k).Of course, a hitting set-style improvement similar to the previous sectioncould be applied here.1105.3. Summary5.3 SummaryThis chapter focused on solving graph modification problems with thefixed-parameter tractability paradigm of a bounded search tree. Findinggraph structures that are clique relaxations was motivated in Chapter 2.Each problem studied in this chapter serves as a way of finding a closestor largest clique-relaxation structure and expressed as graph modificationproblems.We developed new and improved algorithms for many problems here, allfalling within the structure of the meta-algorithm presented as Algorithm 5.The main contribution (in the author’s opinion) of this chapter is thesystematic modularization of solving modification problems to a graph classby using a generalization of the target graph class to both: (1) shorten thedepth of search trees and (2) design efficient branching rules that handlemultiple forbidden configurations at once, thereby reducing the branchingfactor of search trees.The specific results in this chapter will be summarized in Chapter 6.1115.3. SummaryAlgorithm 7: Bounded search tree algorithm computing a cographedge-deletion set.Algorithm CographDeletion(G, k)Input: A Graph G = (V,E) and a positive integer kOutput: A set S of edges of G with |S| ≤ k where (V,E \ S) is acograph if it exists, otherwise NoInitialize T = ∅;if k < 0 thenReturn No;endif G is a cograph thenTerminate here and return S;endApply a P4-sparse recognition algorithm;if G is P4-sparse thenT ← Spider(G);if |T | ≤ k thenTerminate here and return S ∪ T ;endelseReturn No;endendelseA forbidden graph H from Figure 1.2 exists;foreach minimal edge-deletion set E′ for H doS ← S ∪ E′;T ← CographDeletion(G− S, k − |S|);if T == No thenS ← S \ E′;endendReturn No // All branches checked with no solution found;end1125.3. SummaryAlgorithm 9: Bounded search tree algorithm finding a cographvertex-deletion set.Algorithm CographVertexDeletion(G, k)Input: A Graph G = (V,E) and a positive integer kOutput: A set S of vertices of G with |S| ≤ k where (V \ S,E) is acograph if it exists, otherwise NoInitialize T = ∅;if k < 0 thenReturn No;endif G is a cograph thenTerminate here and return S;endApply a P4-sparse recognition algorithm;if G is P4-sparse thenT ← SpiderVertexDeletion(G);if |S|+ |T | ≤ k thenTerminate here and return S ∪ T ;endendelseA forbidden graph H from Figure 1.2 exists;foreach minimal vertex-deletion set V ′ for H doS ← S ∪ V ′;T ← CographVertexDeletion(G− S, k − |S|);if T == No thenS ← S \ V ′;endendReturn No;end113Chapter 6Concluding RemarksThis thesis was an attempt to bring in knowledge from the fields of al-gorithmic graph classes into network analysis, which has been studied bymany disciplines in distinctly different ways. Before this work was done, so-cial and complex network analysis made use of trees, cluster graphs, chordalgraphs (mostly via treewidth), and only recently have cographs made anappearance in the study of applied networks (see e.g. [151]). We have usedquasi-threshold graphs here in the study of social communities and hierar-chical organization, and P4-sparse graphs have made several appearances inthe associated algorithmic study of these community-detection problems.We still feel that the vast knowledge of many hundreds of graph classes [16](almost 1500 now on http://www.graphclasses.org) have use in definingthe future direction of network analysis. For instance, any pair of graphclasses comparable by set containment could potentially lead to improvedalgorithms in the form of Algorithm 5. Additionally, superclasses of clustergraph or quasi-threshold graphs can always be used as a type of clique-relaxation for the purposes of defining community structure, and mostlyevery class has an interesting structural property that can be used for fur-ther analysis of the community structure, just as the comparability-graphequivalence for quasi-threshold graphs serves as a way to find and measurethe hierarchical organization of the individuals.6.1 Summary of ThesisThis thesis began with a survey of definition from graph theory, graphclasses, and the algorithmic problems studied on graphs. In Chapter 2, welooked at the problem of finding clique clusters in networks and then weshowed the use of quasi-threshold graphs in defining social community as aclique relaxation.In Chapter 3, we showed how to extend the use of quasi-threshold graphsto the problem of detecting hierarchical organization in directed networks.In Chapter 4, we briefly surveyed some random network models and ar-gued that although measures such as the degree distributions are important1146.2. The Key of Contributions of this Thesisin characterizing an accurate network model, we must go further and under-stand the distribution of higher-order structures, such as community sizes,to fully capture the the structure of real-world networks.Chapter 5 builds a general framework for improving search-based al-gorithms for graph modification problems, and exhibits a number of best-known algorithms for several problems.6.2 The Key of Contributions of this ThesisThe specific results that are contained in this thesis are summarizedbelow.• We show that Cluster Deletion can be solved in polynomial timeon cographs.• We show that Cluster Deletion is NP-hard on a class of graphsslightly larger than cographs.• We define a new structure for social community called familial groupsand show that they naturally arise from sociologists’ previous work.• We show that quasi-threshold editing is NP-complete, settling an openproblem mentioned in 2006 [18], in 2008 [102], and again in 2011 [97].• We show that the problem of adding edges to a graph in order to makeits diameter equal to 2 is W [2]-hard. We show that it remains hardeven on clique-dominated diameter-3 graphs.• We implemented and tested familial groups on real-world networks toshow that the communities detected were meaningful and/or correct.• We showed how one might edit directed and weighted networks to findfamilial group communities and hierarchies.• We performed an empirical study of random BA-model graphs andshowed that the community structure is lacking while random partialk-trees do exhibit a distribution of community sizes consistent withthe findings of [118].• Using P4-sparse graphs, designed new algorithms with best-known run-times for vertex deletion and edge deletion modification problems toquasi-threshold graphs and cographs.1156.2. The Key of Contributions of this Thesis6.2.1 Future ConsiderationsFuture Work on Modifying to a Cluster GraphWe showed in Section 2.3 that the Cluster-Deletion problem can besolved in polynomial time when the input graph comes from the class ofcographs, while it is NP-hard on a class of graphs slightly larger thancographs. Although there is very little difference between these two classes,it would be interesting to know if one could find a dichotomy theorem thatsharpens the statement as to exactly when cluster deletion is polytime solv-able with respect to forbidding subgraphs.We also showed that a greedy maximum clique-finding algorithm wouldoptimally solve cluster deletion on cograph input. This property could applyto superclasses of cographs, even where cluster deletion is NP-hard, sinceon those classes it might be that detecting maximum cliques would not bea polynomial-time algorithm. Even with the loss of polytime solvability, itwould be interesting to determine the class of graphs for which obtaininggreedy cliques will solve the cluster deletion problem.Very recently, Bonomo et al. [15] showed that Cluster Deletion isNP-complete on weighted cographs, in contrast to our result showing it ispolytime solvable on unweighted cographs. They further show that Clus-ter Deletion is NP-complete on chordal graphs, even on P5-free chordalgraphs.Changing the problem consideration from edge deletions to edge editsmakes the problem far more complex. We do not believe cluster editing hasbeen studied on a well-known graph class before, and we do not know theanswers to the following problems:Problem 15. (Open) Given a cograph G and integer k, can we solve ClusterEditing(G, k) in polynomial time?A weaker version of this problem is:Problem 16. (Open) Given a quasi-threshold graph G and integer k, can wesolve Cluster Editing(G, k) in polynomial time?Future Work on Familial GroupsDespite the theoretical and computational justifications given for the useof familial groups, there is still work to be done in determining when thismethod of network clustering is desirable over other existing methods. Forinstance, [123] found that a scale-free topology of complex networks gives ev-idence of hierarchically-organized nodes, while networks without such struc-1166.2. The Key of Contributions of this Thesisture (such as those deriving from geographical data) are not hierarchical. Itwould be interesting to see if the method of familial groups would be con-sistent with their findings by yielding meaningful results only in scale-freenetworks.An aspect of (P4, C4) editing is that the edit set is not unique, and we canonly speculate at this point how varied the found communities would be inthe space of equally-weighted edit solutions. A specific question we can askis how one could define the intra-communal rank of individuals differentlyso that the importance of a vertex is perhaps measurable in the originalnetwork and not on the found edited graph, which is not unique. Alongthis line of reasoning, we wonder if there may be an easy way to predictthe individuals which will end up as leaders of groups after editing to aquasi-threshold graph.The computational results shown in Section 2.6 weighed an edge-additionand edge-deletion equally, for each of the C4 and the P4. But one of thereasons in justifying the removal of P4s from a community is that two verticeswhich are a distance 3 away from each other (that is, beyond the horizon ofobservability [55]) should likely be in separate close-knit communities. Withthis interpretation, perhaps it makes sense to consider only edge-deletions indestroying P4s, or maybe weighing the deletions more favorably than edgeadditions on those 4 vertices.Another possible future study is to analyze how much larger the foundcommunities become when relaxing the P4 restriction to a P5 restriction,and similarly relaxing the C4 to a C5. Many graph classes defined by theseforbidden induced subgraphs have already been studied, mostly in termsof their structure, but not necessarily in the context of modifying to suchgraphs. For example, while every connected (P4, C4)-free graph has at leastone vertex such that every other vertex in the component is adjacent to it, itis also known that every connected (P5, C5)-free graph has a clique in it suchthat every other vertex is adjacent to some vertex of that clique [34]. Asfar as we know, a graph modification problem (via edge deletions or edits)has not yet been studied for the class of (P5, C5)-free graphs. There aremany other possible graph classes that can serve as relaxations to P4 andC4-freeness as well.The computational problem of editing a graph to a nearest (P4, C4)-freegraph is still rather new. We showed here that it is in fact NP-complete, butthis does not rule out fast approximation algorithms or integer linear pro-gramming formulations. Even improved exponential-time exact algorithmswould be of interest, especially kernelization techniques which could reducethe size of large problem instances.1176.2. The Key of Contributions of this ThesisThere have been many studies on inferring global structure from localanalysis. With our definition of forbidding certain 4-vertex graphs, thisopens the door to new structural analysis possibilities, such as probabilisticmodeling techniques used in [27] or [47].Future Work on Branching StrategiesThe algorithms presented in Chapter 5 can all be regarded as instancesof Algorithm 5. All of the algorithms of Chapter 5 are deletion algorithms,and edge-edit version of analogous algorithms can also follow the paradigmif (i) all the required branching rules for the minimal edits can be describedand (ii) a polytime algorithm solving the editing problem on input froman appropriately-chosen superclass can be found. Indeed, shortly after ourresult containing algorithms for the cograph vertex and edge deletion andquasi-threshold edge deletion problems, the same approach (also using P4-sparse graphs) was successfully used to give an improved algorithm for thecograph edge editing problem [96]. Indeed, with the vast literature on struc-tural theorems of graph classes, we would expect that changing the super-class in Algorithm 5 from P4-sparse graphs to another simple class wouldlead to many more improved search-based FPT algorithms. Perhaps it isthat P4-sparse graphs are “simple” in the fact that they have bounded clique-width that allows one to polynomially-solve modification to its subclasses,in which case it might be worthwhile to consider replacing P4-sparse graphsto a class of larger, but still bounded, cliquewidth. It would be particu-larly interesting to see an example of a superclass/subclass pair that can beused in Algorithm 5 where the classes are not defined by a finite inducedsubgraph characterization, but either by an infinite number of forbiddenconfigurations, or even defined by the admittance of certain vertex orders.Aside from developing FPT algorithms, this search strategy could also beused to develop faster exponential time algorithms for other problems. Forexample, finding a maximum clique isW [1]-hard and a minimum dominatingset is W [2]-hard, but these are both solvable in linear time on cographs. Onecan imagine an implementation of an exponential-time algorithm for theseproblems which considers adding vertices to its solution set in a mannerthat destroys P4s, and once the remaining graph to be searched is P4-free,the linear-time solver for cographs can be used to efficiently complete thatsearch branch.118Bibliography[1] T. Vicsek A.-L. Baraba´si, E. Ravasz. Deterministic scale-free networks.Physica A, 299:559–564, 2001. → pages 54[2] R. D. Alba. A graph-theoretic definition of a sociometric clique. Jour-nal Mathematical Sociology, 3:113–126, 1973. → pages 21[3] B. Balasundaram, S. Butenko, I. V. Hicks, and S. Sachdeva. Cliquerelaxations in social network analysis: The maximum k-plex problem.Operations Research, 59(1):133–142, 2011. → pages 20, 21[4] B. Balasundaram, S. Butenko, and S. Trukhanov. Novel approaches foranalyzing biological networks. Journal of Combinatorial Optimization,10:23–39, 2005. → pages 21[5] N. Bansal, A. Blum, and S. Chawla. Correlation clustering. MachineLearning, 56:89–113, 2004. → pages 10, 21, 22[6] A.-L. Baraba´si and R. Albert. Emergence of scaling in random net-works. Science, 286(5439):509–5122, 1999. → pages 81[7] E. A. Bender, L. B. Richmond, and N. C. Wormald. Almost all chordalgraphs split. Journal of the Australian Mathematical Society (SeriesA), 38:214–221, 4 1985. → pages 88[8] A. A. Bertossi. Dominating sets for split and bipartite graphs. Infor-mation Processing Letters, 19(1):37–40, 1984. → pages 78[9] A. N. Bishop and I. Shames. Link operations for slowing the spreadof disease in complex networks. EPL, 95:18005, 2011. → pages 80[10] S. Bo¨cker. A golden ratio parameterized algorithm for cluster editing.J. Discrete Algorithms, 16:79–89, 2012. → pages 11[11] S. Bo¨cker, S. Briesemeister, and G. W. Klau. Exact algorithms forcluster editing: Evaluation and experiments. Algorithmica, 60(2):316–334, 2011. → pages 20119Chapter 6. Bibliography[12] S. Bo¨cker and P. Damaschke. Even faster parameterized cluster dele-tion and cluster editing. Inf. Process. Lett., 111(14):717–721, 2011. →pages 11, 23[13] B. Bolloba´s. The diameter of random graphs. Trans. of the AmericanMathematical Society, (1):41–52. → pages 71[14] A. Bonato, J. Janssen, and P. Pra lat. A geometric model for on-line social networks. Proceedings of the International Workshop onModeling Social Media, (4), 2010. → pages 79[15] F. Bonomo, G. Duran, and M. Valencia-Pabon. Complexity of thecluster deletion problem on chordal graphs, subclasses of chordalgraphs, and cographs, 2014. → pages 116[16] A. Brandsta¨dt, V. B. Le, and J. P. Spinrad. Graph classes: a survey.Society for Industrial and Applied Mathematics, Philadelphia, PA,USA, 1999. → pages 2, 22, 114[17] H. Broersma, E. Dahlhaus, and T. Kloks. Algorithms for the treewidthand minimum fill-in of HHD-free graphs. In Rolf Mo¨hring, editor,Graph-Theoretic Concepts in Computer Science, volume 1335 of Lec-ture Notes in Computer Science, pages 109–117. Springer Berlin /Heidelberg, 1997. → pages 23[18] P. Burzyn, F. Bonomo, and G. Dura´n. NP-completeness resultsfor edge modification problems. Discrete Applied Mathematics,154(13):1824–1844, 2006. → pages 35, 52, 115[19] L. Cai. Fixed-parameter tractability of graph modification problemsfor hereditary properties. Inf. Process. Lett., 58(4):171–176, 1996. →pages 9, 22, 39, 90[20] M.-S. Chang. Algorithms for maximum matching and minimum fill-in on chordal bipartite graphs. In Tetsuo Asano, Yoshihide Igarashi,Hiroshi Nagamochi, Satoru Miyano, and Subhash Suri, editors, Algo-rithms and Computation, volume 1178 of Lecture Notes in ComputerScience, pages 146–155. Springer Berlin / Heidelberg, 1996. → pages23[21] G. Chapuy, E. Fusy, O. Gime´nez, and M. Noy. On the diameterof random planar graphs. In Proceedings of the 21st InternationalMeeting on Probabilistic, Combinatorial, and Asymptotic Methods in120Chapter 6. Bibliographythe Analysis of Algorithms, DMTCS Proceedings, volume AM, pages65–78. → pages 71[22] J. Chen and J. Meng. A 2k kernel for the cluster editing problem. J.Comput. Syst. Sci., 78(1):211–220, 2012. → pages 11[23] F. P. M. Chu. A simple linear time certifying LBFS-based algorithmfor recognizing trivially perfect graphs and their complements. Inf.Process. Lett., 107:7–12, June 2008. → pages 12, 35[24] F. Chung, L. Lu, T. G. Dewey, and D. J. Galas. Duplication modelsfor biological networks. Journal of Computational Biology, 10:677–687,2003. → pages 88[25] V. Chva´tal and P. L. Hammer. Aggregation of inequalities in integerprogramming. In B. H. Korte P. L. Hammer, E. L. Johnson andG. L. Nemhauser, editors, Studies in Integer Programming, volume 1of Annals of Discrete Mathematics, pages 145 – 162. Elsevier, 1977.→ pages 12[26] V. Chva´tal, C. T. Hoa`ng, N. V. R. Mahadev, and D. de Werra. Fourclasses of perfectly orderable graphs. J. Graph Theory, 11(4):481–495,1987. → pages 22[27] A. Clauset, C. Moore, and M. E. J. Newman. Hierarchical structureand the prediction of missing links in networks. Nature, 453:98–101,2008. → pages 47, 118[28] S. Cook. The complexity of theorem proving procedures. pages 151–158, 1971. → pages 5[29] D. Coppersmith and S. Winograd. Matrix multiplication via arith-metic progressions. J. Symb. Comput., 9(3):251–280, 1990. → pages101[30] T. Cormen, C. Leiserson, R. Rivest, and C. Stein. Introduction toAlgorithms. MIT Press, Cambridge, Mass, 2009. → pages 4[31] D. G. Corneil, Y. Perl, and L. K. Stewart. A linear recognition algo-rithm for cographs. SIAM J. Comput., 14:926–934, 1985. → pages 13,22, 27, 90[32] B. Courcelle and S. Olariu. Upper bounds to the clique width ofgraphs. Discrete Applied Mathematics, 101(1-3):77–114, 2000. →pages 13121Chapter 6. Bibliography[33] A. Cournier and M. Habib. A new linear algorithm for modular de-composition. In S. Tison, editor, CAAP, volume 787 of Lecture Notesin Computer Science, pages 68–84. Springer, 1994. → pages 95[34] M. B. Cozzens and L. L. Kelleher. Dominating cliques in graphs.Discrete Mathematics, 86(1-3):101–116, 1990. → pages 117[35] P. Damaschke. Fixed-parameter tractable generalizations of clusterediting. In Tiziana Calamoneri, Irene Finocchi, and Giuseppe Italiano,editors, Algorithms and Complexity, volume 3998 of Lecture Notes inComputer Science, pages 344–355. Springer Berlin / Heidelberg, 2006.→ pages 11[36] P. Damaschke. Bounded-degree techniques accelerate some param-eterized graph algorithms. In J. Chen and F. Fomin, editors, Pa-rameterized and Exact Computation, volume 5917 of Lecture Notes inComputer Science, pages 98–109. Springer Berlin / Heidelberg, 2009.→ pages 23[37] H. A. Dawah, B. A. Hawkins, and M. F. Claridge. Structure of para-sitoid communities of grass-feeding chalcid wasps. Journal of AnimalEcology, 64:708–720, 1995. → pages 47[38] A. Dessmark, J. Jansson, A. Lingas, E.-M. Lundell, and M. Persson.On the approximability of maximum and minimum edge clique parti-tion problems. Int. J. Found. Comput. Sci., 18(2):217–226, 2007. →pages 25[39] G. Dirac. On rigid circuit graphs. Abhandlungen aus dem Mathema-tischen Seminar der Universitt Hamburg, 25:71–76, 1961. → pages15[40] S. Donnelly and G. Isaak. Hamiltonian powers in threshold and ar-borescent comparability graphs. Discrete Mathematics, 202(1-3):33 –44, 1999. → pages 12, 55[41] R. G. Downey and M. R. Fellows. Parameterized Complexity. Springer-Verlag, 1999. 530 pp. → pages 7[42] D. A. Mongru Z. N. Oltvai A.-L. Baraba´si E. Ravasz, A. L. Somera.Hierarchical organization of modularity in metabolic networks. Sci-ence, 297:1551–1555, 2002. → pages 54122Chapter 6. Bibliography[43] E. S. El-Mallah and C. J. Colbourn. Edge deletion problems: prop-erties defined by weakly connected forbidden subgraphs. Proc. Eigh-teenth Southeastern Conference on Combinatorics, Graph Theory, andComputing, Congressus Numerantium 61:275–285, 1988. → pages 9,13, 35, 90[44] L. Euler. Solutio problematis ad geometriam situs pertinentis. Com-ment. Academiae Sci. I. Petropolitanae, 8:128–140, 1736. → pages1[45] T. S. Evans. Clique graphs and overlapping communities. J. Stat.Mech., P12037, 2010. → pages 49[46] M. G. Everett and D. Krackhardt. A second look at krackhardt’sgraph theoretical dimensions of informal organizations. Social Net-works, 34(2):159–163, 2012. → pages 54, 55, 57[47] K. Faust. Triadic configurations in limited choice sociometric net-works: Empirical and theoretical results. Social Networks, 30(4):273–282, 2008. → pages 118[48] M. R. Fellows, J. Guo, C. Komusiewicz, R. Niedermeier, andJ. Uhlmann. Graph-based data clustering with overlaps. DiscreteOptimization, 8(1):2–17, 2011. → pages 22[49] H. Fernau. Parameterized algorithms for d-hitting set: The weightedcase. Theoretical Computer Science, 411(16):1698–1713, 2010. →pages 102, 104, 108[50] S. Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75–174, 2010. → pages 20[51] M. Franceschet. PageRank: Stand on the shoulders of giants. CoRR,abs/1002.2858, 2010. → pages 70[52] F. Frati, S. Gaspers, J. Gudmundsson, and L. Mathieson. Augmentinggraphs to minimize the diameter. In Algorithms and Computation,pages 383–393. Springer, 2013. → pages 78[53] L. C. Freeman. A set of measures of centrality based upon between-ness. Sociometry, 40:35–41, 1977. → pages 18, 69[54] L. C. Freeman. Centrality in social networks: Conceptual clarification.Social Networks, 1:215–239, 1978. → pages 32123Chapter 6. Bibliography[55] N. E. Friedkin. Horizons of observability and limits of informal controlin organizations. Social Forces, 62(1):54–77, 1983. → pages 34, 117[56] Y. Gao, D. R. Hare, and J. Nastos. The cluster deletion problem forcographs. Discrete Mathematics, 313(23):2763–2771, 2013. → pagesiii, 26[57] Y. Gao, D. R. Hare, and J. Nastos. The parametric complexity ofgraph diameter augmentation. Discrete Applied Mathematics, 161(10-11):1626–1631, 2013. → pages iii, 8, 77[58] M. R. Garey and D. S. Johnson. Computers and Intractability: AGuide to the Theory of NP-Completeness. W. H. Freeman & Co.,New York, NY, USA, 1979. → pages 73[59] F. Gavril. The intersection graphs of subtrees in trees are exactly thechordal graphs. Journal of Combinatorial Theory, Series B, 16(1):47– 56, 1974. → pages 16, 22[60] V. Giakoumakis, F. Roussel, and H. Thuillier. On P4-tidy graphs.Discrete Math. Theor. Comput. Sci., 1(1):17–41, 1997. → pages 22[61] V. Giakoumakis and J.-M. Vanherpe. On extended p4-reducible andextended p4-sparse graphs. Theoretical Computer Science, 180(1-2):269–286, 1997. → pages 106[62] M. Girvan and M. E. J. Newman. Community structure in social andbiological networks. Proceedings of the National Academy of Sciences,99(12):7821–7826, 2002. → pages 18, 42, 49, 69[63] M. C. Golumbic. Trivially perfect graphs. Discrete Mathematics,24(1):105–107, 1978. → pages 12[64] M. C. Golumbic. Foreword 2004: The annals edition. In Mar-tin Charles Golumbic, editor, Algorithmic Graph Theory and PerfectGraphs, volume 57 of Annals of Discrete Mathematics, pages xiii–xiv.Elsevier, 2004. → pages 22[65] J. Gramm, J. Guo, F. Hu¨ffner, and R. Niedermeier. Graph-modeleddata clustering: Fixed-parameter algorithms for clique generation. InRossella Petreschi, Giuseppe Persiano, and Riccardo Silvestri, editors,Algorithms and Complexity, volume 2653 of Lecture Notes in Com-puter Science, pages 636–636. Springer Berlin / Heidelberg, 2003. →pages 11, 22124Chapter 6. Bibliography[66] J. Gramm, J. Guo, F. Hu¨ffner, and R. Niedermeier. Automated gener-ation of search tree algorithms for hard graph modification problems.Algorithmica, 39(4):321–347, 2004. → pages 102, 104[67] M. S. Granovetter. The strength of weak ties. American Journal ofSociology, 78(6):1360–1380, 1973. → pages 33[68] M. Gro¨tschel, L. Lova´sz, and A. Schrijver. The ellipsoid method and itsconsequences in combinatorial optimization. Combinatorica, 1(2):169–197, 1981. → pages 22, 24[69] S. Guillemot, C. Paul, and A. Perez. On the (non-)existence of poly-nomial kernels for Pl-free edge modification problems. In VenkateshRaman and Saket Saurabh, editors, Parameterized and Exact Com-putation, volume 6478 of Lecture Notes in Computer Science, pages147–157. Springer Berlin / Heidelberg, 2010. → pages 13[70] J. Guo. Problem kernels for NP-complete edge deletion problems:Split and related graphs. In ISAAC, pages 915–926, 2007. → pages35, 90, 98[71] J. Guo. A more effective linear kernelization for cluster editing. Theor.Comput. Sci., 410(8-10):718–726, 2009. → pages 11[72] J. Guo, C. Komusiewicz, R. Niedermeier, and J. Uhlmann. A morerelaxed model for graph-based data clustering: s-plex cluster editing.SIAM J. Discrete Math., 24(4):1662–1683, 2010. → pages 11[73] P. L. Hammer and B. Simeone. The splittance of a graph. Combina-torica, 1(3):275–284, 1981. → pages 17[74] S. Hartung, C. Komusiewicz, and A. Nichterlein. On structural pa-rameterizations for the 2-club problem. In SOFSEM, pages 233–243,2013. → pages 21[75] M. B. Hastings. Community detection as an inference problem. Phys.Rev. E, 74:035102, Sep 2006. → pages 20[76] B. Hayes. Graph theory in practics: Part ii. American Scientist,88(2):104–109, 2000. → pages 81[77] R. B. Hayward, J. P. Spinrad, and R. Sritharan. Improved algorithmsfor weakly chordal graphs. ACM Trans. Algorithms, 3, May 2007. →pages 22125Chapter 6. Bibliography[78] C. T. Hoa`ng. Perfect graphs, (Ph.D. thesis). School of ComputerScience, McGill University Montreal, 1985. → pages 13, 15[79] B. Jamison and S. Olariu. Recognizing P4-sparse graphs in linear time.SIAM J. Comput., 21(2):381–406, 1992. → pages 13, 14, 15, 95, 98[80] B. Jamison and S. Olariu. A tree representation for P4-sparse graphs.Discrete Appl. Math., 35:115–129, January 1992. → pages 13, 79[81] H. Kaplan, R. Shamir, and R. E. Tarjan. Tractability of parameterizedcompletion problems on chordal, strongly chordal, and proper intervalgraphs. SIAM J. Comput., 28(5):1906–1922, 1999. → pages 90[82] R. M. Karp. Reducibility along combinatorial problems. Complexity ofComputer Computations, Proc. Sympos. IBM Thomas J. Watson Res.Center, Yorktown Heights, N.Y.. New York: Plenum, pages 85–103,1972. → pages 5, 7, 8, 9[83] H. Kenniche and V. Ravelomananana. Random geometrix graphs asmodel of wireless sensor networks. Computer and Automation Engi-neering (ICCAE), 4:103–107, 2010. → pages 80[84] T. Kloks, D. Kratsch, and C. K. Wong. Minimum fill-in on circle andcircular-arc graphs. J. Algorithms, 28(2):272–289, 1998. → pages 23[85] D. E. Knuth. The Stanford GraphBase: A Platform for CombinatorialComputing. Addison-Wesley, Reading, MA, 1993. → pages 46[86] C. Komusiewicz. Parameterized Algorithmics for Network Analysis:Clustering & Querying. PhD thesis, 2011. → pages 10, 24[87] C. Komusiewicz and J. Uhlmann. Cluster editing with locally boundedmodifications. Discrete Applied Mathematics, 160(15):2259 – 2270,2012. → pages 24[88] D. Krackhardt. Computational organization theory. chapter Graphtheoretical dimensions of informal organizations, pages 89–111. 1994.→ pages 54, 55, 56, 57, 59[89] D. Kratsch and J. Spinrad. Between O(nm) and O(nα). SIAM J.Comput., 36(2):310–325, 2006. → pages 101[90] V. Krebs. Mapping networks of terrorist cells. Connections, 24(3):43–52, 2002. → pages 71126Chapter 6. Bibliography[91] M. Kriva´nek and J. Mora´vek. NP-hard problems in hierarchical-treeclustering. Acta Informatica, 23(3):311–323, 1986. → pages 10[92] R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins,and E. Upfal. Stochastic models for the web graph. In Proceedingsof the 41st Annual Symposium on Foundations of Computer Science,FOCS ’00, Washington, DC, USA, 2000. IEEE Computer Society. →pages 88[93] C. Lekkerkerker and D. Boland. Representation of finite graphs bysets of intervals on the real line. Fund. Math., 51:45–64, 1962. →pages 16[94] J. M. Lewis and M. Yannakakis. The node-deletion problem for hered-itary properties is np-complete. Journal of Computer and System Sci-ences, 20(2):219–230, 1980. → pages 13, 16[95] C. Li, S. T. McCormick, and D. Simchi-Levi. On the minimum-cardinality-bounded-diameter and the bounded-cardinality-minimum-diameter edge addition problems. Oper. Res. Lett., 11:303–308, 1992.→ pages 8, 73[96] Y. Liu, J. Wang, J. Guo, and J. Chen. Cograph editing: Complex-ity and parameterized algorithms. In B. Fu and D.-Z. Du, editors,COCOON, volume 6842 of Lecture Notes in Computer Science, pages110–121. Springer, 2011. → pages 13, 118[97] Y. Liu, J. Wang, J. Guo, and J. Chen. Complexity and parameter-ized algorithms for cograph editing. Theoretical Computer Science,461(0):45 – 54, 2012. 17th International Computing and Combina-torics Conference (COCOON 2011). → pages 35, 36, 52, 115[98] D. Lokshtanov, F. Mancini, and C. Papadopoulos. Characterizingand computing minimal cograph completions. Discrete Appl. Math.,158(7):755–764, 2010. → pages 90[99] F. Luccio and M. Sami. On the decomposition of networks in mini-mally interconnected subnetworks. IEEE Trans. Circuit Th., 16:184–188, 1969. → pages 18[100] R. D. Luce. Connectivity and generalized cliques in sociometric groupstructure. Psychometrika, 15:169–190, 1950. → pages 21127Chapter 6. Bibliography[101] D. Lusseau. The emergent properties of a dolphin social network. Pro-ceedings of the Royal Society of London Series B-Biological Sciences,270:S186–S188, 2003. → pages 47[102] F. Mancini. Graph modification problems related to graph classes.Ph.D. thesis, University of Bergen, 2008. → pages 35, 52, 115[103] D. Marx. Parameterized Complexity and Approximation Algorithms.The Computer Journal, 51(1):60–78, 2008. → pages 7[104] D. Marx. Chordal deletion is fixed-parameter tractable. Algorithmica,57(4):747–768, 2010. → pages 16[105] R. M. McConnell and J. Spinrad. Modular decomposition and tran-sitive orientation. Discrete Mathematics, 201(1-3):189–241, 1999. →pages 95[106] R. J. Mokken. Cliques, clubs and clans. Quality and Quantity, 13:161–173, 1979. → pages 21[107] J. Nastos and Y. Gao. A note on the hardness of graph diameteraugmentation problems. CoRR, abs/0909.3877, 2009. → pages iii[108] J. Nastos and Y. Gao. A novel branching strategy for parameterizedgraph modification problems. In Proceedings of the 4th internationalconference on Combinatorial optimization and applications - VolumePart II, COCOA’10, pages 332–346. Springer-Verlag, 2010. → pagesiii, 23, 35[109] J. Nastos and Y. Gao. Bounded search tree algorithms for parame-terized cograph deletion: Efficient branching rules by exploiting struc-tures of special graph classes. Discrete Mathematics, Algorithms andApplications, 4(1), 2012. → pages iii, 54, 104[110] J. Nastos and Y. Gao. Familial groups in social networks. SocialNetworks, 35(3):439–450, 2013. → pages iii, 46[111] A. Natanzon. Complexity and approximation of some graph modifi-cation problems. MSc Thesis, Tel Aviv University, 1999. → pages 10,16, 17, 23[112] A. Natanzon, R. Shamir, and R. Sharan. Complexity classification ofsome edge modification problems. In Proceedings of the 25th Interna-tional Workshop on Graph-Theoretic Concepts in Computer Science,128Chapter 6. BibliographyWG ’99, pages 65–77, London, UK, 1999. Springer-Verlag. → pages16[113] M. E. J. Newman and M. Girvan. Finding and evaluating communitystructure in networks. Physical Review E, 69, 2004. → pages 47[114] M. E. J. Newman and D. J. Watts. Renormalization group analysis ofthe small-work network model. Physics Letters A, 263(4-6):341–346,1999. → pages 81[115] R. Niedermeier. Invitation to Fixed-Parameter Algorithms. OxfordLecture Series in Mathematics and Its Applications, Oxford UniversityPress, 2006. → pages 7[116] R. Niedermeier and P. Rossmanith. An efficient fixed-parameter algo-rithm for 3-hitting set. J. Discrete Algorithms, 1(1):89–102, 2003. →pages 102, 104, 108[117] S. D. Nikolopoulos and L. Palios. Adding an edge in a cograph. InWG, pages 214–226, 2005. → pages 90[118] G. Palla, I. Dere´nyi, I. Farkas, and T. Vicsek. Uncovering the overlap-ping community structure of complex networks in nature and society.Nature, 435:814, 2005. → pages 83, 84, 86, 115[119] P. A. Pevzner, H. Tang, and M. S. Waterman. An Eulerian pathapproach to DNA fragment assembly. Proceedings of the NationalAcademy of Sciences of the United States of America, 98(17):9748–9753, 2001. → pages 1[120] S. Poljak. A note on stable sets and colourings of graphs. Commenta-tiones Mathematicae Universitatis Carolinae, 15(2):307–309, 1974. →pages 25[121] F. Protti, M. D. da Silva, and J. L. Szwarcfiter. Applying modulardecomposition to parameterized cluster editing problems. Theory ofComputing Systems, 44(1):91–104, 2009. → pages 11[122] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. Defin-ing and identifying communities in networks. Proc. Natl. Acad. Sci.USA, 101:2658–2663, 2004. → pages 19[123] E. Ravasz and A.-L. Baraba´si. Hierarchical organization in complexneworks. Physical Review E, 67:026112, 2003. → pages 54, 116129Chapter 6. Bibliography[124] B. A. Reed, K. Smith, and A. Vetta. Finding odd cycle transversals.Oper. Res. Lett., 32(4):299–301, 2004. → pages 17[125] G. Sabidussi. The centrality index of a graph. Psychometrika,31(4):581–603, 1966. → pages 70[126] M. Sales-Pardo, R. Guimera, A. A. Moreira, and L. A. N. Amaral.Extracting the hierarchical organization of complex systems. Proceed-ings of the National Academy of Sciences, 104(39):15224–15229, 2007.→ pages 54[127] S. E. Schaeffer. Graph clustering. Computer Science Review, 1:27–64,2007. → pages 20[128] A. A. Schoone, H. L. Bodlaender, and J. van Leeuwen. Diameterincrease caused by edge deletion. Journal of Graph Theory, 11(3):409–427, 1987. → pages 8, 73[129] S. B. Seidman and B. L. Foster. A graph theoretic generalization ofthe clique concept. J. of Mathematical Sociology, 6:139–154, 1978. →pages 21[130] R. Shamir, R. Sharan, and D. Tsur. Cluster graph modification prob-lems. Discrete Applied Mathematics, 144(1-2):173–182, 2004. → pages10, 22, 24[131] P. Smyth and S. White. A spectral clustering approach to findingcommunities in graphs. Proceedings of the fifth SIAM internationalconference on data mining, 119, 2005. → pages 20[132] A. Sridharan. Topological features of online social networks. MScthesis, Univ. of Victoria, 2011. → pages iii[133] A. Sridharan, Y. Gao, K. Wu, and J. Nastos. Statistical behavior ofembeddedness and communities of overlapping cliques in online socialnetworks. CoRR, abs/1009.1686, 2010. → pages iii, 82[134] R. E. Tarjan. Decomposition by clique separators. Discrete Mathe-matics, 55(2):221–232, 1985. → pages 16[135] R. E. Tarjan and M. Yannakakis. Addendum: Simple linear-time algo-rithms to test chordality of graphs, test acyclicity of hypergraphs, andselectively reduce acyclic hypergraphs. SIAM J. Comput., 14(1):254–255, 1985. → pages 101130Chapter 6. Bibliography[136] W. T. Tutte. Graph Theory. Cambridge University Press, 2001. →pages 55[137] F. Wang, H. Du, E. Camacho, K. Xu, W. Lee, Y. Shi, and S. Shan.On positive influence dominating sets in social networks. TheoreticalComputer Science, 412(3):265–269, 2011. → pages 7[138] W. Wang, D. Kim, N. Sohaee, C. Ma, and W. Wu. A PTAS for min-imum d-hop underwater sink placement problem in 2-D underwatersensor networks. Discrete Mathematics, Algorithms and Applications,(1):283–289, 2009. → pages 7[139] S. Wasserman and K. Faust. Social Network Analysis: Methods andApplications. Cambridge University Press, 1994. → pages 16[140] D. J. Watts and S. H. Strogatz. Collective dynamics of ’small-world’networks. Nature, 393(6684):440–442, 1998. → pages 69[141] M. Weller, C. Komusiewicz, R. Niedermeier, and J. Uhlmann. Onmaking directed graphs transitive. In Algorithms and Data Structures,pages 542–553. Springer, 2009. → pages 62[142] M. Weller, C. Komusiewicz, R. Niedermeier, and J. Uhlmann. Onmaking directed graphs transitive. Journal of Computer and SystemSciences, 78(2):559–574, 2012. → pages 62[143] S. H. Whitesides. A method for solving certain graph recognition andoptimization problems, with applications to perfect graphs. Technicalreport, Ithaca, NY, USA, 1982. → pages 16[144] E. S. Wolk. The comparability graph of a tree. Proc. Amer. Math.Soc., 3:789–795, 1962. → pages 11, 12[145] E. S. Wolk. A note on the comparability graph of a tree. Proc. Amer.Math. Soc., 16:17–20, 1965. → pages 56[146] J.-H. Yan, J.-J. Chen, and G. J. Chang. Quasi-threshold graphs. Dis-crete Applied Mathematics, 69(3):247–255, 1996. → pages 33[147] M. Yannakakis. The node-deletion problem for hereditary properties.Tech Rep 240., Computer Science Labtratory, Princeton U, PrincetonNJ, 1978. → pages 17131Chapter 6. Bibliography[148] M. Yannakakis. The effect of a connectivity requirement on the com-plexity of maximum subgraph problems. J. ACM, 26(4):618–630, 1979.→ pages 9[149] M. Yannakakis. Computing the minimum fill-in is NP-complete. SIAMJournal on Algebraic and Discrete Methods, 2(1):77–79, 1981.→ pages16[150] W. W. Zachary. An information flow model for conflict and fission insmall groups. Journal of Anthropological Research, 33:452–473, 1977.→ pages 18, 42[151] E. Zotenko, K. S. Guimara˜es, R. Jothi, and T. M. Przytycka. Decom-position of overlapping protein complexes: A graph theoretical methodfor analyzing static and dynamic protein associations. Algorithms forMolecular Biology, 1(7), 2006. → pages 114132
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Utilizing graph classes for community detection in...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Utilizing graph classes for community detection in social and complex networks Nastos, James 2015
pdf
Page Metadata
Item Metadata
Title | Utilizing graph classes for community detection in social and complex networks |
Creator |
Nastos, James |
Publisher | University of British Columbia |
Date Issued | 2015 |
Description | Social network analysis is a cross-disciplinary study of interest to mathematicians, physicists, computer scientists and sociologists. It deals with looking at large networks of interactions and extracting useful or meaningful information from them. One attribute of interest is that of identifying social communities within a network: how such a substructure should be defined is a widely-studied problem in itself. With each new definition, there is a need to study in what applications or context such a definition is appropriate, and develop algorithms and complexity results for the computation of these clusterings. This thesis studies problems related to graph clustering, motivated by the social networking problem of community detection. One main contribution of this thesis is a new definition of a specific kind of family-like community, accompanied by theoretical and computational justifications. Additional results in this thesis include proofs of hardness for the quasi-threshold editing problem and the diameter augmentation problem, as well as improved exact algorithms for cograph and quasi-threshold edge deletion and vertex deletion problems. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2015-04-30 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0074429 |
URI | http://hdl.handle.net/2429/53014 |
Degree |
Doctor of Philosophy - PhD |
Program |
Interdisciplinary Studies |
Affiliation |
Graduate Studies, College of (Okanagan) |
Degree Grantor | University of British Columbia |
Graduation Date | 2015-05 |
Campus |
UBCO |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
Aggregated Source Repository | DSpace |
Download
- Media
- 24-ubc_2015_may_Nastos_James.pdf [ 2.75MB ]
- Metadata
- JSON: 24-1.0074429.json
- JSON-LD: 24-1.0074429-ld.json
- RDF/XML (Pretty): 24-1.0074429-rdf.xml
- RDF/JSON: 24-1.0074429-rdf.json
- Turtle: 24-1.0074429-turtle.txt
- N-Triples: 24-1.0074429-rdf-ntriples.txt
- Original Record: 24-1.0074429-source.json
- Full Text
- 24-1.0074429-fulltext.txt
- Citation
- 24-1.0074429.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0074429/manifest