Generalized Constraint-Based Inference by Le Chang B . E . , Tsinghua University, P.R.China, 2000 M . E . , Tsinghua University, P.R.China, 2002 A THESIS S U B M I T T E D IN P A R T I A L F U L F I L M E N T O F THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Computer Science) T H E UNIVERSITY OF BRITISH C O L U M B I A A p r i l 20, 2005 © L e Chang, 2005 Abstract ii Abstract Constraint-Based Inference (CBI) is a unified framework that subsumes many practical problems in different research communities. These problems include probabilistic inference, decision-making under uncertainty, constraint satisfaction, propositional satisfiability, decoding problems, and possibility inference. Solving them efficiently is important -for both research and practical applications. Along with the development of inference approaches for concrete C B I problems, researchers are increasingly aware that these problems share common features in representation and essentially identical inference approaches. As the first contribution of this thesis, we explicitly use the semiring concept to generalize various C B I problems into a single formal representation framework with a broader coverage of the problem space based on the synthesis of existing generalized frameworks. Second, the proposed semiring-based unified framework is also a single formal algorithmic framework that provides a broader coverage of both exact and approximate inference algorithms, including variable elimination, junction tree, and.loopy message propagation methods. Third, we discuss inference algorithm design and complexity issues based on the abstract representations of C B I problems and inference algorithms. These discussions are examples of applying the abstract knowledge to the concrete application domains. Researchers from various fields can (1) study the most important common characteristics of various C B I problems without representation barriers; (2) analyze and compare different inference approaches; (3) borrow design ideas from other fields and improve the inference approaches' efficiency in their own domains; and (4) significantly reduce the amount of implementation work target ted previously at the individual problems. Finally, we present a software toolkit named the Generalized ConstraintBased Inference Toolkit in Java (GCBIJ) as the last contribution of this thesis. G C B I J is the first concrete software toolkit that implements the abstract semiring approach to unify the C B I problem representations and the inference algorithms. Users can design their own task-specific semirings or simply use ones provided by the toolkit to solve their own concrete C B I problems through instantiating various already provided abstract inference algorithms. Users can also design their own inference algorithms on the abstract level and apply them to solve different problems. Furthermore, researchers can test, verify, compare, and analyze inference approaches based on this toolkit. The the experimental results based on G C B I J show that the generalized C B I framework is a useful tool for both research and practical problem-solving. Contents iii Contents Abstract ii Contents iii L i s t of Tables vi L i s t of Figures vii L i s t of A l g o r i t h m s viii Acknowledgements ix 1 Introduction 1.1 Motivation 1.2 Related Work 1.2.1 Generalized CBI Representations 1.2.2 Generalized Inference Algorithms 1.3 Thesis Outline 1 1 2 2 3 3 2 Background 2.1 Graphical Models for Constraint-Based Inference 2.1.1 Hypergraph Representations 2.1.2 Primal Graph Representations2.1.3 Junction Tree Representations 2.1.4 Factor Graph Representations 2.2 Topological Parameters to Characterize Graphs 2.2.1 Treewidth and Induced Width 2.2.2 Triangulated Graph and Treewidth Computing 2.2.3 Small World and Scale-Free Topology 2.3 The Basis of Abstract Algebra 2.3.1 Set, Group, and Semigroup 2.3.2 Ring, Semiring, and k-Semiring 2.4 Summary 5 5 5 6 6 7 8 8 9 10 12 13 13 16 3 A S e m i r i n g - B a s e d G e n e r a l i z e d F r a m e w o r k for C B I 3.1 A Semiring-Based Generalized Framework 3.2 Instances of the Semiring-Based Generalized Framework 18 18 20 Contents 3.3 3.2.1 Constraint Satisfaction Problems 3.2.2 Propositional Satisfiability Problems 3.2.3 Probability Inference Problems 3.2.4 Dynamic Bayesian Networks 3.2.5 Decision-Making Problems 3.2.6 Possibility Inference Problems 3.2.7 Coordination Graph 3.2.8 Maximum Likelihood Decoding Summary iv 20 22 23 25 25 26 27 28 29 4 E x a c t Inference A l g o r i t h m s 4.1 Variable Elimination Algorithm 4.1.1 Generalized Variable Elimination Algorithm 4.1.2 Generalized V E Algorithm for k-Semiring 4.1.3 Instances of Variable Elimination Algorithm 4.1.4 Space and Time Complexity 4.1.5 Discussion of Applying the V E Algorithm 4.2 Junction Tree Algorithm 4.2.1 Generalized J T Algorithm 4.2.2 Generalized J T Algorithm for fc-Semiring 4.2.3 Generalized J T Algorithm for Idempotent Semirings . . . 4.2.4 Generalized J T Algorithm for Invertible Semirings . . . . 4.2.5 Instances of Junction Tree Algorithm 4.2.6 Space and Time Complexity 4.2.7 Discussion of Applying the J T Algorithm 4.3 Summary 31 31 31 34 34 35 36 36 36 39 39 41 42 46 52 54 5 A p p r o x i m a t e Inference A l g o r i t h m s 5.1 Motivation for the Design of Approximate Algorithms 5.2 Algebraic Approximation for V E and J T 5.2.1 Approximate Variable Elimination Algorithm 5.2.2 Approximate Junction Tree Algorithm 5.2.3 Discussion • 5.3 Thin Junction Tree Algorithm 5.3.1 Thinning through Variable Contraction 5.3.2 Generalized Thin Junction Tree Algorithm . 5.3.3 Discussion 5.4 Loopy Message Propagation 5.4.1 Building a Junction Graph 5.4.2 Generalized Loopy Message Propagation Algorithm . . . 5.4.3 Loopy Message Propagation As an Exact Inference . . . . 5.4.4 Discussion 5.5 Hybrid Junction Tree Algorithm 5.6 Summary 57 57 57 58 60 60 61 62 63 64 64 65 66 67 68 68 69 Contents v 6 Generalized C B I Toolkit in Java - G C B I J 6.1 G C B I J Motivation 6.2 G C B I J System Architecture 6.3 G C B I J Implementation and Interfaces 6.3.1 Semiring 6.3.2 Parser 6.3.3 CBITask 6.3.4 Inference 6.3.5 Graph 6.3.6 Utilities 6.3.7 G C B L L M a i n . 6.4 Summary 7 G C B I J A p p l i c a t i o n s : Case Studies 83 7.1 Test Suites 83 7.2 Studies of Heuristic Triangulation Algorithms 83 7.3 Empirical Studies of Exact Inference Algorithms 86 7.3.1 Exact Inference Algorithms with Invertible Semiring . . . 86 7.3.2 Exact Inference Algorithms with Idempotent Semiring . 88 7.3.3 Exact Inference Algorithms for General C B I Problems . . 89 7.4 Empirical Studies of Approximate Inference Algorithms 89 7.4.1 Approximate V E and J T for CSPs 89 7.4.2 Approximate V E and J T for Probability Inference . . . . 92 7.5 Loopy Message Propagation As Exact Inference Algorithm . . . 95 7.6 Impact of Small World Topology . 96 7.6.1 Small World Topology of C B I problems . . , 96 7.6.2 Inference in the Small World 99 7.7 Summary 101 8 C o n c l u s i o n s and F u t u r e W o r k 8.1 Conclusions . . . 8.2 Possible Improvements to Inference Approaches 8.2.1 From Variables to Values 8.2.2 From Systematic to Stochastic 8.3 Future Work Bibliography 71 71 72 74 74 75 76 76 79 80 81 81 102 102 103 103 105 105 107 A H e u r i s t i c s Search A l g o r i t h m s for T r i a n g u l a t i o n 113 A . l Minimal Width Heuristic Triangulation 113 A.2 Minimal Induced Width Heuristic Triangulation .114 A.3 Minimal Fill-in Heuristic Triangulation . . 115 A.4 Minimal Induced Fill-in Heuristic Triangulation 116 A.5 Lexicographic B F S Heuristic Triangulation, variant Perfect . . . 117 A.6 Lexicographic B F S Heuristic Triangulation, variant Minimal . . . 118 A.7 Maximum Cardinality Heuristic Triangulation 119 List of Tables vi List of Tables 2.1 2.2 Heuristic Triangulation Algorithms Summary of Different Commutative Semirings 4.1 Computation Costs of Two Equivalent Tasks 4.2 . V E and J T Complexity Comparisons 4.3 Upper Bounds of V E and J T Running Times 4.4 Running Times for Different Operations in Java 4.5 Concrete V E and J T Algorithms in Different Fields Methods Methods Methods Methods Methods Methods Methods of of of of of of of the the the the the the the 11 16 . 6.1 6.2 6.3 6.4 6.5 6.6 6.7 Public Public Public Public Public Public Public Class Semiring Interface Ildempotent Interface IReversible . . Class Parser Class InferAlg Class ApproxInferAlg Class Cluster 7.1 7.2 7.3 7.4 7.5 7.6 7.7 The Basic Information of Test Suites Upper Bounds of Treewidth for Graphs, Different Heuristic Searches Running Times of Exact Inference Algors. with Inverses Running Times of Exact Inference Algors. with Idempotency . . Running Times of General Exact Inference Algorithms The Maximum Number of Satisfiable Constraints for a C S P . . . Parameters of the Small World Topology for Graphs 8.1 Detecting Elements of Some Commutative Semirings 31 53 54 54 56 74 75 75 76 77 79 82 84 85 88 89 89 90 98 105 List of Figures vii List of Figures 2.1 2.2 2.3 Graphical Representations of an Example C B I Problem Axioms in the Definitions of Group and Related Concepts Axioms in the Definitions of Ring and Related Concepts 5.1 Graphical Interpretation of Approximate V E 59 5.2 Graphical Interpretation of Approximate J T 60 6.1 System Architecture of G C B I J 73 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 7.10 7.11 7.12 Upper Bounds of Treewidth, Based on Various Heuristics . . . . 86 Bayesian Network of the Diagnosis Problem 87 Accuracy and Running Time of the Approx. V E for a CSP . . . 91 Accuracy and Running Time of the Approx. J T for a C S P . . . 92 Bayesian Network of the Insurance Problem 93 Marginal,of the Approx. V E for a Probability Inference Problem 94 Marginal of the Approx. J T for a Probability Inference Problem 95 Accuracy and Time Complexity of loopy M P for a CSP . . . . . 96 Small World Parameter Distributions for the Pigs Network . . . 97 Small World Parameter Distributions for the Link Network . . . 99 Small World Parameters for Different Re-written Probabilities . . 100 Time Complexity of loopy M P in the Small World 100 ... 7 14 15 List of Algorithms viii List of A l g o r i t h m s 2.1 2.2 2.3 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 A.l A.2 A.3 A.4 A.5 A.6 A.7 Triangulating a Graph, Given an arbitrary Ordering ( R A N D ) . . Small World Modelling [59] '.. Scale-Free Modelling Generalized Variable Elimination Algorithm ( G V E ) Generalized Bucket Elimination Algorithm ( G B E ) Generalized V E Algorithm for k-Semiring ( k G V E ) . Generalized J T Algorithm ( G J T ) . Generalized J T Algorithm for k-Semiring ( k G J T ) . Generalized J T Algorithm for Idempotent Semirings (GJT-Idemp) 10 12 12 33 33 34 38 40 41 Generalized J T Algorithm for Invertible Semirings (GJT-Inv) . . 43 Generalized J T Algorithm for LS Architecture ( G J T - L S ) 44 Generalized J T Algorithm for H U G I N Architecture ( G J T - H U G I N ) 45 The Approximate Marginal of Combination Algorithm (ApproxMC) 58 Generalized Approximate V E Algorithm (GVE-Approx) 59 Generalized Approximate J T Algorithm (GJT-Approx) 61 Variable Contraction (Contract(v, C\)) 62 Generalized T h i n Junction Tree Algorithm ( G T h i n J T ) 63 Building a Junction Graph for a C B I Problem 65 Generalized Loopy Message Propagation Algorithm (GLoopy) . . 66 Generalized Loopy Message Propagation Algorithm with Inverses (GLoopy-Inv) 67 Hybrid Junction Tree Algorithm (HybridJT) 70 Minimal W i d t h Heuristic Triangulation ( M W ) [47] 113 Minimal Induced W i d t h Heuristic Triangulation (MIW) [37] . . 114 Minimal Fill-in Heuristic Triangulation ( M F ) [47] 115 Minimal Induced Fill-in Heuristic Triangulation (MIF) [37] . . '. 116 Lexicographic B F S Heuristic Triangulation, variant Perfect ( L E X P ) [51] . . . . . . . . 117 Lexicographic B F S Heuristic Triangulation, variant Minimal ( L E X M ) [51] 118 Maximum Cardinality Heuristic Triangulation ( M C ) [56] 119 A ckn owledgem en ts Acknowledgements I would like to thank my supervisor, Professor Alan Mackworth, for his guidance and support in this work. I would not be here without his inspiration and encouragement. Many thanks also to Professor David Poole and Professor Kevin Leyton-Brown for their valuable comments. I would also like to thank Ben D'Andrea for his hard work on my thesis proof-reading. Especially, I would like to give my special thanks to my wife Ming Yan whose patient love enabled me to complete this work. Chapter 1. Introduction 1 Chapter 1 Introduction 1.1 Motivation Constraint-Based Inference (CBI) is a general term covering many different problems in several research communities. It consists of discovering new constraints from a set of given constraints over individual entities. These new constraints reveal previously undiscovered properties of those entities. Practical problems from many different fields can be seen as constraint-based inference problems, including probabilistic inference, decision-making under uncertainty, constraint satisfaction problems, propositional satisfiability, decoding problems, and possibility inference. A constraint here can be seen as a function that maps possible value assignments to a specific value domain. Along with the development of inference approaches for concrete C B I problems, researchers are increasingly aware that these problems share common abstract representation features. Many inference algorithms, described differently, implicitly have essentially identical ideas underlying them. Understanding the common features and characteristics of C B I representations helps research communities exchange ideas and design more efficient inference algorithms. The purpose of this thesis is to use the algebraic semiring concept to generalize a wide range of C B I problems and different exact and approximate inference algorithms into a single formal framework based on the synthesis of existing generalized representation and algorithmic framework from different fields. We aim to analyze the common characteristics of inference algorithms based on the abstract problem representation, then apply the abstract knowledge learnt from the unified framework to improve the concrete inference algorithm design in specific application domains. To demonstrate the representation power of the proposed semiring-based unified framework, a toolkit named Generalized Constraint-Based Inference. Toolkit in Java (GCBIJ) is implemented. It can be used to verify, test, compare, and analyze generalized approaches for C B I problems. The flexible architecture of G C B I J enables implemented generalized inference algorithms to be applied to solve concrete problems simply through specifying different semirings. G C B I J ' s extensibility enables users to design their own task-specific semirings and use available generalized inference algorithms. This not only demonstrates the feasibility of using semirings to unify the C B I problem representations and the various inference algorithms, but also shows that G C B I J is a suitable platform for both research and practical problem-solving. Chapter 1. 1.2 Introduction 2 Related Work Generalized representation and inference algorithms for C B I problems have been studied for the past ten years. A l l of these studies are based on the following observation: there are two essential operations in constraint-based inference: (1) c o m b i n a t i o n , which corresponds to an aggregation of constraints, and (2) m a r g i n a l i z a t i o n , which corresponds to the projection of a specified constraint onto a narrower domain. Different generalized representations express these two operations through various tools or notions. Given the distributivity property of the two operations, generalized inference algorithms can be abstracted that exploit that property. The semiring-based unified framework proposed in this thesis is developed based on the synthesis of these generalized representation and algorithmic frameworks. 1.2.1 Generalized C B I Representations Semiring-based C S P (SCSP) [6] is a generalized representation framework for soft constraint programming, which includes max CSPs, fuzzy CSPs, weighted CSPs, and probabilistic CSPs. It generalizes these soft constraint programming problems, as well as classic CSPs, into a single abstract framework. The csemiring, a commutative semiring with the additive idempotency property, is the key notion in SCSP that represents both the combination and marginalization operations. The additive idempotency property excludes general C B I problems like probabilistic inference from the SCSP framework. Given that the C S P is an instance of the generalized C B I problem, the SCSP framework inspired us to propose the semiring-based unified framework in this thesis. In addition to the generalized representation of hard and soft CSPs, SCSP also generalizes the soft local consistency approaches. Extended from local consistency techniques for classic CSPs, generalized soft local consistency can be instantiated to solve a wide range of soft constraint programming problems. The success of the SCSP framework [5] demonstrates that a generalized problem representation provides opportunities to migrate an existing approach for one problem to solve another problem, if the two problems have the same abstract representation. That is our motivation for proposing a semiring-based unified framework to represent CBI problems from other disciplines as well as constraint programming. The valuation algebra framework [35] is an abstract framework for describing probabilistic inference problems in expert systems. It is inspired by the formulation of simple axioms that solve global problems through local computing. Knowledge or information in the framework is called valuation. Inferences are made by computing the marginal of a combination of several valuations. The framework of the valuation algebra is abstracted to include many different formalisms. Bayesian networks, Dempster-Shafer belief function models, Spohn's epistemic belief models, and possibility theory are shown [35] to fit into the valuation algebra framework. The authors also suggested that constraint satisfaction problems, propositional logic, and discrete optimization problems could be expressed in the valuation algebra framework. Chapter 1. 1.2.2 Introduction 3 Generalized Inference Algorithms Many algorithms have been proposed for probabilistic inference. The variable elimination algorithm [64] and the junction tree algorithms [55] [39] [29] were among the first inference algorithms for probability inference in Bayesian networks. Originating in dynamic programming, the variable elimination and junction tree algorithms explicitly use the distributivity of arithmetic additive and multiplicative operations. The generalization of the bucket elimination algorithm [14], a variant of variable elimination, can be used in belief assessment, most probable explanation, maximum a posteriori hypothesis, and maximum expected utility problems. The bucket elimination approach was also applied in constraint satisfaction problems. It was proved that the generalized bucket elimination algorithm is applicable to both probabilistic and deterministic inference [15]. The Generalized Distributive Law (GDL) [1] is a general message-passing algorithm, a synthesis of work in the information theory, signal processing, statistics, and artificial intelligence communities. G D L generalizes the computational problems as "Marginalize a Product Function" (MPF) problems through using commutative semirings to represent the combination and marginalization operations. G D L can be seen, in the A l community, as a junction tree algorithm. As a semiring-based generalized inference algorithm, G D L can represent a wide range of inference algorithms. More recently,'a unified algorithmic framework was proposed [32] to represent tree-decomposition algorithms as a generalized bucket-tree elimination algorithm for automated reasoning tasks. Different from G D L , the generalized bucket-tree elimination does not rely on semirings. Abstract concepts are used to represent combination and marginalization operations, though the distributivity is also the key idea behind the algorithm. Using those concepts, the message-passing within the tree structure can be abstractly represented. The authors also suggested using their generalized algorithmic framework for constraint satisfaction problems. 1.3 Thesis Outline This thesis is organized as follows: related definitions and background for C B I problems, graphical models, and abstract algebra are introduced in Chapter 2. In Chapter 3 we present a semiring-based unified framework for C B I problems. We also show that a series of problems from different fields can be seen as instances of our proposed framework. In Chapter 4, two popular exact inference algorithms, variable elimination algorithm and junction tree algorithm, are incorporated into the semiring-based unified framework. Instances of these algorithms are discussed. Theoretical and empirical complexities of these algorithms are also discussed. It is well known that complexities of exact inference algorithms are exponential in the parameters of their underling graphical representations. For many practical C B I problems with intractable parameters, we Chapter 1. Introduction 4 will resort to approximate techniques. We discuss approximate inference algorithms within our abstract framework in Chapter 5. To put the ideas of the semiring-based unified framework into concrete form, a Generalized ConstraintBased Inference Toolkit in Java (GCBIJ) is designed and implemented. We present the design specification of G C B I J in Chapter 6. We also discuss the use and limitations of G C B I J in this chapter. In Chapter 7, a series of experiments are conducted based on G C B I J platform, which include problems from probability inference, CSP, SAT, and MaxSAT. Although our experimental results are not totally new for research communities, as a whole they verify the feasibility of using the semiring to generalize the representations of various C B I problems. Generalized exact and approximate inference algorithms, based on the proposed semiring-based unified framework, are proven suitable for solving concrete applications. Finally, conclusions and future work appear in Chapter 8. Chapter 2. Background 5 Chapter 2 Background 2.1 Graphical Models for Constraint-Based Inference Generally speaking, inference means discovering previously unknown facts from a set of given facts (or knowledge). In an inference problem, knowledge can be represented by a set of (hard or soft) constraints on individual entities. Usually individual entities are represented by variables. In this thesis, a constraint-based inference problem is characterized in terms of a set of variables with values in a finite d o m a i n and a set of constraints on these variables. The inference task for a C B I problem is then to make explicit new constraints over some variables, based on the given constraints. Here the new constraints could specify the property of one variable (unary), several variables, or the whole problem. A n example of C B I problem is given in Example 2.1. The formal definitions of a CBI problem and related tasks appear in Chapter 3. E x a m p l e 2.1 Consider a constraint-based inference problem with 5 variables Vi,---,Vs, Vi € {0,1}. There are 3 constraints defined over these variables: fi(Vi, V , V ), / 2 ( V , V , Vi) and fo(Vz, V ) , which specify the set of tuples permitted by these constraints respectively. An inference task in this example is to discover tuples permitted by the derived constraint over V2 and V3. 2 3 2 3 5 In the following chapters of this thesis, we use uppercase letters to denote variables and lowercase letters to denote values in the domain of a variable. Constraints over variables, usually represented as functions, are denoted by lowercase letters as well. Given a function / , Scope(f) denotes the subset of variables that appear in the domain of / : For convenience, we also use uppercase letters to denote subsets of variables. The meaning of uppercase letters can be easily distinguished given the context. 2.1.1 H y p e r g r a p h Representations In many cases, we can use graphs to represent C B I problems. The most straightforward graphical representation is the hypergraph representation, where nodes represent variables and hyperarcs represent constraints. Given a C B I problem as described in Example 2.1, the corresponding hypergraph representation is shown in Figure 2.1(a). 6 Chapter 2. Background 2.1.2 Primal Graph Representations Although the hyper-graph representation is straightforward, it is hard to represent practical C B I problems using hyperarcs in graphical representations. In a primal graph the hyperarcs in a hypergraph are replaced by cliques. The primal graph representation of a C B I problem is an undirected graph Q = (V, £), where V = {Vi, • • •, V } is a set of vertices, each vertex Vj corresponding to a variable Xi of the C B I problem; and £ = {(Vi, Vj)\Vi, Vj G V} is a set of edges between Vi and Vj. There exists an edge (Vi, Vj) if and only if corresponding variables Xi and Xj appear in the scope of the same constraint. A moralized graph of the Bayesian Network (BN), which is obtained by adding edges among vertices with the common child vertex in the corresponding B N , is an example of a primal graph representation of probabilistic inference problems. A constraint graph of binary CSP is another example of a primal graph representation. In this thesis, we use Neighbors(Vi) to denote the set of vertices that have edges to Vj and Family{Vi) = Neighbors(Vi) U {Vi}. Given a C B I problem as described in Example 2.1, the corresponding primal graph is shown in Figure 2.1(b). n 2.1.3 Junction Tree Representations Sometimes the primal graph of a C B I problem is re-organized as a secondary structure to achieve better computational efficiency. The junction tree is a widely used secondary structure in graphical models. A junction tree is an undirected graph T = (C,S). C = { C i , • • •, C „ } is a set of clusters, where each cluster d corresponds to an aggregation of a subset of vertices Vd Q V in the primal graph Q = (V, £). S = {Sij, •, Si } is a set of separators between clusters, where Sij is the separator of clusters Ci and Cj, corresponding to the vertices of Vc fl Vc • In addition, the following junction tree properties have to be satisfied: m t t 1. Singly connected property: T = (C,S) is a tree; 2. Running intersection property: V C j , C j £ C , Vc D VCJ Q Vc any cluster Ck on the path between Cj and Cj\ t k holds for 3. Constraint allocation property: For any constraint / of the C B I problem, 3d € C s.t. Scope(f) C d. Typically a junction tree is undirected. In some computational schemes, we can pick one cluster as the root of the tree and assign directions to all separators. A separator Sij = (Ci,Cj) has a direction from Ci to Cj if Ci is in the path from the root to Cj. For each cluster C j , Parent(Ci) denotes the cluster that points to d; Children(d) denotes the set of clusters that Cj points to. Also, we use Root(T) to denote the root of junction tree T . Given a C B I problem as described in Example 2.1, a corresponding junction tree representation is shown in Figure 2.1(c). Chapter 2. Background (c) 7 (d) Figure 2.1: Graphical Representations of an Example C B I Problem, (a) Hypergraph (b) Primal graph (c) Junction tree (d) Factor graph 2.1.4 Factor G r a p h Representations The factor graph [38] is another graphical representation for C B I problems, widely used in information theory research. A factor graph is a bipartite graph that expresses the structure of the factorization. A factor graph has a variable node for each variable, a factor node for each constraint, and an edge connecting a variable node to a factor node if and only if the variable is in the scope of the constraint. Given a C B I problem as described in Example 2.1, the corresponding factor graph representation is shown in Figure 2.1(d). Chapter 2. Background 2.2 8 Topological Parameters to Characterize Graphs In the following chapters we will find that the success of different inference algorithms for the inference task for a C B I problem largely depends on the characteristics of its corresponding graphical representation. In this section, related topological parameters to characterize a graph are introduced. 2.2.1 Treewidth and Induced Width Treewidth, or equivalently, induced width [22], has been proved to be an important notion in computational complexity theory. Many AfV-h&rd graph problems, such as the maximum independent set problem and the Hamiltonian cycle problem, can be solved in polynomial time for graphs with treewidth bounded by a constant. We will show in the following chapters that the time and space complexities of generalized inference algorithms for C B I problems are polynomial in the size of graphical representations with bounded treewidth and have a constant factor which is exponential in the treewidth of primal graphs. Induced Width Definition 2.2 (Induced Width) Given an undirected graph Q = (V, £) and an ordering <r = (V\, - • • ,V ) of vertices in V, the ordered graph is a pair {G,cr). The width of a vertex in an ordered graph (Q, a) is the number of its lower-indexed neighbors in a. The width of an ordered graph, denoted by w(a), is the maximal width of all vertices. The induced graph of an ordered graph (G,o~), denoted by {Q*,a), is obtained by processing the vertices in V recursively, from the vertex with the largest index to the vertex with the smallest index; when vertex Vi is processed, all its lower-indexed neighbors are connected to each other. The induced width w*(a) of an ordered graph (G,a) is the width of the induced graph {Q*,o). • The induced width of a graph Q, denoted by w*(Q), is the minimum induced width for all possible orderings of Q. n According to the definition, computing the induced width of a graph Q is to find an ordering a with a minimum induced width of the ordered graph {Q,cr). In this thesis, we also call such an ordering an elimination ordering, because it corresponds to eliminating a vertex after connecting all its lower-indexed neighbors. Treewidth Treewidth [50] is the measure for the study of tree decomposition in the context of graph minor research, it is proved to be an equivalent concept of induced width [22]. 9 Chapter 2. Background Definition 2.3 (Treewidth, Robertson and SeymourfSO]) A tree decomposition of graph Q = ( V , £) is a pair ( T , X), where T = ( C , S) is a tree with vertex set C and edge set S, and X = {X ,c € C} is a family of subsets of V, one for each vertex of T, such that c 1- U c 6 C * c = V , 2. for every edge (v^, vj) € £, there is a c 6 C with V{ '€ X c and Vj € X , and c 3. for all i, j, k S C, if k is on the path from i to j in T, then Xi D Xj C X^ • The width of a tree decomposition is max c\X \ cS c — 1. The treewidth of a graph Q = (V,£), denoted by w*{Q), is the minimum width over all possible tree decompositions of Q. Once these definitions are compared, it is obvious that the junction tree representation is a tree decomposition of the primal graph of a given C B I problem. 2.2.2 Triangulated G r a p h and Treewidth Computing A graph is triangulated if every cycle of length at least four contains a chord, that is, two non-consecutive vertices on the cycle are adjacent. Triangulated graphs are also called chordal graphs due to the existence of a chord in every cycle. The triangulated graph of graph Q = (V, £) is obtained by adding edges St to Q such that Q = (V, £ U £ ) is a triangulated graph. There exists an equivalent relation between treewidth of a graph and maximum clique size in its triangulated graph [8]: t t Theorem 2.4 (Bodlaender [8]) Let Q = (V, £) be a graph, and let k > 0. The following statements are equivalent: 1. The treewidth of Q is at most k. 2. The maximum clique size of an triangulated graph of Q is at least k + 1. Corollary 2.5 Computing the treewidth of graph Q is equivalent to finding an optimal triangulated graph of Q, with minimum max-clique size. Computing the treewidth or the induced width of a graph, as well as finding an optimal triangulated graph, is Af'P-hard [2] in general. For fixed k, there exists some linear time algorithms [7] to determine whether a graph has treewidth k. However, these algorithms are very limited because the running time contains a large constant that is exponential in k. There also exist some approximate algorithms with factors 4, 4~ and 0(log(n)) [21] which can be calculated in polynomial time, even though they contain a large constant which is exponential in k as well. The maximum clique size (minus one) of any optimal triangulated graph of Q provides an upper bound of the treewidth of Q. Although having no guarantee of Chapter 2. Background 10 Algorithm 2.1 Triangulating a Graph, Given an arbitrary Ordering (RAND) Input: A connected graph Q — (V, £), an ordering a = (v\, • • • ,v ) Output: Triangulated graph Q = (V,£ U £ ) with w*(5) < w*(£t) = w 1: for each v G V do 2: M ( D ) : = Neighbor(v) 3: end for 4: £ := 0, w := 0 5: for i = 1 to i = |V| do 6: if \M(vi)\ > w then n t t t 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: W := |M(v;)| end if for each x,y £ M(vi) and (x, y) ^ £ U £ do . £ := £ U [x,y) M{x) M(x) Uy and M{y) := M{y)Ux end for for each u G M ( V J ) do M(u) := M(u) \Vi end for end for t t t returning an optimal triangulation, we can use heuristic triangulation algorithms (which are linear and do not contain exponentially large constans) to get the upper bounds of the treewidth. Given an arbitrary elimination ordering a = (v\, • • •, v ), a graph Q = (V, £) can be triangulated according to the procedure in Algorithm 2.1 [47]: Appendix A lists several heuristic triangulation algorithms. Basically they use various heuristics to find an elimination ordering, then triangulate the graph according to Algorithm 2.1. The time complexities of these algorithms are compared in Table 2.1. n = |V| is the number of vertices and m' = |£| + |£ | is the number of edges in the triangulated graph. Chapter 7 presents empirical comparison of different heuristic triangulation algorithms. Details of heuristic triangulation algorithms and their empirical evaluations can be found in [33], n t 2.2.3 Small World and Scale-Free Topology Small world topology [59] exists widely in graphical representations of many real world problems. In a graph with small world topology, vertices are highly clustered yet the path lengths between them are small. By comparison, random graphs with a similar number of vertices and edges have short path lengths but little clustering, while regular graphs like lattices tend to have high clustering but large path lengths. Watts and Strogatz [59] shows that small world topologies exist widely in social graphs (e.g., collaboration graph of actors in feature files), biological graphs (e.g., the neural network of the nematode worm C.elegans), and manmade graphs (e.g., the electrical power grid of the western United States). A 11 Chapter 2. Background Alg. 2.1 A.l A.2 A.3 A.4 A.5 A.6 A.7 Table 2.1: Heuristic Triangulation Algorithms Heuristic Short Random Ordering [47] Minimal Width [47] Minimal Induced Width [37] Minimal Fill-in [47] Minimal Induced Fill-in [37] Lexicographic B F S , var. Perfect [51] Lexicographic B F S , var. Minimal [51] Maximum Cardinality [56] RAND MW MIW MF MIF LEXP LEXM MC Time 0(n + m') 0(n + m') 0(n + m') O(nm') 0(m' ) 0 ( n + m') O(nm') 0(n+m') 2 small world topology may have a significant impact on the behavior of dynamical systems. Walsh [58] shows that the cost of solving search problems with small world topologies has a heavy-tailed distribution. In this thesis, we are interested in how the small world topology affects the cost of inference in different C B I problems. Given an undirected graph Q = (V, £), the parameters to characterize the small world topology include: • L: The averaged path length over all pairs of vertices; • L d'- The averaged path length over all pairs of vertices in a random graph Q d = ( V ^ £ ) w i t h |V| = | V | and |£| = | £ „ d | ; ran r o n ran r o n d r o n d ra • C: The averaged clustering coefficient of all vertices, where the clustering coefficient of a vertex is defined as the fraction of the number of existing 'edges among the neighbors of the vertex, and maximum possible edges among them (a vertex with k neighbors will at most have k(k — l ) / 2 edges among its neighbors); • C and'- The averaged clustering coefficient of all vertices in a random graph r Grand = ( V r a n d, £ • (j,: ratio oiC/L r a n d ) w i t h |V| = | V normalized by r a n d | and |£| = |£ nd|;. r 0 C dlL dran ran Watts and Strogatz [59] propose a procedure to generate graphs with small world topologies. It can be described as Algorithm 2.2. Chapter 7 presents empirical results of small world topology parameters of different C B I problems and their impacts on inference algorithms. Another typical real world topology of the graphical representation of C B I problems is the scale-free network [3], where the degree distribution has a powerlaw form. The distribution is not related to the scale of the network. In other words, scale-free networks have a large number of vertices with a few connecting edges and a small number of vertices with many connecting edges. It is shown [3] that scale-free networks exists widely in the real world. The Internet, W W W , and metabolic networks are all examples of scale-free networks. Chapter 2. Background 12 Algorithm 2.2 Small World Modelling [59] Input: Number of vertices n, average degree k, rewritten probability p Output: A graph with Small World topology 1: Generate vertices V i , • • •, V ; 2: for i = 1 to n do 3: for j = 1 to k do 4: Uniformly generate a random number p 6 [0,1); n 0 5: if po < V then 6: k := i + -V • \j/2]{modn) 1 7: else 8: 9: 10: 11: 12: Uniformly generate a random number k £ {1, • • •, n) \ {i} end if Add an edge between Vi and Vk\ end for end for Algorithm 2.3 generates a graph with scale-free topology. This algorithm is based on the definition of scale-free topology in [3], Empirical results in Chapter 7 show that many scale-free networks are also in small world topology. In other words, scale-free networks usually exhibit high clustered vertices and small path lengths. In contrary, a graph with the small world topology does.not necessarily have a power-law form degree distribution. For example, the degree distribution of a small world graph generated by Algorithm 2.3 is uniform. 2.3 The Basis of Abstract Algebra As mentioned in Chapter 1, there are two essential operations in real world CBI problems: (1) combination, which corresponds to an aggregation of conAlgorithm 2.3 Scale-Free Modelling Input: Number of vertices n, number of edges m Output: A graph with Scale-Free topology i: Generate vertices V i , • • •, V ; 2: Let di be degree of the vertex Vi 3: for i — 2 to n do 4: Connect Vj to Vj according probability pj = dj/J2k dk, k = {I, - • • n 5: end for 6: for i = l to m — n + 1 do 7: Uniformly choose i € {1, • • •, n} 8: Connect Vj to Vj according probability pj = dj/J2k^k, W; 9: end for k = {1, •••,«•} \ Chapter 2. Background 13 straints, and (2) marginalization, which corresponds to focusing of a specified constraint to a narrow domain. These two operations provide possibilities of using algebraic structures to generalize existing inference algorithms from different fields. Algebraic structures are also useful tools for exploring the commonality across algorithms. More specifically, the proposed unified framework for C B I problems is based on the semiring, an important notion in abstract algebra. This section introduces basic algebraic structures and related variants. 2.3.1 Set, Group, and Semigroup Definition 2.6 (Set) A set is defined as an arbitrary collection of objects or elements. There are no predefined operations between elements in a set. The number of elements in the set is referred to as the cardinality of the set. Definition 2.7 (Group) Let G be a set and • be a binary operation defined on G. (G, •) is a group if the operation satisfies the following four conditions: • Closure: a • b = c G G, Va, b G G; • Associativity: (a • b) • c = a • (b • c), Va, o, c G G; • Identity: 31 G G s.t. a • 1 = 1 • a = a, Va G G; • Inverses: Va 6 G, 3 a - 1 € G, s.t. a • a - 1 =1 A group (G, •) is said to be commutative, or Abelian, if the operation • satisfies one more condition: • Commutativity: a • b = b • a, Va, b € G ; A semigroup is a group that does not need an identity element and whose elements do not need inverses within the semigroup. Only closure and associativity hold for a semigroup. A semigroup with an identity is called a monoid. A semigroup is commutative if the associated binary operation is commutative. Figure 2.2 shows relations among axioms in group and related notions. 2.3.2 Ring, Semiring, and k-Semiring Definition 2.8 (Ring) Let R be a set. Let © and <g> be two closed binary operations defined on R. Here we assume operation ® has precedence over operation ©. (R, ©, <g>) is a ring if the operations satisfy the following axioms: • Additive associativity: Va, b,c € R, (a © b) © c = a © (b © c); • Additive commutativity: Va, b G R, a © b = b © a; • Additive Identity: 30 G R, s.t. Va G R, a © 0 = 0 © a = a; • Additive Inverses: Va G R, 3—a G R, s.t. a © — a = —a © a = 0; • Multiplicative associativity: Va, b,c G R, (a ® b) ® c = a ® (b <S> c); 14 Chapter 2. Background Commutative G r o u p Commutative Monoid Commutative Semigroup <S#AssoGiatiyity>«;g) Semigroup I Monoid >-'- % i : : A ! t "-^' • \ J Group Figure 2.2: Axioms in the Definitions of Group and Related Concepts • Left and right distributivity: Va, 6, c 6 R, a ® (6 © c) — a ® 6 © a (8> c and (&ffic)<g>a = 6 ® a © c ® a . Therefore, a ring is a commutative group under addition and a semigroup under multiplication. 0 is called the additive identity element (of operation ©), and 1 is called the multiplicative identity element (of operation ®). D e f i n i t i o n 2.9 (Semiring) Let S be a set. Let © and <g> be two closed binary operations defined on S. (5,©,®) is a s e m i r i n g if the operations satisfy the following axioms: • Additive associativity: Va, b,c € S, (a © 6) © c = a © (b © c); • Additive commutativity: Va, b £ S, a © b = b © a ; • Multiplicative, associativity: Va, 6, c € 5, ( a ® 6 ) ® c = fl®(!)®c); • Le/t and ria/i£ distributivity: Va, i , c e S , a ® ( i i ® c ) = o 0 i ) $ a ® c and ( b f f i c ) ® a = &<g>a©c®a. A semiring (5,ffi,®) is a ring that need not have either additive (©) inverses or an additive identity element. In other words, a semiring is a commutative semigroup under addition and a semigroup under multiplication. Chapter 2. Background 15 Commutative Semiring { Semiring d2 ... ! s L e f t Ring a n d Right'Distributivrty, ^Inverses.. Addition - Multiplication v Figure 2.3: Axioms in the Definitions of Ring, Semiring, and Commutative Semiring Definition 2.10 (Commutative Semiring) A commutative semiring is a semiring that satisfies the following additional conditions: • Multiplicative commutativity: Va, b G S, a ® b = b ® a; • Multiplicative identity: there exists a multiplicative identity element 1 G S, such that a ® l = l ® a = a for any a G S; • Additive identity: there exists an additive identity element 0 G S, such that a © 0 = 0 © a = a for any a G S; A commutative semiring (5,©,®) is a commutative monoid under both addition and multiplication. Figure 2.3 shows the relations among axioms of ring, semiring, and communicative semiring. Here we definefc-semiringas a generalization of semiring with a ./-semiring corresponding to the definition of a semiring. Chapter 2. No. 1 2 3 4 5 6 7 8 Background 16 Table 2.2: Summary of Different Commutative Semirings Add. Identity Multi. Identity R © ® {true, false} V false A true max min 1 0 [0,1] X 1 max 0 [0,1] max (-co, 0] —oo 0 + max 0 0 [0,oo) + X 1 0 [0,oo) + X 1 [0, oo) max 0 (—oo, oo) min oo 0 + D e f i n i t i o n 2.11 ( ( C o m m u t a t i v e ) fc-Semiring) A (commutative) k-semiring is a tuple (S, opo,opi, • • •,opk) such that for each i 6 {1, • • •, k}, (S, opj,opi) is a (commutative) semiring for every j < i. The definition of (commutative) fc-semiring is sometimes too strong in practice. For example, we may need only (S, opi-\,opi) to satisfy the (commutative) semiring properties. We do not necessarily need (S,opj,opi) to be a (commutative) semiring for V? < i always, although the strong definition provides computational flexibility in algorithm designs. ([0, oo),+, x) and ({0,1}, V,A) are all examples of commutative semirings. Table 2.2 shows further examples of commutative semirings. ([0, oo), max, +, x) is an example of a commutative ^-semiring. Furthermore, we say that 0 is a multiplicative absorbing element if a <8> 0 = 0 ® a = 0, and 1 is an additive absorbing element if a © 1 = 1 © a = 1, for any a £ S. We say that © is idempotent if a © a = a, and <8> is idempotent if a (g> a = a for any a € S. We notice that the idempotency of © defines a partial ordering < over the set S, i.e., a < b a © b = b for any a,b € S. Semiring and fc-semiring are enough to generalize the representation of a wide range of C B I problems. However, to characterize inference algorithms for these problems, we will use commutative semiring and commutative ^-semiring in the following chapters. The commutativity and identity elements provide many computational benefits for solving these problems. 2.4 Summary This chapter introduced the basic definitions and notions used in this thesis. Constraint-based inference problems can be graphically represented as primal graphs. Primal graphs can be reorganized as secondary structures, to achieve computational efficiency. The success of inference algorithms for specific problems is largely decided by the parameters of the underlying graphs. One of these graphical parameters is treewidth, or induced width. Computing treewidth, or finding the optimal triangulated graph, is known as AfV-hard [2]. Though some polynomial time Chapter 2. Background 17 approximation algorithms have been designed to compute treewidth with constant factors, they are not practical since the time complexities always contain some constant that is exponentially large. For practical scenarios, heuristic searches are used to estimate the upper bound of treewidth. In this chapter we briefly discussed several heuristic triangulation algorithms. Two other important notions to characterize the graphical representation of C B I problems are the small world topology and scale-free networks. This chapter also defined and explained the parameters used to characterize the small world topology and scale-free networks. As powerful tools to generalize C B I problems, this chapter introduced algebraic structures, and more specifically, semirings and commutative semirings. Basic definitions and axioms of abstract algebra were given. We also discussed the relations among these definitions. 18 Chapter 3. A Semiring-Based Generalized Framework for CBI Chapter 3 A Semiring-Based G e n e r a l i z e d F r a m e w o r k for C o n s t r a i n t - B a s e d Inference 3.1 A Semiring-Based Generalized Framework A constraint-based inference problem is defined in terms of a set of variables with values in finite domains and a set of constraints on these variables. We use commutative semirings to unify the representation of constraint-based inference problems from various disciplines. Formally: Definition 3.1 (Constraint-Based Inference (CBI) Problem) A constraint-based inference (CBI) problem P is a tuple (X, D, R, F) where: • X = {X\, • • • ,X } n is a set of variables; • D = {£>!, • • •, D } is a collection of finite domains, one for each variable; n • R = (S,ffi,® ) is a commutative semiring; • F = {fi, • • •, fr) is a set of constraints. Each constraint is a function that maps value assignments of a subset of variables, its scope, to values in S. Before defining various tasks for C B I problems, we define the two basic constraint operations as follows, using the two binary operation ® and © of the commutative semiring R. Please note that we use the same two symbols <g> and © to represent the two constraint level operations. The meaning of them can be easily distinguished given the context. Definition 3.2 (The Combination of Two Constraints) The combination of two constraints f\ and fa is a new constraint g = f\® fa, where Scope(g) — Scope(fa) U Scope(fa) and g(w) = / i ( w (/ )) ® /2(w e ( / ) ) for every value assignment w of variables in the scope of the constraint g. 1 S c o p e 1 i 5 c o p 2 Definition 3.3 (The Marginalization of a Constraint) The marginalization of X{ from a constraint f, where Xi € Scope(f), is a new constraint g = @ . /, where Scope(g) = Scope(f)\Xi andg{w) = ® / ( x f , w ) for every value assignment w of variables in the scope of the constraint g. x X i € D o m a i n { X i ) Chapter 3. A Semiring-Based Generalized Framework for CBI 19 Definition 3.4 (The Inference Task for a CBI problem) Given a variable subset of interest Z C X, let Y = X \ Z, then the inference task for a CBI problem P = (X, D, R, F) is defined as computing: 5 c B / ( ^ ) = 0(g)/ (3.1) feF Y Given a C B I problem P = (X,D,R,F), the assignment task for a C B I problem. if © is idempotent, we can define Definition 3.5 (The Assignment Task for a CBI problem) Given a variable subset of interest Z C X, letY = X\Z, the assignment task for a CBI problem P = (X, D, R, F) is to find a value assignment for the marginalized variables Y, which gives the result of the corresponding inference task gcBi{Z). Formally, a solution to the assignment task is: y = arg0(g)/ (3.2) feF Y where arg is a prefix of operation ©. In other words, arg ffi /(x) returns x$ s.t. fix®) = ®xf(x)x If a total ordering of S exists, we can define the optimization task for a C B I problem as maximizing (or minimizing) the computed result of the corresponding C B I inference task by finding a value assignment to variables Z. Formally Definition 3.6 (The Optimization Task for a CBI problem) Given a variable subset of interest Z C X, letY = X\Z and R = (S, max, ©, ®) be a commutative 2-semiring, then the optimization task for a CBI problem P = (X, D, R, F) is defined as computing: goPT = max I 0 (g) Vy f^F / I (3.3) ) - The assignment task for an optimization task is then to compute: 0(g)/ z = arg max \ Y feF (3.4) J In general, ® is a combination operation for C B I problems that combines a set of constraints into a constraint with a larger scope; © y = @x\z is a marginalization operation that projects a constraint over the scope X onto its subset Z, through enumerating all possible value assignments of Y = X \ Z. As discussed in Chapter 2, a C B I problem can be graphically represented by a primal graph, which is an undirected graph with variables as its vertices, Chapter 3. A Semiring-Based Generalized Framework for CBI 20 and there exists an edge between two vertices if the two corresponding variables appear in the scope of a same constraint. Also it can be re-organized as a junction tree or a factor graph to achieve computational benefits. Practically, sometimes there are observations of, or evidence about, the values of some variables. In such cases, constraints with observed variables in their scopes can be modified through instantiation before performing the inference. Although there are many observation-handling techniques to accelerate the inference process, we will not incorporate them into our framework here. Another issue concerning practical C B I tasks is the normalization after the computation. In this thesis, we omit normalization constants since they are simple to define and compute in specific application scenarios. 3.2 Instances of the Semiring-Based Generalized Framework Many C B I problems from different disciplines can be embedded into the proposed generalized C B I framework. In this section we will briefly introduce these problems as instances of the framework. 3.2.1 Constraint Satisfaction Problems A constraint satisfaction problem (CSP) is denned over a set of variables X = {Xi, • • •, X }, a collection of domains D = {D\, • • •, D } for variables, and a set of constraints or evaluation functions F = {/i, • • •, f }, where each is denned on a subset of variables Si = Scope(fi) C X. The inference task for a constraint satisfaction problem is to find if there exists a value assignment that satisfies all the constraints. The assignment task for a constraint satisfaction problem- is to find such a value assignment or all assignments according to some criteria. n n r Classic CSPs Classic C S P [40] is a class of constraint satisfaction problems such that all constraints have to be satisfied by a valid value assignment. Constraint fi of classic CSP maps possible value assignments of variables in Scope(ji) into logical true if the assignment satisfies the constraint, or into logical false if not. In terms of our semiring-based framework, a classic C S P is a tuple (X,D,R,F), where X, D, and F are defined before, and R = ({false, true}, V, A) is a commutative semiring, where V is the binary O R operation with false as the identity element and A is the binary A N D operation with true as the identity element. Given Z = 0 , the inference task (decision problem) of a classic C S P is to compute: 9CSP = V A f Chapter 3. A Semiring-Based Generalized Framework for CBI 21 9CSP = true means there exists a value assignment that satisfies all constraints. In such a case, the assignment task for classic C S P is to compute: x = arg \f X f\ f f€F Soft C S P s Constraints in classic CSPs are sometimes called hard constraints, which means that all constraints have to be satisfied simultaneously. However, in practice many problems are over-constrained. Several frameworks have been proposed to handle over-constrained problems, mostly by introducing soft constraints that may be (partially) violated. These frameworks include: (1) the Max CSP framework that tries to maximize the number of satisfied constraints, which is equivalent to minimizing the number of violations; (2) the Weighted C S P framework that associates a degree of violation to each constraint and minimizing the sum of all weighted violations; and (3) the Fuzzy C S P framework [52] that associates a preference to each value assignment of a constraint. We generalize these soft CSPs into our semiring-based unified framework using various semirings, following the results of the Semiring C S P [6] and Valued C S P [53] frameworks. M a x C S P s and Weighted C S P s Sometimes we do not need all the constraints to be satisfied at the same time. In Max CSPs, the assignment task is to find a value assignment that satisfies the maximum number of constraints. A Max C S P is same as a Classic C S P except that it is denned on a different commutative semiring R = ([0, oo), max, +). The inference task for a Max C S P is to compute: 9MaxCSP = max f And the assignment task for a Max C S P is to compute: x = arg max f A Weighted C S P is slightly different from a Max C S P in that satisfying different constraints has different weights, whereas weights are always the same for all constraints in a Max CSP. The same, semiring used in the Max C S P applies in the representation of the Weighted CSP in our semiring-based unified framework. Fuzzy C S P s Fuzzy C S P [52] extends Classic C S P by mapping all possible value assignments of each constraint into preference levels. The levels of preference are defined on some discrete numbers from 0 to 1, where 0 stands for not allowing such an assignment. A solution of Fuzzy CSP is a value assignment that has the maximum value of the minimum preferences of all constraints (max-min Chapter 3. A Semiring-Based Generalized Framework for CBI 22 value). In terms of the semiring-based unified framework, a fuzzy CSP is a tuple (X,D, R,F), where X, D and F are defined beforehand R = ([0,1],max,min) is a commutative semiring with 0 as the identity element of max, and 1 as the identity element of min. Given Z = 0 , the inference task for a Fuzzy CSP is to compute: = max X 9FuzzyCSP min feF / And the assignment task for a Fuzzy CSP is to compute: x = are max min 3.2.2 f feF X 6 J P r o p o s i t i o n a l Satisfiability Problems The propositional satisfiability problem (SAT) is a central problem in logic and artificial intelligence, which consists of a logical propositional formula in n variables. For fc-SAT, the formula consists of a conjunction of clauses, and each clause is a disjunction of at most k variables, any of which may be negated. For k > 3, these problems are NP-complete. SAT Like the constraint satisfaction problem, SAT can be abstracted in the semiringbased unified framework as a tuple (X, D, R,F): • X = {Xi, • • •, X } is a set of n variables; n • D = {true, false} is the domain for each variable; • R = ({false, true}, V, A) is a commutative semiring, V is the binary O R operation with false as its identity element ; A is the binary A N D operation with true as its identity element. • F = {fi, • • •, f } is a set of r clauses, r : D —» {false, true}. k Given Z = 0 , the inference task for SAT is to compute: 9SAT = V / \ / And the assignment task for SAT is to compute: x = arg \J x f\ f feF Chapter 3. A Semiring-Based Generalized Framework for CBI 23 MaxSAT Similarly, MaxSAT is defined as finding a value assignment for each variable that allows the maximum number of clauses to be true. A MaxSAT is the same with SAT except the definitions of R and F: • R = [0, oo), max, +) is a commutative semiring, 0 is the identity elements of both max and +• • F = {/i, • • •, f } \Scope{fi)\. r is a set of r clauses, /* : {true, false} k —> {0,1}, k = Given Z = 0 , the inference task for MaxSAT is to compute: 9MaxSAT = max ]P / feF And the assignment task for MaxSAT is to compute: x = arg max ^ / feF. 3.2.3 P r o b a b i l i t y Inference Problems Probability inference problems can be seen as constraint-based inference by treating conditional probability distributions (CPDs) as soft constraints over variables. A belief network or a Bayesian network (BN) [49] is a graphical representation for probability inference under conditions of uncertainty. B N is defined as a directed acyclic graph (DAG) where vertices X = {Xi, • • • ,X } denote n random variables and directed edges denote causal influences between variables. D = {Di,--,D } is a collection of finite domains for variables. A set of conditional probability distributions F = {/i, • • •, / „ } , where fi = P(Xi\Parents(Xi)) is attached to each variable (vertex) Xi. Then the probability distribution over X is given by P(X) = JliLi P{Xi\Parents(Xi)). The moralized graph of a B N is obtained by adding edges among the vertices with a common child vertex and removing directions of all edges. The moralized graph of a B N is the primal graph of the corresponding probability inference problem. Generally, there are three types of tasks in belief inference: probability assessment, most probable explanation, and maximum a posteriori hypothesis (maximum likelihood). A l l of these tasks can be generalized into our semiringbased framework. n n P r o b a b i l i t y Assessment Probability assessment is a C B I problem over a B N , which computes the posterior marginal probability of a subset of variables, given values for some variables as known evidence. Formally, a probability assessment problem is a tuple (X, D, R, F), where X, D, and F are defined in B N , and R = ([0, oo),+, x) is Chapter 3. A Semiring-Based Generalized Framework for CBI 24 a commutative semiring, where (©,0) = (+,0) and (®, 1) = (x, 1). Given Z as the variable subset of interests, the inference task for a probability assessment problem is to compute: 9BA(Z) = a ^ Y [ f x\zfeF where a is a normalization constant. Obviously ([0, oo), x) has the invertible property, making normalization feasible. Most Probable Explanation ( M P E ) Most probable explanation is a C B I problem over a B N , which is to find a complete value assignment of variables that agrees with the available evidence and has highest probability among all such assignments. Formally, the most probable explanation problem is a tuple (X, D, R, F), where X, D, and F are defined in the corresponding belief network, and R = ([0, oo), max, x) is a commutative semiring, where (©,0) = (max,0) and (®, 1) = ( x , l ) . Given Z = 0 , the inference task for a M P E is to compute: 9MPE = max JJ f feF The assignment task for a M P E is to compute: x = arg max JJ / feF X M a x i m u m A Posteriori Hypothesis ( M A P ) A maximum a posteriori hypothesis problem is an optimization task over a B N , which finds a value assignment of a set of hypothesized variables to agree with the available evidence and has highest probability among all such assignments. Formally, a maximum a posteriori hypothesis problem is a tuple {X, D, R, F), where X, D, and F are defined in B N , and R = ([0,oo),max,+, x) is a commutative 2-semiring, where (max, 0) is defined over non-negative real values, (©,0) = (+,0), and = (x,l). Given Z = {Z\, - • •, Z } C X as a subset of hypothesized variables, the inference task for a M A P is to compute: t 9MAP = max( ] T x\z JJ /) feF The assignment task for a M A P is to compute: z = argmax(]T JJ /) x\zfeF Chapter 3. A Semiring-Based Generalized Framework for CBI 3.2.4 25 D y n a m i c Bayesian Networks Many discrete-time stochastic processes can be graphically represented as dynamic Bayesian networks (DBN) [13]. D B N is a powerful tool to model dynamic systems with N random variables X = {X\,; • • ,X }. Let X be the state vector of X at time t. As an extension of Bayesian networks (BN), inference tasks can be performed over a D B N . Formally, a dynamic Bayesian network is defined [45] to be a pair (B\, B_.), where B\ is a B N which defines the prior P{X ) , and B _ is a two-slice temporal Bayesian net (2TBN) which defines PiX^X ' ) as 1 n l 1 1 n PiX^X - ) = rjP(X?|Parents(X/)) 1 1 t=i Like representing B N , a D B N can be abstracted in the semiring-based framework as a tuple (X, D, R, F): = {X\, • • •, X } is a set of random variables at time t = 1, • • •, T; • D = {D\, • • •, D } is a collection of finite domains for each item in X\ • X 1 n n • R = ([0, oo), +, x) is a commutative semiring, (©, 0) = (+, 0) and (®, 1) = (x,l). • = {/{, • • • , / £ } where / / = P{X\\Parents(X\)) Given Z = {Zj, • • •, ZT} as a subset of random variables of interests at time t — T, the inference task in a D B N is to compute the marginal of joint distribution: T 9DBN = E X\-,X -\X \Z T 3.2.5 T T n n /* t=l / 6F' l Decision-Making Problems The Decision Network (DN), or the influence diagram [27], is a graphical representation for decision-making problems. A decision network is a directed acyclic graph with three types of nodes: random nodes, decision nodes, and value nodes. Each random node represents a random variable, associating with a conditional probability table; each decision node represents a decision variable with a finite set of possible actions (or decisions); each value node associates with a utility function. The edge from a random node to a decision node is called an information edge, and so a decision node Ai can be viewed as a random node. Given a value assignment of its parents nodes, Ai has a conditional probability P(Ai = a\Parents(Ai)) — 1 for an decisions G domain(Ai), and 0 for other decisions. The goal of the one-off decision-making problem in decision networks is to find such policies for each decision node that maximize the summation of expected utilities. We do not integrate sequential decision-making problems [62] in decision networks into our framework and leave it in our future work. 26 Chapter 3. A Semiring-Based Generalized Framework for CBI In terms of our semiring-based framework, a one-off decision-making problem is a tuple (X,D,R,F): • X = CU A, where C — {Xi, • • • X^} is a set of random (chance) variables and A = {Ak+\, • • •, A } = {X^+i, • • •, X } is a set of decision (action) variables; s n • D = { D i , • • •, D } n n is a collection of finite domains for each item in X; • R = ((—oo, oo), max,+, x) is a commutative 2-semiring, where —oo is the identity element of max, 0 is the identity element of +, and 1 is the identity element of x ; • F = {u} where u = Yli=i i the summation of t utility functions, each Ui associating with a value node, u : X —» 7c. u 1S n Given Z = A = {Ak+i, • • •, A } as a subset of variables of interest, the inference task for a one-off decision-making problem is to compute: n n gN D (3.5) = mzx^X\P{Xi\Parents{Xi))-u{A) C i=l The assignment task for a one-off decision making problem is to compute: n a = arg max Yl I I P{Xi\Parents{Xi)) • u{A) C i=l 3.2.6 Possibility Inference Problems Possibility theory [61] is another approach to characterizing uncertainties. A possibility inference problem is defined over a set of variables X = {Xi, • • •, X }, a collection of discrete domains D = {D\, - • • ,D } for variables, and a set of constraints or possibility distribution functions F = •••,/»•}• Each /j specifies the degree of possibility of the configuration of variables in Scope(fi). The degree of possibility, or possibility potential, is a value in [0,1], representing non-possibility to full certainty. In possibility theory, a binary operation T is a function T : [0,1] x [0,1] —» [0,1], which satisfies the following conditions: • Boundary: Va G [0,1], I T a = a and OTa = 0; n n • Monotonicity: V a i , 0 2 , 6 1 , 6 2 € [0,1] s.t. ai < a 2 and 61 < 62, a i T 6 i < a2Tf)2 holds; • Commutativity: Va,6,c £ [0,1], aTb = bTa and (aTb)Tc = a T ( 6 T c ) . Such a binary operation is also called a i-norm. There are many i-norms in possibility theory research. The most frequently used i-norms are: Chapter 3. A Semiring-Based Generalized Framework for CBI 27 • Product i-norm: aTb — a x b; • Godel's i-norm: aTb = min(a,b); • Lukasziewcz's i-norm: aTb = max(0, a + b — 1). The inference task for possibility inference is to find the maximum possibility of some variables and the corresponding configuration of these variables. In terms of the semiring-based unified framework, a possibility inference problem is a tuple (X, D, R, F) if the corresponding i-norm satisfies the commutative semiring properties: • X = {Xi, • • •,X ] is a set of random variables; • D = {£>!, • • •, D } is a collection of finite domains for each item in X; n n • R = ([0,1], max, T) is a commutative semiring, (©,0) = (raox, 0) and (®, 1) = (T, 1) where T is a i-norm and 1 is the identity element for the i-norm; • F = {fi, • • •, f ) r is a set of possibility distribution functions. Given Z C X a s a subset of variables of interests and let Y = X \ Z, the inference task for a possibility inference problem {X, D, R, F) is to compute the marginal of joint possibility distribution: gposs = m a x T / s F / The assignment task for a possibility inference problem is to compute: y = argmaxT/ F/ e 3.2.7 Coordination Graph In decision-making problems, we sometimes assume that utility functions are additive, i.e., u(A) = Yl\=i i(Aj), then Eq. 3.5 can be re-written as: u t n g DN = max^2Y[P(Xi\Parents(Xi) c i=l = max^^Y\P{Xi\Parents{Xi)) c j=l i=l t t • (^2UJ(AJ)) j=l n • UJ(AJ) n = max^^YlPiXilParentsiXifi-UjiAj) A j = l c t=l t = m a x ^ Q ^ ) (3.6) Chapter 3. A Semiring-Based Generalized Framework for CBI 28 where Qj(Aj) = 2~2c 117=1 P{Xi\Parents(Xi)) • Uj is a utility function that depends only on a subset actions Aj C A, but does not depend on random variables. After computing Qj(Aj), the task is transformed to find an equilibrium of a coordination game, which requires finding a decision or action from a finite set of possible actions for each player, to maximize the summation of players' utilities. Such a reduced problem is formalized by Guestrin et al. [25] and represented by a Coordination Graph. D e f i n i t i o n 3.7 A C o o r d i n a t i o n G r a p h for a set of players with local utilities { Q i , • • • ,Q } is i directed graph whose nodes are the actions for the players {Ai, • • •, An), and there exists an edge from Ai to Aj if and only if Ai £ Scope(Qj). n The task for finding an optimal joint action in a coordination graph can be easily embedded in our semiring-based framework as an assignment task for a CBI problem (X, D, R, F) where: • X = A = {Ai, • • •, A } is a set of decision (or action) variables; t • D = {Di, • • •, D } is a collection of finite domains, each for a variable; t • R = ([0, oo), max,+) is a commutative semiring, (©,0) = (max, 0) and (®,1) = (+,0); • F = {Qi, • • •, Q } is a set of utility functions, Qi : D where Scope(Qi) = {A^, • • •, A }. t tl x •••x D i k —> S, ik The inference task for computing optimal joint action in coordination graph is just to compute Eq. (3.6). The assignment task for a coordination graph is ' then to compute: t a = argmax£Q (A ) j 3.2.8 i M a x i m u m Likelihood Decoding In digital communication, decoding is an important step in recovering the original information transmitted through a discrete memoryless channel. How to decode information efficiently and exactly is always a central problem in coding theory. Maximum Likelihood Decoding (MLD) is one widely used approach. Semiring-based general massage-passing approach for M L D were first proposed in [1]. Suppose a codeword Z = { Z i , • • •, Z } is transmitted over a discrete memoryless channel. A vector Y = {Yi, • • • ,Y } is received. The likelihood of the codeword Z is then: n n P{Z\Y) = n \{P{Zi\Yi) Chapter 3. A Semiring-Based Generalized Framework for CBI 29 where P(Zi\Yi) is the transition probability of the channel. For computational convenience, sometimes we compute the log-likelihood: • iog(P(z|y)) = ^ - i o ( P ( z | y ) ) g i i It is equivalent to minimizing the log-likelihood decoding. We use hi to denote -logfPCZilYO). The M L D problem is to find the codeword Z that maximizes the likelihood. In addition, different linear block codes define a set of indicator functions x { x i i ' ' • , Xfc} to specify if a subset of Z is a valid part of the codeword. A n indicator function Xi I defined as: = s ,_ 7 \ _ / 0 ^ • - ^ - ( o o : : {Zn, Zik) is a part of the codeword; otherwise In terms of the semiring-based unified framework, a maximum likelihood decoding problem is a tuple (X, D, R, F), where: • X = Z U Y is a set of 2n variables; is the domains for each Z and Y*; • D = {Di, • • •, D ) n t • R = ([0, oo), min, +) is a commutative semiring, (©,0) = (min, oo) and ( ® , l ) = (+,0); • F = // U x is {Xi, • • - ,Xk}. a se = {/ii, • • •, h } t of constraints where n and x = The inference task for a M L D is to compute: 9MLD = nun ^ And the assignment task for a M L D is to compute: z = arg mm ^ hiGH 3.3 hi Y x i XjGx Summary This chapter presented a semiring-based unified framework to generalize the representation of C B I problems from different fields. The general multiplication operation ® denotes the combination of constraints over local variables. The general addition operation © denotes the marginalization of a large constraint into a smaller scope. Through specifying variables and local constraints, C B I problems are generalized as computing a general sum-of-products in semirings. Chapter 3. A Semiring-Based Generalized Framework for CBI 30 We discussed many concrete C B I problems in this chapter as instances of the proposed semiring-based unified framework. These problems cover a wide range of research fields like probabilistic inference, decision-making under uncertainty, constraint satisfaction problems, propositional satisfiability, decoding problems, and possibility inference. We will show in following chapters that the abstract representation provides a powerful tool for studying and analyzing inference algorithms for these problems. The generalized representation, together with abstract level algorithm analyses, will eventually improve the concrete inference approach design for specified C B I problems. Chapter 4. Exact Inference Algorithms 31 Chapter 4 E x a c t Inference Algorithms The distributivity and the commutativity properties of a commutative semiring R = (S, ffi, ®) provide computational opportunities for solving constraint-based inference problems more efficiently. Comparing the computation costs (in terms of the number of operations) of the two tasks listed in Table 4.1, we will find that the two computation tasks are equivalent if distributivity holds. Both tasks need n — 1 © operations, however, Task 2 only needs one ® operation, compared to n ® operations in computing Task 1. Task 1 2 Table 4.1: Computation Costs of Two Equivalent Tasks Formula To Compute # of © Operations # of ® Operations ( a ® O x ) © , • • •, © ( a ® 6 ) n - 1 n a ® (6i®, • • •,©&„) n - 1 1 n The basic idea behind these two computation tasks is: different computation sequences lead to different costs, given the same computation task defined by a commutative semiring. The distributivity of the two operations of a semiring provides possibilities to change the computation sequences equivalently, which makes finding a lower computation cost, for the task possible. Many inference algorithms have been developed independently, explicitly or implicitly exploiting the distributivity of commutative semirings. The key idea of these algorithms is to rearrange the computation sequences of the task solutions. In Chapter 3 we generalize these algorithms into two categories: variable elimination algorithms and junction tree algorithms. 4.1 4.1.1 Variable Elimination Algorithm Generalized Variable Elimination A l g o r i t h m The basic idea behind variable elimination (VE) algorithms, or bucket elimination algorithms, comes from non-serial dynamic programming [4]. Since dynamic programming algorithms work by eliminating variables one by one while computing the effect of each eliminated variable on the remaining problem, they can be viewed as variable elimination algorithms. Formally, given a C B I problem (X, D, R, F) and a variable subset of interest ZQX,\etY = X\ Z and a =< Y\, • • •, Yfc > be a permutation of elements in Y. The ordering a is decided by the variable elimination algorithm. Let Fy k 32 Chapter 4. Exact Inference Algorithms be the subset of constraints in F whose scopes contain VV Let Fy be F \ Fy . Then Eq. 3.1 can be re-written as: k 9CBI{Z) = k 0(g)/ X\Z f€F = ©0/ Y f€F = © © (<gj / <8> /) = © <8> / ( © <8 /) Y\{Y }f€F k = \ }feFy Vk {Yk © (8) / ® / ' y\{y }/eFv fc k J ( ) 41 fc where / ' = ©y. ®f p f is a new constraint that does not contain variable Ffc in its scope. In Eq. 4.1, variable is eliminated. In another words, now the inference task for the C B I problem does not depend on YkThe generalized variable elimination algorithm follows the same procedure, eliminating variables recursively one by one, according to some ordering of these variables. See Fig. 4.1 for the description of our generalized variable elimination algorithm for the C B I inference task, abstracted from the variable elimination algorithm [64] in probability inference. Another revision of the generalized-VE algorithm is the generalized bucket elimination algorithm, which is abstracted from the bucket elimination algorithm [14] in the probabilistic inference field. See Algorithm 4.2 for details. The difference of these two algorithms is in the implementation, where the bucket elimination algorithm requires sorting constraints before the elimination of variables. A common concern of applying V E algorithms is how to find an optimal elimination ordering to minimize the computation cost. Finding an optimal elimination ordering, which is equivalent to finding an optimal triangulated graph or finding the treewidth, the minimum width of all tree decompositions, is MV-hard [2]. Several heuristic search methods discussed in Chapter 2 can be used to generate elimination orderings. We will show their empirical effectiveness in Chapter 7. • For the assignment task for a C B I problem, backtracking approaches can be applied. A l l intermediate constraints during the elimination procedure are cached. The final computation value of gcBi(Z) is used as an index to track valid value assignments in the intermediate constraints. To apply V E algorithms, commutativity of (g> is required; for the C B I inference task that enables us to rearrange the combination sequence of local functions. The commutativity of © is a desired property as well; that enables fe e Y Chapter 4. Exact Inference Algorithms 33 A l g o r i t h m 4.1 Generalized Variable Elimination Algorithm (GVE) for a C B I Inference Task Input: A C B I problem (X, D, R, F) and a variable subset of interest Z O u t p u t : g Bi{Z) = 0 _ C X ® / Z e F / Let Y = X \ Z 2: Choose an elimination ordering cr =< Y\, • • •, Yt > of Y 3: for i = k to 1 do 4: F' := 0 1: 5: 6: 7: 8: 9: 10: for each / € F do if Yi £ Scope{f) then F' := F ' U {/} F:=F\{/} e n d if e n d for 11: / ' : = ® <8W< / 12: F:=FU{/'} 13: e n d for 14: Return gcBi(Z) := 0 Yi / 6 F / A l g o r i t h m 4.2 Generalized Bucket Elimination Algorithm (GBE) for a C B I Inference Task ; Input: A C B I problem (X, D, R, F ) and a variable subset of interest Z = ©X-Z0/SF/ O u t p u t : gcBi(Z) l: Choose an elimination ordering Y =< Y i , • • •, Y& >, where Y = X \ Z\ 2: Initialize k + l empty buckets B = {B , B\, • • •, Bk], Bi = 0; 3: for i — 1 to r do 4: if Y n Scope(fi) = 0 then Q 5: 6: 7: B :=B \j{fi}; Q Q else Find the variable Y £ Y 0 Scope(fi) with maximum index i; t 8: B := B U {/<}; 9: e n d if 10: e n d for 11: for i = k to 1 do t t 12: /':=0 (8)/,..B /i 13: if Y n Scope(f') = 0 then 14: 15: B := J5 U • { / ' } ; else 16: 4 y i 0 0 Find a variable Y £ Y D Scope(f') with highest order; t 17: S :=B U{/'}; 18: e n d if 19: e n d for T T 20: Return g Bi{Z) C := (8>/ B / s 0 34 Chapter 4. Exact Inference Algorithms us to exploit different elimination orderings. Identity elements for the two operations are not required, so we can relax the requirement of commutative semiring here. Then we can solve the problem using the concrete V E algorithm through instantiating our generalized algorithms. 4.1.2 Generalized V E A l g o r i t h m for k-Semiring A n extension of the generalized-VE algorithm works with C B I problems defined on a commutative fc-semiring (S, opo, op\, • • •, opk)- In such cases, we repeatedly use the generalized-VE for the last two operations, then replace F by the new marginalized constraints . Details of the generalized V E algorithm for C B I problems defined on a commutative fc-semiring are shown in Algorithm 4.3. 1 A l g o r i t h m 4.3 Generalized V E Algorithm for k-Semiring (kGVE) for a C B I Inference Task I n p u t : A A-semiring-based C B I problem (X, D, R, F) where R = is (S, opi,op2, • • • ,opk+i) a k-semiring, and Z = {Z\,--,Z } k , Zi C • • • C Z is a set of variable subsets of interest k O u t p u t : g CBi(Z) k = opi _ op -z x Zl 2X •' 2 1: for m = k to 1 do 2: F := GVE(X, D, Z , (5, o , o ), 3: e n d for m 4: 9kAR{Z) 4.1.3 ••= Pm Pk+1 °Pkx-z °Pk+ij=ifj k F) Op +l f k f£F Instances of Variable E l i m i n a t i o n A l g o r i t h m Many concrete algorithms from different fields can be seen as instances of variable elimination. For classic CSPs, Seidel [54] proposed a variable elimination algorithm in the early 80s. Later, the Adaptive Consistency (AC) algorithm [16] was proposed. It works by eliminating variables one by one, while deducing the effect of the eliminated variable on the rest of the problem. Adaptive Consistency algorithm can be easily understood as a V E algorithm through instantiating the semiring {S, ©, ®) by ({false, true}, V, A) in our generalized V E algorithm. For propositional SAT problems, directional resolution is the core of the well-known Davis-Putnam (DP) algorithm [18]. The basic idea of directional resolution is exactly variable elimination. We can get the concrete D P algorithm through instantiating the generalized V E algorithm through using semiring ({false, true}, V, A) as well. For probabilistic inference, the variable elimination algorithm [64] and the bucket elimination algorithm [14] are the first two V E algorithms that are widely studied in tackling inference tasks for probability assessment, M P E , and interestingly, another variable elimination process eliminates the second last operation repeatedly. Chapter 4. Exact Inference Algorithms 35 M A P problems. The concrete algorithms can be obtained through instantiating the generalized V E algorithm through using semiring ([0, oo),+, x) or ([0,oo),max, x ) . For decision-making problems, Zhang [63] reduced the influence diagrams, the graphical representations of the decision-making problems to Bayesian networks; the variable elimination algorithm for probabilistic inference in B N is then applied to solve decision-making problems. For maximum likelihood decoding of coding theory, Viterbi decoding [57] is an instance of variable elimination, which is defined on the semiring ([0, oo), max, x In [25], -a variable elimination algorithm was proposed to find an optimal joint action in the coordination graph. The algorithm was applied in RoboSoccer successfully [36], incorporating other techniques like discretizing the continuous domain and introducing role assignments. We can design other concrete V E algorithms for problems that can be abstractly represented by the.semiring-based unified framework, through instantiating different semirings. In general, variable elimination is a variant of dynamic programming, which is the key idea of these algorithms. 4.1.4 Space a n d T i m e C o m p l e x i t y Given a C B I problem (X, D, R, F), the space and time complexities of the generalized variable elimination algorithm are discussed in terms of the following: • G = (V, £): primal graph representation of C B I problem; • d = max(\Di\), Di £ D: maximum domain size of variables; • w. width of the tree decomposition of Q along the elimination ordering; • r = | F | : number of constraints; • re number of constraints with Vj in their scopes; T h e o r e m 4.1 (Space Complexity of VE) The space complexity of the generalized variable elimination algorithm is 0(|V| + d ). w+1 P r o o f : To eliminate a variable Vi, all constraints with Vi in their scopes are combined. Because \Neighbors(Vi)\ < w, the size of the combined constraint is at most d . We eliminate variables sequentially and reuse the space. So the total size required of the generalized V E is 0(|V| + d ). • w+1 w+1 T h e o r e m 4.2 (Time Complexity of VE) The time complexity of the generalized variable elimination algorithm is 0(r • d ). w+l P r o o f : To eliminate a variable Yj S Y = X \ Z, we need at most t;((8>) = (rj - 1) • d ® operations, and at most tj(ffi) = d © operations. So in total we need at most T((g>) = _ _ i = i ^ ( ® ) = (r - |V|) • d , and T(©) = __i=i £,((§>) = | V| • d . If in the semiring R, © and ® have the same processing times, the total time of the generalized V E is 0(r • d ). • w+1 w+1 w+l w+l w+1 Chapter 4. Exact Inference Algorithms 4.1.5 36 Discussion of Applying the V E Algorithm The variable elimination algorithm is essentially a dynamic programming approach, which exploits the semiring structure of a constraint-based inference problem. The inference task is finished through repeatedly combining local constraints with a common variable in their scopes, caching the result of the marginalization as a new constraint, and posting the new constraint to refine the solution of the original problem. In theory, all C B I problems with semiringbased structures can apply the scheme of the generalized variable elimination algorithm. The only difference between one problem and another is in the different implementations of combination and marginalization. In the application of V E algorithms, the identity elements of combination and marginalization operations are not necessary. The commutativity of combination is not a necessary property but desired. If commutativity does not hold, the only possible elimination ordering is decided by the representation of the problem. Otherwise, we could use heuristic or approximate techniques to find a near-optimal elimination ordering to reduce computation costs. According to the complexity analysis, both the space and time complexities of the generalized V E algorithm are linear in the size of the task, but exponential in the width of a tree decomposition of the corresponding primal graph representation. For a large scale C B I problem with a complex graphical representation, treewidth (the lower bound of widths of all tree decompositions of the problem) is often intractable, which makes the direct application of the V E algorithm infeasible. There are basically two ways to overcome it: the first one is to seek approximation inference solutions in terms of some criteria; the second is to incorporate value properties of the variables and exploit the structure properties of the problem. The approximation inference will be discussed in Chapter 5. The improvement of V E algorithms from the value point of view will be discussed briefly in Section 8.2. 4.2 4.2.1 Junction Tree Algorithm Generalized J T Algorithm The major motivation that prompts researchers to use junction tree algorithms to solve C B I problems is handling multiple queries. The structure of a given CBI problem always remains unchanged, whereas the variables of interest may be frequently changed according to the requests of users. Junction tree algorithms can share intermediate computational results among different queries, which is an advantage relative to variable elimination algorithms. In fact, junction tree algorithms can be seen as memorized dynamic programming [9], where solutions for subproblems are memorized for later use. A junction tree is a structure that efficiently divides the original problem into subproblems. Given a C B I problem P = (X,D,R,F), the primal graph Q = (V,£) of P can be re-organized as a secondary structure, junction tree T = (C,S), by triangulating the primal graph and identifying the maximal cliques. A junction 37 Chapter 4. Exact Inference Algorithms tree is a tree decomposition of the original graph Q. A junction tree is optimal if the maximum clique size (minus 1) is equal to the treewidth of Q. Finding an optimal junction tree of the given graph, equivalent to finding the treewidth, is ATP-hard in general. Various heuristics search algorithms or approximation algorithms can be applied to find junction tree decompositions with near-optimal width. Let C = {Ci, • • •, Cp) be a set of clusters, where Cj C V, and <S = { S i , • • •, S } is a set of separators. Here we also use Ci and Sj to denote a subset of variables X or vertices V. S, = C; n C if Q and C are connected by separators Sj. Then we say T = (C, S) is a junction tree for a C B I problem P = (X, D, R, F) if the following three properties hold: q r r • Singly connected property: T is a tree; • Constraint allocation property: For any constraint /j £ F, there are some clusters Cj such that Scope(fi) C Cj\ • Running intersection property: If a vertex (or variable) appears in two clusters Cj and Cj, then it also appears in all clusters on the unique path between Ci and Cj. In general, junction tree algorithms assign constraints to clusters and combine constraints in the same cluster. The combined constraint is marginalized and passed as a message between clusters. Following a specified message-passing scheme, the junction tree reaches consistency and any variable of interest can be queried through marginalizing out other variables in the cluster that contains that variable. Formally, the generalized junction tree algorithm for a C B I problem is shown in Algorithm 4.4, which is abstracted from the message-passing scheme in the Shenoy-Shafer architecture [55] of probabilistic inference. One interesting case of applying the J T algorithm is that the variable subset of interest Z in query is not contained in a single cluster. One naive way to overcome it is adding a clique of Z before constructing the junction tree. However, such an approach hurts the flexibility of the J T algorithm to handle multiple queries. A more elegant solution is to find a subtree Tz = (Cz,Sz) of T = (C,S), where Z C C = The answer to the query is computed z ^as the marginal of the combination of all the local constraints together with all the incoming messages of the subtree Tz. Formally: Ucec gcBi(Z) := 0 0 C \ Z d € C z (4> -® Ci 0 m(Cj,Ci)) (4.2) Cj&Neighbors(Ci),CiiCz The basic idea of Eq. 4.2 is to treat the subtree Tz as a virtual cluster and compute the marginal as treating normal concrete clusters. We can modify the generalized J T algorithm to solve the assignment task for a C B I problem. After computing gcBi(Z) in some cluster C j , backtracking is applied to variables contained in C,. After a value assignment for these variables is decided, the assignment is passed to all neighboring clusters of C j . The Chapter 4. Exact Inference Algorithms 38 A l g o r i t h m 4.4 Generalized J T Algorithm (GJT) for a C B I Inference Task Input: A junction tree T = (C, <S) of a C B I problem (X, D, R, F) and a variable subset of interest Z = © \ z <8>f€Ff O u t p u t : 9CBI{Z) x 1: Attach to each cluster C i a constraint (fid — 1 2: for each / £ F do 3: Find a cluster C i such that Scope(f) C C* 4: ^Ci 5: end for := <fid ® / 6: for each Edge which is from clusters d to C j do 7: if Ci has received messages from all its neighbors other than Cj then 8: Ni„j := Neighbor{d) \{Cj} 9: % m(Ci,Cj) is the message sent from C i to CJ; 10: m(Ci, Cj) := 0 \ s ( f o , ® m(Cj, Ci)) C i y e n d if 12: e n d for 13: for each d £ C do 14: if Z C C i then 11: 15: 0 16: Return 17: 18: C i : = 0 C i ® <8>C,eAfeifl/i6<,r(C ) ( gcBi(Z) := © C i m ^ C i) \ _ 0Ci e n d if e n d for procedure recursively instantiates values of variables according to the messages passing from the instantiated clusters to the un-instantiated ones. Following the result of [55], commutativity of both © and ® is required, which ensures the correctness and completeness of the J T algorithm. The message-passing scheme in the generalized junction tree algorithm can be seen as consisting of two passes. By choosing any cluster as the root of the tree, the first pass is from the root to leaves and the second pass is from leaves to the root. We can use several message-passing scheduling techniques, such as parallel or hybrid computing, to improve the efficiency of message-passing. The messagepassing scheme in Algorithm 4.4 is sometimes called Shenoy-Shafer architecture, following the authors' names of [55]. To apply junction tree algorithms, the commutativity of either © or ® is required, which ensures the correctness and completeness of the J T algorithm. The identity elements of the two operations are required as well. Some revised versions of the junction tree algorithm require ® has the combination invertible property, which makes the message-passing scheme more efficient, though in our generalized J T algorithm it is not mandatory. In practice, many problems are graphically represented as disconnected junction trees. We can add an arbitrary edge between two clusters in different subtrees and assign an empty separator set to the new edge. The empty separator set between subtrees separates the message passing, as well as satisfies Chapter 4. Exact Inference Algorithms 39 the junction tree properties. Now the problem can be solved by the proposed generalized junction tree algorithm. Another class of problems can not be efficiently re-organized as junction trees due to the intractable maximum cluster sizes. One revision is to use junction graphs. In such a case, the generalized J T algorithm does not work directly since cycles in the junction graph prohibit message-passing being terminated. However, some experimental results [1] show that we can get a high quality approximation through using the iteratively (loopy) message-passing scheme. 4.2.2 Generalized JT Algorithm for /^Semiring Given a C B I problem defined on the commutative fc-semiring (5, opo, opi, • • •, opt), we can solve the problem by attaching k constraints 4>c. to each cluster Ci, where 1 < j < k. Therefore, we have k messages to pass to neighboring clusters of Ci, each corresponds to marginalize </>^ by o p j _ j . The constraints are updated by combining all received messages through using opj. Each constraint for a cluster is initialized as the identity element of the corresponding operation. At the beginning of the-algorithm, cf>^ = f^opj+i(f>Q \ where / ^ denotes the +l combination of constraints attached to the constraint 4>cr of Cj. It is actually the basic idea behind the junction tree algorithm for solving decision-making problems in influence diagrams by Jensen et al. Details of the generalized J T algorithm for C B I problems defined on fc-semirings are given in Algorithm 4.5. 4.2.3 Generalized JT Algorithm for Idempotent Semirings The setup of the generalized junction tree algorithm in Algorithm 4.4 requires semirings of C B I problems be to commutative under both the combination and marginalization. The identity element of the combination is required though the identity element of marginalization is not necessary. According to these observations, any C B I problem with commutative semiring representation can be solved by instantiating the generalized J T algorithm. In some C B I problems, the corresponding semirings have idempotency properties under c o m b i n a t i o n operations, which provide possibilities for improving computational efficiencies of the generalized J T algorithm. A commutative semiring (R, ©, ®) is idempotent under combination if Va G R, a® a = a. We call such a semiring an idempotent semiring, without explicitly mentioning that the idempotent property is actually for the combination operation. Since combination corresponds to information (or constraint) gathering, the idempotency of combination implies that repeatedly combining the same information will not produce new information. Furthermore, considering the marginalization of a set of information m = © ^ m* in an idempotent semiring, we get m = m ® m = m® ©^ mi = ©^ (m ® rrii). This induction implies m = ©^m, = © j ( m ® m j ) , which means that combining the marginalization with original information will not produce new information after another 40 Chapter 4. Exact Inference Algorithms Algorithm 4.5 Generalized J T Algorithm for k-Semiring (kGJT) for a C B I Inference Task Input: A junction tree T = {C,S) of k-semiring-based C B I problem (X,D,R,F), where R = (S, op\, op , • • •, opk+i) is a k-semiring. Z = {Zi, • • •, Zk) , Z\ C • • • C Zk are variable subsets of interest and Z is contained in at least one cluster of T. Output: gkCBi(Z) = opi \ op \z -• • op \z °Pk+i Jj 1: Build k empty function buckets F j , each corresponds to opi \ 2 k r x Zi 2x 2 kx k j= + 2: for each / e F do 3: 4: if / is a factor of opi \ then Allocate / to F + t end if end for 7: for each Cj £ C do 5: 6: 8: 9: Attach Cjwith a constraint 4>d Initialize 4>d to identity element of opk+i end for for m = k to 1 do 12: for each /j £ F do 10: 11: m 13: Find a cluster Cj such that Scope(fi) C Cj 15: 4> ••end for 16: 17: 18: 19: for each Edge 5 ^ which is between clusters Ci and Cj do if Ci has received messages from all its neighbors other than Cj then Ni-j := Neighbor(Ci) - {Cj} m(Ci,Cj) = op \ .opk i _ m(C[,C )opk+i(f>c 20: end if end for for each d £ C do 14: 21: 22: 23: 24: 25: Cj 4>CjOPk+lfi mCi Si := op„ i + C|eN ._ .m(Q,Ci)opfc+i(/>c j end for end for 26: Find a cluster Cj such that Z\ QCi 27: Return g CBi{Z ) := opi .\ 4> k t + CieNi c Zi Ci j i i i Chapter 4. Exact Inference Algorithms 41 marginalization. The observation above shows that for an idempotent semiring, we can ignore the double-counted messages in the generalized junction tree algorithm, without loss of correctness. In Algorithm 4.6, the generalized J T is revised to cope with idempotent semirings. 1 Algorithm 4.6 Generalized J T Algorithm for Idempotent Semirings (GJTIdemp) for a C B I Inference Task Input: A junction tree T = (C, S) of a CBI problem (X, D, R, F) and a variable subset of interest Z Output: gcBi(Z) = 0X\Z<8>/£F/ l: Attach each cluster C^with a constraint 4>c = 1 t 2: for each / € F do Find a cluster Ci such that Scope(f) C d 3: 4: (j> •= 4>d ® / Ci 5: end for 6: Choose an arbitrary cluster R G C as the root of T 7: for each Cluster Cj {Upward Phase} do 8: if A l l messages from d's children clusters are ready then 4>d •= 0 C < 9: 10: «8>C eC/nldren(C0 ( i> k)) m C fc C if Ci 7^ R then 11: 12: 13: 14: 15: ® Cj := Parent(d) m{d,Cj)~ ®c \Sa<t>c i i end if end if end for 16: for each Non-leaf cluster C, {Downward Phase} do 17: if message from d's parent clusters is ready then 18: if Ci ^ R then 19: 20: end if 21: 22: for each Cj € Children{Ci) do m(d,Cj) := © . 0c, 23: 24: 25: </f>Ci := 4>Ci ®m{Ci,Parent{d)) C A S < end for end if end for 26: Find a cluster Cj such that Z C d 27: Return g Bi{Z) C 4.2.4 := © c x z ^ c . Generalized J T Algorithm for Invertible Semirings Some CBI problems are denned on semirings with invertible properties under combination operations, which provide possibilities for improving the generalized J T algorithm. We call such a semiring an invertible semiring, with- Chapter 4. Exact Inference Algorithms 42 out explicitly mentioning that the invertible property under the combination operation. A commutative semiring (R, ©, ®) is an invertible semiring if Va £ R, B a £ R, s.t. a ® a = a ® a = 1. The binary operation 0 of i? is then defined as: _ 1 - 1 - 1 Va, b £ i i , a 0 6 = a ® b~ l Generalized JT Algorithm for Invertible Semirings The generalized J T algorithm can be modified to use the combination invertible property, as listed in Algorithm 4.7. The basic idea here is caching the combination of all incoming messages and dividing the specific incoming message to get the outgoing message for that separator. Dividing here eliminates duplicated information from the combined message. Lauritzen-Spiegelhalter (LS) architecture and HUGIN architecture There are another two message-passing schemes for semirings with combination invertible properties: the Lauritzen-Spiegelhalter (LS) architecture [39] and the H U G I N architecture [29]. We generalize the J T algorithm for LS architecture in Algorithm 4.8 and the J T algorithm for H U G I N architecture in Algorithm 4.9. The major differences between LS and H U G I N architectures are: (1) The reverses operation is done in the domain of separators in H U G I N but in the domain of clusters in LS. A small separator domain size enables H U G I N to achieve more time-efficiency than LS architecture; (2) H U G I N architecture requires additional storage for the constrains attached to separators, whereas LS architecture passes messages on the fly (do not need to be stored). A detailed discussion of the time and space complexities of Shenoy-Shafer architecture,' LauritzenSpiegelhalter (LS) architecture, and H U G I N architecture can be found in the following sections. 4.2.5 Instances of J u n c t i o n Tree A l g o r i t h m Many concrete algorithms from different fields can be seen as instances of the generalized junction tree algorithm. Junction tree algorithms have been widely used and studied in the probabilistic inference community since the late 1980s. Lauritzen and Spiegelhalter [39] developed a message-passing scheme in probabilistic inference based on clustered tree structures, which exploits the combination invertible property of semirings ([0, 0 0 ) , + , x) and ([0,00), max, x). Shenoy and Shafer [55] independently developed another general message-passing scheme that does not rely on the combination reverse property. Such a scheme is general enough to be applied to C B I problems with any type of commutative semirings. Lately, Jensen et al. [29] revised the message-passing scheme and developed H U G I N architecture for probabilistic inference problems. Chapter 4. Exact Inference Algorithms 43 Algorithm 4.7 Generalized J T Algorithm for Invertible Semirings (GJT-Inv) for a C B I Inference Task Input: A junction tree T = (C, S) of a C B I problem (X, D, R, F) and a variable subset of interest Z Output: 5CB/(^) = 0 \ Z ® / € F / l: Attach each cluster C,with a constraint (fid — 1 2: for each / £ F do 3: Find a cluster d such that Scope(f) C d 4: 0 := (fid ® f 5: end for 6: Choose an arbitrary cluster R £ C as the root of T 7: for each Cluster Cj {Upward Phase} do 8: if A l l messages from C,'s children clusters are ready then X C i 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: <fid : = 4>d ® (<g>C €Children(d) k m ^ k)) C if d ^ R then Cj :— Parent(d) m{C Cj) := © C A S y ^ C i end if end if end for for each Non-leaf cluster C, {Downward Phase} do if message from CVs parent clusters is ready then if d ± R then (fid •= 4>d ® m{d, Parent{Ci)) end if for each Cj £ Children(Ci) do m(d, Cj) := © 0 m ( C j , C*)) end for end if end for Find a cluster Ci such that Z C d Return.o B/(^):=©c \z0c U C C i A S y i Chapter 4. Exact Inference Algorithms 44 Algorithm 4.8 Generalized J T Algorithm for LS Architecture (GJT-LS) Input: A junction tree T = (C, S) of a C B I problem (X, D, R, F) and a variable subset of interest Z Output: g {Z) = (&x\ <8>f€F f 1: Attach each cluster Cjwith a constraint 4>Ci = 1 AR z 2: for each / £ F do 3: 4: 5: Find a cluster Ci such that Scope(f) C C j 4> ••= 4>d ® / end for Ci 6: Choose an arbitrary cluster R G C as the root of T 7: for each Cluster Ci ^ R {Upward Phase} do 8: if Ci has received messages from all its children clusters then 9: C j := Parent(d) 10: m{Ci,Cj) := 0 S 0C< 11: 0c, := 4>Cj (Smid^j) 12: (fid <f>Ci0rn(d,Cj) C 13: 14: A y end if end for 15: for each Non-leaf cluster Ci {Downward Phase} do 16: if Cj has received messages from its parent cluster then 17: for each Cj € Children(Ci) do 18: m(d,Cj) := © <f>c 19: 4> •- (t>Cj ®m{Ci,Cj) C A S i j t Cj 20: 21: 22: end for end if end for 23: Find a cluster C j such that Z C C j 24: Return gcBi(Z) := (& \z ^Ci Ci Chapter 4. Exact Inference Algorithms 45 A l g o r i t h m 4.9 Generalized J T Algorithm for H U G I N Architecture (GJTHUGIN) Input: A junction tree T = (C, <S) of a C B I problem (X, D, R, F) and a variable subset of interest Z O u t p u t : g {Z) = (&X\Z <S>feF f 1: Attach each cluster Cjwith a constraint 4>c = 1 2: for each / e F d o 3: Find a cluster Ci such that Scope(f) C C* 4: 4>Ci := 4>Ci®f 5: end for 6: Choose an arbitrary cluster R e C as the root of T 7: for each Cluster Ci^ R {Upward Phase} do 8: if Ci has received messages from all its children clusters then 9: Cj := Parent(Ci) 10: <j> = m{Ci, Cj) := © . 4>Ci 11: </> := 4>Cj'®Tn{Ci,Cj) 12: end if 13: end for 14: for each Non-leaf cluster C , {Downward Phase} do 15: if Ci has received messages from its parent cluster then 16: for each Cj G Children(Ci) do 17: Tn(C ,C ):=©cAS„^c 18: <£ := ® (m(Ci, Cj) 0 < £ . ) 19: 0 :=m{CuCj) 20: end for 21: end if 22: end for 23: Find a cluster C, such that 'Z C Cj ~ 24: Return g Bi{Z) := ® \z ^ AR t Sij C A S i c i i 4 Cj S i S y C Ci Chapter 4. Exact Inference Algorithms 46 The application of junction tree algorithms in constraint satisfaction problems can be traced back to [17]. A message-passing algorithm on the tree clustering structures of constraint networks were developed. The algorithm is essentially the same scheme as Shenoy-Shafer architecture in probabilistic inference, except that the join and project operations are used as relation combination and marginalization operations. As a special case of CSP, propositional SAT problems can be solved by the same algorithm on the semiring ({false, true}, V , A ) . In [65], Zhang and Mackworth generalized the Arc Consistency (AC) algorithm for constraint networks and designed T A C , a A C algorithm for the rooted join tree, to solve CSP. In the T A C algorithm, the marginalization operation corresponds to relational projection and the combination operation corresponds to relational join. As a result, the marginal of combining two relations is represented as relational semi-join. The contribution of [65] is proposing and analyzing parallel and distributed message-passing algorithms ( P T A C and D T A C ) that can be generalized to apply to junction tree algorithms from other disciplines. The junction tree algorithm was explicitly generalized in decoding applications as Generalized Distributive Law (GDL) by Aji and McEliece [1]. Many concrete decoding algorithms, including the Baum-Welch algorithm, the GallagerTanner-Wiberg decoding algorithm, and the B C J R algorithm are all special cases of the G D L algorithm. In decision-making problems, Jensen et al. [30] proposed an algorithm that attaches two types of potentials to the junction tree converted from the influence diagram, and solves the corresponding decision-making problem by passing messages in the junction tree. We could design other concrete J T algorithms for problems that can be abstractly represented in the semiring-based unified framework by instantiating different semirings. Specifically, when a C B I problem is defined on the semiring with the combination invertible property, the generalized LS architecture and H U G I N architecture are both suitable to solve the inference task. 4.2.6 Space and T i m e Complexity Given a C B I problem (X, D, R, F), the space and time complexities of the generalized junction tree algorithm are discussed in terms of the following notions: • T = (C,<S): a junction tree of the C B I problem; • d = max(\Di\), Di G D: the maximum domain size of variables; • w: the width of the junction tree T ; • sep: the maximum separator size of the junction tree T ; '• r = | F | : the number of constraints; • r%: the number of constraints allocated to cluster Cf, • dep,: degree of cluster C,; 47 Chapter 4. Exact Inference Algorithms • k: ratio of r and n , etc. r = k • n. T h e o r e m 4 . 3 (Space Complexity of JT in Shenoy-Shafer (SS) Architecture) The space complexity of the generalized junction tree algorithm in Shenoy-Shafer architecture is 0{\C\ • d ). w+1 Proof: To pass messages in SS architecture, each cluster needs one storage for the combination of constraints allocated to this cluster, and another storage for the marginalized constraint after absorbing all the messages passing to it. So the needed storage for all clusters is at most 2 • \C\ • d . In each separator, both the upward and downward messages are stored, and the needed storage is then 2 • |<S| • d at most. The total storage needed for the J T algorithm in SS architecture is 2 • \C\ • d + 2 • \S\ • d . Since sep < w and \C\ = \S\ + 1 in trees, the space complexity is 0(\C\ • d ). • w+1 sep w+1 sep w+1 T h e o r e m 4 . 4 (Space Complexity of JT in SS Architecture with Idempotent Semiring (SS-Idemp)) The space complexity of the generalized junction tree algorithm in Shenoy-Shafer architecture with an idempotent semiring is 0(\C\-d ). w+1 Proof: To pass messages in SS-Idemp, each cluster needs only one storage for the combination of constraints initially allocated to this cluster. Lately, it is used to store the marginalized constraint after absorbing all the incoming messages. So the needed storage for all clusters is at most |C| • d . , A l l messages do not need to be stored, so the space complexity is 0(\C\ • d ). • w+1 w+1 T h e o r e m 4 . 5 (Space Complexity of JT in SS Architecture with Invertible Semiring (SS-Inv) ) The space complexity of the generalized junction tree algorithm in Shenoy-Shafer architecture with the combination invertible property is 0(\C\ • d ). w+1 Proof: To pass messages in SS-Inv, each cluster needs only one storage for absorbing all the messages passing to it. So the needed storage for all clusters is at most \C\ •d . In each separator, the upward and downward messages are stored in the same place, so the needed storage is then \S\-d at most. The total storage needed for the J T algorithm in SS-Inv architecture is |C|-a™ + | i S | - d . Since sep < w and |C| = |<S| + 1 in trees, the space complexity is 0(\C\ • d ). w+1 sep +1 sep w+l a T h e o r e m 4 . 6 (Space Complexity of JT in Lauritzen-Spiegelhalter (LS) Architecture) The space complexity of the generalized junction tree algorithm in Lauritzen-Spiegelhalter architecture is 0(\C\ • d ). w+1 Proof: To pass messages in L S , each cluster requires only one storage for the combination of constraints allocated to this cluster. After passing messages, the combined constraint is revised to be the marginal of the total constraints combination. So the needed storage for all clusters is at most |C| • d . In each separator, no message is stored since all messages are absorbed by the cluster on the fly. Then the total storage needed for J T algorithm in the LS architecture is at most \C\ • d , which means the space complexity is 0(\C\ • d ). • w+1 w+1 w+1 Chapter 4. Exact Inference Algorithms 48 T h e o r e m 4.7 (Space Complexity of JT in HUGIN Architecture) The space complexity of the generalized junction tree algorithm in HUGIN architecture is 0{\C\)-d ). w+1 P r o o f : To pass messages in H U G I N , each cluster requires only one storage for the combination of constraints allocated to this cluster. After passing messages, the combined constraint is revised to be the marginal of the total constraints combination. So the needed storage for all the clusters is at most \C\ • d . In each separator, one storage is required to save the intermediate marginal constraint, which requires |«S| • d storage at most. The total storage needed for the J T algorithm in H U G I N architecture is \C\ • d + \S\ • d , which is half of the. storage needed for the J T algorithm in SS architecture. Since sep < w and \C\ = |<S| + 1 in trees, the space complexity is 0(\C\ • d ). • Then we discuss the time complexities of the generalized J T algorithm and its variants. w+1 sep s e p w+1 w+l T h e o r e m 4.8 (Time Complexity of JT in Shenoy-Shafer (SS) Architecture) The time complexity of the generalized junction tree algorithm in Shenoy-Shafer architecture is 0{\C\ '- d ). 2 w+1 P r o o f : To combine the initial constraints allocated to clusters, X)l=i ( * ~ -0 ' °" ® operations are needed at most. To compute a message from Ci to Cj, (degi — l)-d ® operations and (d /d — 1)• d © operations are required at most. For each cluster Ci, there are degi messages to compute, so in total £ i = i 9i • ( 9i - !) • ® operations and £ ' = i degi • (d - d ) © operations. Finally, to combine all the incoming messages, each cluster requires at most degi • d ® operations. Combining all of them, the upper bound of ® operations T(®) of the J T algorithm in SS architecture is: r w+1 de w+1 de d W + sep sep 1 w+1 sep w+l |C| T(®) = (n - 1 + deg} - degi + de ) • d £ gi |C| (^degj + r-\C\)-d w+l i=l |C| < 2 ($2 de +r-\C\)-d ' w+ 9i i=l {2 • \S\ + r - \C\) • OT+ 2 < 1 (4 • \C\ + r - |C|) • d 2 w+l And the upper bound of © operations T(©) of the J T algorithm in SS architecture is: 49 Chapter 4. Exact Inference Algorithms |C| T(©) = Y de • (d - w+1 gi d ) sep i=l < < 2 • \S\ • (d 2-\C\-d - w+l d ) sep w+1 If in the semiring R, © and <g> have the same processing time, the total time of the generalized J T in SS architecture is 0(\C\ • d ). • 2 w+l T h e o r e m 4 . 9 (Time Complexity of JT in SS Architecture with Idempotent Semiring (SS-Idemp)) The time complexity of the generalized junction tree algorithm in Shenoy-Shafer architecture with an idempotent semiring is 0(\C\-d ). w+1 P r o o f : To combine the initial constraints allocated to clusters, {r, — 1) • d <g> operations are needed at most. In the upward phase, SS-Idemp architecture needs at most __!=i (deg - 1) • d <g> operations and __[=i (d - d ) ©. In the downward phase, __!=i ® operations and ((degt - 1) • (d - d )) © operations are required. Combining all of them, the upper bound of ® operations T(<gi) of the J T algorithm in SS-Idemp architecture is: w+l w+1 w+1 sep t w+1 |C| = J^{ri-l + de -l + gi l)-d w+1 i=l = (r-\C\ + 2.-\S\)-d < (ICI+r)-^-*" w+1 1 The upper bound of © operations T(ffi) of the J T algorithm in SS-Idemp architecture is: |C| T(©) = J2( 9i- = 2 • |5| • {d de + )-(d l l w+1 -d ) sep i=l < w+1 - d ) • sep 2-\C\-d . w+1 If in the semiring R, © and ® have the same processing time, the total time of the generalized J T in SS-Inv architecture is 0(\C\ • d ). • If semiring R has the combination invertible property, message-computing in SS architecture can be improved by caching the combination of all the incoming messages and dividing the specific incoming message to get the outgoing message for that separator. We call it SS-Inv architecture. w+l sep 50 Chapter 4. Exact Inference Algorithms T h e o r e m 4.10 (Time Complexity of JT in Shenoy-Shafer Architecture with Combination Invertible (SS-Inv)) The time complexity of the generalized junction tree algorithm in Shenoy-Shafer architecture with the combination invertible property is 0{\C\ • d ). w+l P r o o f : To combine the initial constraints allocated to clusters, __l=i ( i ~ 1) ' d _> operations are needed at most. In the upward phase, SS-Inv architecture needs at most J_|=i (degi - 1) • d 0 operations and __!=i (d - ° ) ©• In the downward phase, £J_ '_: ' 0 operations, ((degi - 1) • (d - d )) © operations, and __i=i ((.degi — 1) • d ) 0 operations are required. Combining all of them, the upper bound of <g> operations T(®) of the J T algorithm in SS-Inv architecture is: r w+l u w+l +1 d w+l ep sep 1 w+l |C| T(®) = ]T ^ ~ + 9i - + ! ) ' d 1 de 1 w+l i=l = (r-\C\ + 2-\S\)-d < (|C| + r) • d w+1 w+1 The upper bound of © operations T(ffi) of the J T algorithm in SS-Inv architecture is: |C| ^(©) = J_(de -l + l ) - ( < r +1 f f i -d sep ) t=l = 2 • |5| • (d < 2-\C\-d - d ) w+1 sep w+1 And the upper bound of 0 operations T ( 0 ) of J T algorithm in SS-Inv architecture is: |C| T(0) = Y^ d e 9i~ )-d l w + l ) i=l = (2 • |5| - |C|) • d < |C| • c T w+1 + 1 If in the semiring R, © , 0 and 0 have the same processing time, the total time of the generalized J T in SS-Inv architecture is 0(|C| • d ). • w+l T h e o r e m 4.11 (Time Complexity of JT in Lauritzen-Spiegelhalter (LS) Architecture) The time complexity of the generalized junction tree algorithm in Lauritzen-Spiegelhalter architecture is 0(\C\ • d ). w+1 51 Chapter 4. Exact Inference Algorithms P r o o f : To combine the initial constraints allocated to clusters, £ - (r* — 1) • d 0 operations are needed at most. In the upward phase, LS architecture needs at most Y^fli ( 9i ~ ) ' d ® operations, Y)ili (d - d ) © operations and d 0 operations. In the downward phase, £ i = i d 0 operations and X)-^! (deaj — 1) • (d — d ) © operations are required. Combining all of them, the upper bound of 0 operations T(<g>) of the J T algorithm in LS architecture is: v = 1 de l w+1 w+1 sep u , + 1 w+1 w+1 sep |C| T(®) = = ] T ( r - l +de5 -l + l)-d'" i=i (r-\C\ + 2-\S\)-d < (\C\ + i + 1 i w+1 r)-d w+l The upper bound offfioperations T(ffi) of the J T algorithm in LS architecture is: |C| T(®) = ^(de - l + l)-(d w+1 g i -d ) sep i=i = 2 • |5| • (d™ - d ) < 2 • \C\ • d +1 sep w+1 And the upper bound of 0 operations T ( 0 ) of the J T algorithm in LS architecture is: |C| T(0) .= ^ d ™ 4 1 i=l = 2 • |5| • d < 2 • \C\ • d w+l w+1 If in the semiring R,ffi, 0 , and 0 have the same processing time, the total time of the generalized J T in LS architecture is 0(\C\ • d ). • w+1 T h e o r e m 4.12 (Time Complexity of JT in HUGIN Architecture) The time complexity of the generalized junction tree algorithm in HUGIN architecture is 0(\C\ -d ). w+1 P r o o f : To combine the initial constraints allocated to clusters, Y^li ( i ~ -0 ' °" 0 operations are needed at most. In the upward phase, H U G I N architecture r Chapter 4. Exact Inference Algorithms needs at most £ [ = i {degi - 1) • d w+1 operations and Ylf=\ 0 erations. In the downward phase, £ [ ® operations, d w+1 = 1 52 (d ~ d Jlf2i (degi - w+1 aep © op1) • {d w+l © operations and Yl^li d 0 operations are required. Combining all of them, the upper bound of 0 operations T{®) of the J T algorithm in H U G I N architecture is: sep |C| T(®) = £ (n - 1 + de - 1 + 1) • 9i i=l (r-|C|+2-|«S|)-(i < w + 1 (\C\+r)-d w+1 The upper bound of © operations T(©) of the J T algorithm in H U G I N architecture is: |C| r(8) l)-^ -^) 1 = ^(deoi-l + i=l = 2 • | 5 | • (d - d < 2 • |C| • ' w+1 s e p ) And the upper bound of 0 operations T ( 0 ) of the J T algorithm in H U G I N architecture is: T(0) = £d i=l s e p = 2-\S\-d < 2-\C\-d sep w+l If in the semiring R,ffi, 0 , and 0 have the same processing time, the total time of the generalized J T in H U G I N architecture is 0(\C\ • d ) . • w + 1 4.2.7 Discussion of A p p l y i n g the J T A l g o r i t h m In general, junction tree algorithms are memorized dynamic programming, which cache solutions for subproblems to answer multiple queries. Both the time and space complexities of junction tree algorithms are polynomial in the size of the junction tree, with a constant factor exponential in the maximum subproblem size (width of the tree plus 1). So the key to applying J T algorithms is to divide the original C B I problem into subproblems with tractable sizes. However, - d ) sep Chapter 4. Exact Inference Algorithms 53 many practical C B I problems have large treewidth. As the lower bound of the width of any tree decomposition, a large treewidth makes exact inference in the junction tree infeasible. There are two ways to perform approximate inference In such a situation: (1) Splitting the oversized clusters, which corresponds to removing some edges from the primal graph representation, or retracting some constraints in the original C B I problem; (2) Using junction graphs to perform inference, instead of junction trees. In a junction graph, messages can pass in loops and may not terminate, which means that information may be counted repeatedly. Some criteria are used to terminate the message-passing after reaching preset thresholds. Both of these approaches will be discussed in Chapter 5. Another interesting approach for performing exact inference in junction trees is to consider the effect of variable values and their impacts on the structure of the tree. We will briefly discuss this in Section 8.2. All concrete junction tree algorithms discussed in this chapter use serial message-passing schemes. In practical applications, there are more efficient implementations. Both paralleled and hybrid message-passing schemes can be applied to achieve computational efficiency. The time and space complexities of the generalized junction tree algorithm, as well as the generalized variable elimination algorithm, are compared in Table 4.2. Here Q = (V,£) is the primal graph of a C B I problem (X,D,R,F). T = (C,S) is a junction tree transformed from Q. Both the V E and the J T algorithms are exponential in the induced width of a given elimination ordering, or the width of the junction tree representation. For the V E and the J T algorithms with combination invertible properties or combination idempotency properties, the time complexity is linear in the size of the problem, whereas in the generalized J T , it is quadratic in the size of the problem. Table 4.2: V E and J T Complexity Comparisons Algor. Space Complexity Time Complexity GVE GJT GJT-Idemp GJT-Inv GJT-LS GJT-HUGIN 0(|V| + 0(\F\ • 0(\C\-d ) 0(\C \-d ) 0(\C\-d ) 0(\C\ • d ) 0(\C\-d ) w + l w+l w + L w+v w + l <r ) +1 0{\C\ -d ) 0(\C\ -d ) 0{\C\-d ) o(\c\ • d ) 0{\C\-d ) 2 w+1 w+i w+v w+v w+v The big O expressions in Table 4.2 do not convey sufficient information about the running times of these algorithms. The upper bounds of different operations in these algorithms are listed in Table 4.3. According to the running time of each concrete operation in Java (cf. Table 4.4), users can compare the running times of different junction tree algorithms and instantiate these algorithms for their own purposes. For a C B I problem with the combination invertible property, More specifically, for a C B I problem defined on a semiring with the combination invertible property, generally we conclude: (1) GJT-Inv uses the least time, followed by G J T - H U G I N and then Chapter 4. Exact Inference Algorithms 54 Table 4.3: Upper Bounds of V E and J T Running Times 0 © 0 GVE (r - |V|) • d |V| • d 0 2 . |C| • (d d P) GJT 0 (4 • |C|* - Id + r) • d GJT-Idemp (\C\+r)-d 2 • |C| • {d - d P) 0 (\C\+r)-d GJT-Inv 2 • |C| • (d - d P) Id •d (\C\ + r)-d + 2 • |C| • ( d - d P) 2- |d -d GJT-LS GJT-HUGIN (\C\+r)-d 2-\C\- (d -d P) 2 • |d • d Algor. w+L w+l w+l w+i w+i w L w+l w+1 se w+1 se w+1 se U J + 1 w+L w+l se se w+1 s e p G J T - L S ; (2) G J T - L S uses the least space while G J T - H U G I N and GJT-Inv use about the same space. This conclusion is supported by the experimental results in Chapter 7. Table 4.4: Running Times (per 10 operation) for Different Operations in Java on a PIII833MHz P C , Running Windows X P a n d JDK1.4 Operations Running Time (ms) Objects in Java X 1570 Double, Float, Integer 1534 Double, Float, Integer + 1584 Double, Float, Integer -fmax 1687 Double, Float, Integer min 1688 Double, Float, Integer V Boolean 128 Boolean A 128 6 4.3 Summary In this chapter, two popular exact inference algorithms for C B I problems are generalized based on our semiring-based unified framework. Both variable elimination algorithms and junction tree algorithms are dynamic programming, where junction tree algorithms cache solutions of subproblems for answering multiple queries. In some semirings with the combination invertible property or the combination idempotency property, the J T algorithms can be improved to achieve more computational efficiencies. Generalized versions of these improved algorithms are discussed as well. In this chapter, we conclude that many concrete algorithms for CBI problems from different disciplines are actually special cases of the generalized variable elimination algorithms and the generalized junction tree algorithms. These algorithms are listed in Table 4.5 for comparison. The complexity analyses indicate that both the time and space complexities of V E and J T algorithms are polynomial in the size of the CBI problem, but have a constant factor which is exponential in the width of the tree decomposition. Chapter 4. Exact Inference Algorithms 55 Therefore, finding an optimal elimination ordering to minimize the induced width, which is equivalent to finding a tree decomposition with the minimum width, is the key of applying these two exact inference algorithms. For many practical problems with intractable treewidth (lower bound of the widths of all tree decompositions), exact V E and J T algorithms are infeasible. In such cases, we can either use approximate inference approaches if permitted, or exploit the value of variables, as well as the structures of the C B I problems to reduce the scale of the problems. Satisfiability Problems Table 4.5: Concrete V E and J T Algorithms in Different Fields. JT V E for CSP(Seidel [54]); Adaptive Consistency Bucket-Tree Elimination (Dechter and Pearl [17]); Arc Consistency for J T (TAC) (Zhang and (AC)(Dechter and Pearl [16]) Mackworth [65]) Bucket-Tree Elimination (Dechter and Pearl Davis-Putnam (DP)(Dechter and Push [18]) Probability Inference in BN Variable Elimination Algorithm [64]; Bucked Elimination (Dechter [14]) Fields Constraint Satisfaction Decision Making Dynamic Bayesian Network Possibility Inference Coordination Games Max-Likelihood Decoding VE - [17]) SS (Shenoy and Shafer [55]); L S (Lauritzen and Spiegelhalter [39]); and H U G I N (Jensen et al. [29]) J T for ID (Jensen et al. [30]) Belief Propagation (BP) - - V E for Coordination Graph (Guestrin et al. [25]) Viterbi Decoding (Viterbi [57]) - Transform to V E of B N (Zhang [63]) Generalized Distributive Law (GDL) (Aji and McEliece [1]) Chapter 5. Approximate Inference Algorithms 57 Chapter 5 A p p r o x i m a t e Inference Algorithms 5.1 Motivation for the Design of Approximate Algorithms The analyses in previous chapters show that both the variable elimination algorithm and the junction tree algorithm can perform exact inference for a constraint-based inference problem in polynomial time and space. However, there always exist constant factors in the time and space complexities that are exponential in the maximum cluster size of the problem's tree decomposition. This means that the generalized exact inference algorithms would be infeasible when the treewidth of the corresponding problem, a lower bound of width for all possible tree decompositions, is intractable. This practical challenge encourages researchers to develop approximate inference algorithms for C B I problems from different disciplines, if approximate inference results with some quality guarantee are acceptable in their application domains. The key idea of these approximate inference algorithms is to restrict the size of the maximum subproblem, or equivalently, the maximum cluster size of a tree decomposition or the induced width given an elimination ordering, to an acceptable level. In general, there are at least two possible ways to design an algorithm to achieve this purpose. The first approach is to revise the original CBI problems by removing some less important constraints, which makes the structure of the problem's graphical representation much simpler; another approach does not touch the original C B I problem but re-organizes it into more complex graphical representations, e.g., a junction graph with loops. Inference procedures are carefully re-designed to cope with these graphical representations. 5.2 Algebraic Approximation for V E and J T The algebraic foundation of approximate algorithms is based on Eq. 5.1. Here F is a set of constraints and Fi is a subset of constraints for i = 1, • • •, b, where F i U • • • U F = F and F; n Fj = 0 for any i, j € {1, • • •, b], i ^ j. The basic idea here is breaking the original C B I problem into 6 (overlapped) subproblems, solving them individually, then joining them again to get the solution. b Chapter 5. Approximate Inference Algorithms ©<8>/«<8>(©<g>/) Y f£F i=l 58 (s-i) Y f€Fi Algorithm 5.1 describes how to compute the approximate marginal of combination. A l g o r i t h m 5.1 The Approximate Marginal of Combination Algorithm (ApproxMC) Input: F : set of constraints with variable Y in their scopes, b: number of approximate sets, w : threshold width O u t p u t : F' = { A , •••J } where 0 , /' « © ® / l: 5 = 0 max b / < 6 F y / g F 2: for each / G F do 3: 5 := SL)Scope(f) 4: end for 5: if | 5 | - 1 < w ax m 6: /' :=©y<g> then / e F / 7: F' := {/'} 8: else 9: F' := 0 10: for j = 1 to b do 11: F j := 0 12: end for 13: for each / G F do Choose j G {1, • • •, b} Fj := Fj- U {/} 14: 15: 16: 17: ' e n d for for j — 1 to 6 do 18: F'j := ApproxMC(Fj, Y, 6, w a ) m X 19: F' := F' U F ' j 20: end for 21: end if 22: Return F ' In the following sections we will present the approximate V E and J T algorithms based on the subroutine ApproxMC. 5.2.1 Approximate Variable Elimination Algorithm According to the generalized exact variable elimination algorithm in Chapter 4, the bottleneck of computation occurs when we eliminate a variable Xi, too many constraints with Xi in their scopes have to be combined, which implies a large constraint after combining and marginalizing. For example, in Figure 5.1(a), Y\,---,YQ are fully connected after eliminating Xi, which means that the size of the marginal constraint is exponential in 6. 59 Chapter 5. Approximate Inference Algorithms One way to overcome this bottleneck is to clone X i with many identical copies • • •, x\ \ Constraints with X i in their scopes are revised by replacing X i with X ^ \ j G {1, •••,&} according to specified rules. Then the V E algorithm can be applied. In Figure 5.1(b), applying 6 = 2 leads to two marginal constraints with cubic sizes. Of course, introducing identical copies x\ \ • • • , X ^ of X i will introduce conflicts of the X i values and errors in the marginalization. Experimental analyses appear in Chapter 8. b X (a) (b) Figure 5.1: Graphical Interpretation of Approximate V E . Min-Buckets [19] is the first algorithm proposed for applying this idea to solve probability inference problems approximately. In this section, we generalize the idea of Min-Buckets as the approximate V E algorithm in Algorithm 5.2. A l g o r i t h m 5.2 Generalized Approximate V E Algorithm (GVE-Approx) Input: A CBI problem {X, D, R, F ) , a variable subset of interest Z, 6, w O u t p u t : g Bi{Z) = ® , / ~ @ - z <S>/eF / l: Choose an elimination ordering < Y i , • • •, Y >, where {Yi, • • •, Y ) = X - Z m / C e F X k 2: 3: for i = k to 1 do for each / G F do 4: F • 5: b if 6:" 9: := 0 Yi G F Scopetf) :-• F 7: 8: k b then F \ { f } : = F b U { f } e n d if e n d for 10: F ' : = 11: F - - F U F ' A p p r o x M C { F 12: e n d for 13: Return gcBi{Z) b , Y i , b , w m a x ) ' = (g) / G F / - a x Chapter 5. Approximate Inference Algorithms 5.2.2 60 Approximate Junction Tree Algorithm The generalized junction tree algorithm can be modified to perform approximate inference based on Eq. 5.1. The basic idea of the approximate J T algorithm is to pass a set of messages from one cluster to another, instead of passing a single message. The combination of these messages is an approximation of the message passed in the exact J T algorithm. For example, in Figure 5.2(a), rriij is marginalized from the combination of m i to me, together with the local constraint of CV, in Figure 5.2(b), rrty is approximated by the combination of m|j' and m | | \ each of which is generated from the marginalization of three messages and a part of local constraints. (a) ; (b) Figure 5.2: Graphical Interpretation of Approximate J T . The Min-Clustering Tree [43] is an approximate probability inference algorithm following this idea. In this section, we generalize the idea of The MinClustering Tree as the approximate junction tree algorithm in Algorithm 5.3. For semirings with combination invertible properties, the reverse of a message is similarly approximated by the combination of a set of the messages' inverses. The approximation can be formalized as Eq. 5.2. i ® ( © ® / ) « ® Y f£F (i0(0(g)/)] t=l \ y fG.Fi (-) 52 J After applying Eq. 5.2, we can revise any one of the three exact junction tree algorithms for invertible semirings to cope with approximate inference tasks. 5.2.3 Discussion Algebraic approximation is the foundation of many approximate inference approaches for C B I problems. On the C B I problems level, it is equivalent to retracting some given constraints. On the primal graph representation level, it is equivalent to removing some edges from the graph. On the junction tree representation level, it is equivalent to splitting some clusters in the junction Chapter 5. Approximate Inference Algorithms 61 A l g o r i t h m 5.3 Generalized Approximate J T Algorithm (GJT-Approx) Input: A junction tree T = (C,S) of a C B I problem (X, D, R, F), a variable subset of interest Z, b and w Output: { * J s.t. 3d e C, g Bi(Z) = ® * * . {®Ct\z <t>) ~ m a x C e C c © x - 2 •S'/eF / l: Attach each cluster Ciwith a constraint set$c = 0 i = 1 to r do 3: Find a cluster C j such that Scope(fi) C C j 4: * :=*c<U{/i} 5: end for 6: for each Edge which is from cluster Cj to C j do 7: M(Cj,Cj):=0 8: if Cj has received messages from all its neighbors other than C j then 9: Q := $ 10: for each C; e (Neighbor(Ci) \ {Cj}) do 11: Q:=Q\JM{Ci,d) 4 2: for C i Ci end for M{C Cj) := 4 p p r o x M C ( Q , C i \ Sij, 6, u i 14: end if 15: end for 16: for each C i £ C do 12: 13: J U 17: 18: for each C j £ Neighbors(d) $ :=$ ,UM(Cj,Ci) 19: end C i 20: 21: end m Q X ) do C for •= ApprOxMCi^Ci^^^m^) for tree. The purpose of these approximations is to restrict the size of the maximum subproblem to a tractable level. Though the idea of algebraic approximation is straightforward, it is hard to analyze the error bounds of these approaches, especially when the combination and marginalization operations are abstract. So far there are few theoretical guidelines for choosing which constraints should be released (which edges should be moved, how to split a cluster); only empirical analyses currently apply in such cases. 5.3 Thin Junction Tree Algorithm Recently, Paskin [48] introduced the thin junction tree algorithm to solve Simultaneous Localization and Mapping (SLAM) problems. A thin junction tree is a tractable approximation of the original junction tree representation that imposes an upper bound of the cluster size. Generally, it can be seen as a special case of the generalized approximate J T algorithm in Section 5.2.2. Given a cluster C with intractable size | C | , the thin junction tree algorithm splits it into two clusters. One cluster C i consists of only one variable v and another cluster 62 Chapter 5. Approximate Inference Algorithms C consists of \C\ — 1 variables, where v £ C . If the cluster C has only one neighbor cluster C" with v € C, C\ can be safely absorbed by C". A l l of the incoming messages to C , other than the one from C", can be redirected to C . Variable v is not contained in these messages. Then we can connect C to C and remove C from the junction tree. The size of C is reduced to \C \ = \C\ — 1. The procedure can be performed repeatedly to restrict the cluster size under an imposed level. 2 2 2 2 2 5.3.1 Thinning through Variable Contraction According to the description in [48], the basic operation for making a thin junction tree is variable contraction. Definition 5.1 (Variable Contraction [48]) Let x be a variable that appears in more than one cluster of the junction tree T. Let C be a leaf of T (the subtree of T induced by clusters containing x). Let S be the separator joining C to T . A variable contraction of x from C removes x from C and S, then marginalizes x out of 4>C (and out of 4>s if using HUGIN architecture). If G becomes non-maximum, it is merged into its subsuming neighbor. x x Let T be a thin junction tree after performing a variable contraction of T . It is shown [48] that T is still a tree with the running intersection property. The procedure of variable contraction is formalized in Algorithm 5.4, according to [48]. A l g o r i t h m 5.4 Variable Contraction (Contract(v,Ci)) Input: Contract variable v from cluster d, $ C i is a set of constraints allocated to d Output: after contraction 1: Let Cj be the parent cluster of Ci in T v 2: 4> '•= 0 Ci 3: </>„:= 1 4: for each / £ $ C j do if Scope(f) C Cj then <l'c, := U {/} 5: 6: else if v e Scope(f) then 7: 8: 9: 4>v •= 4>v ® / else 10: 11: 12: $ Q : = $Ci U {/} e n d if e n d for Ci := Ci \ {v} 14: Sij := Sij \{v} where Cj i / = ®v4>v 13: 1 5 : 16: : $ C i := $ C j U {/} Chapter 5. Approximate Inference Algorithms' 63 There are two special cases for performing variable contractions. The first case is that v appears only in one cluster C , of the tree. In this case, we can clone Ci to obtain an identical cluster Cj, add an edge between C j and Cj, contract v from Ci to Cj, and finally contract some other variable u from Cj to Ci. The second case is that v appears elsewhere, but v is not a leaf of T . In this case, we can contract v from the other clusters until Cj is a leaf of T , then contract v from Cj. v v 5.3.2 Generalized T h i n J u n c t i o n Tree A l g o r i t h m The basic procedure of the thin junction tree algorithm is to (1) build a junction tree, (2) limit the maximum cluster size by a given upper bound by performing variable contraction, and (3) pass messages within the thin junction tree. The generalized thin junction tree algorithm within the semiring-based C B I framework is shown in Algorithm 5.5. Algorithm 5.5 Generalized Thin Junction Tree Algorithm (GThinJT) Input: A junction tree T = (C,S) of a C B I problem P = {X, D, R, F)\ Z: an variable set of interests; w : the upper bound of cluster size Output: g Bi{Z) « ® \z <8>/SF/ ' 1: Attach each cluster Cjwith a constraint set$C; = 0 2: for each / e F d o 3: Find a cluster d such that Scope(f) C C j max x C : 4: $ := 5: end for C i U {/} 6: for each d € C s.t. | C i | > w do 7: v = ToBeContracted(Ci) 8: if v only appears in Cj then 9: Clone Cj = Cj 10: C := C U Cj and 5 : = 5 U 11: Contract(v,Ci) 12: u = ToBeContracted(Cj) 13: Contractu, Cj) 14: else if v is the leaf of T then 15: Contract(v,Ci) 16: else 17: Recursively contract v from leaf of T until Cj is the leaf of T 18: Contract(v, Ci) 19: end if 20: end for 21: Perform message-passing of the generalized J T algorithm in the thin J T f=(C,S) 22: Find a cluster C G C s.t. Z C C max v v 23: Return __•_»/(£) « © c \ z ® ®_,€Jv i_/,fcor.(6) e v ™(CjiC) Chapter 5. Approximate Inference Algorithms 5.3.3 64 Discussion The key of the thinning junction tree algorithm is to implement procedure v = ToBeContracted(Ci), which chooses a variable to contract from the cluster. The chosen variable should minimize the approximation error. In probabilistic inference, the approximation error is often measured as Kullback Leibler(KL) divergence. Definition 5.2 (Kullback Leibler(KL) Divergence [34]) KL divergence D(p \\ q) of two discrete probability distributions p(x) and p(y) is defined as: Paskin [48] shows that the error introduced by contracting x from a cluster C can be computed using only the marginal over C. Thus the error can be estimated through local computation; also the errors of a series of contractions are additive, providing a way to estimate the approximation of the accumulated error of the thinning junction tree algorithm. For other application domains without explicit error measurements or error additive properties, heuristic searches can be applied. For example, a variable induced minimum local constraint (within a cluster) can be the one to contract. There are very few studies on this topic. It deserves more attention. 5.4 Loopy Message Propagation The algebraic approximation is equivalent to relaxing some constraints of the original C B I problems, removing some edges in the primal graph representations, or splitting some clusters in the junction tree representations. These approximations mean that the original C B I problems are revised and simplified to make computation feasible. Loopy message propagation, by contrast, does not revise the original C B I problems. As already known, in junction tree algorithms both time and space complexities are bounded by the maximum cluster size. To maintain junction tree properties, the maximum cluster size is usually large in practical problems. If junction tree properties are relaxed, in other words, if the secondary structure is not necessarily a tree but a graph with loops, the maximum cluster size can be dramatically reduced. At the same time, the message-passing may not terminate due to the introduction of loops. Also messages will be repeatedly counted. Both of these bring errors of inference. However, empirical results in the probability inference [46] and Turbo Codes decodings [44] show that the same message-passing schemes in exact junction tree algorithms work well in junction graphs with loops. When the junction graph is singly connected, the exact inference result is produced after one iteration of message-passing. When there exists only a single loop in the junction graph, it is guaranteed to converge to the correct result under some conditions [60]. Chapter 5. Approximate Inference Algorithms 5.4.1 65 Building a Junction Graph Definition 5.3 (Junction Graph) Given the primal graph Q = (V, £) of a CBI problem P = (X, D,R, F), a junction graph is J = (C,S), where C = {Ci,- • • , C } is a set of clusters, each cluster Ci corresponds to an aggregation of a subset vertices (correspondingly, variables) Vc C V); and S = {(Ci, Cj)\Ci, Cj £ C} is a set of separators between Ci and Cj. In addition, a junction graph satisfies that for any constraint f £ F, there exists a cluster d £C s.t. Scope(f) C Cj. n t A junction tree is a junction graph without loops. In a junction graph, both the constraint allocation property and the running intersection property still hold. Given a C B I problem P = (X, D, R, F), there are many ways to generate a junction graph. The most straightforward way is shown in Algorithm 5.6. Following this algorithm, the maximum cluster size of the generated junction graph is the size of a constraint with the maximum number of variables in its scope, which is relatively small to cope with. The junction graph generated by Algorithm 5.6 is sometimes called the dual graph. A l g o r i t h m 5.6 Building a Junction Graph for a C B I Problem Input: A C B I problem P = (X, D, R, F) and a variable subset of interest Z O u t p u t : A junction graph J = (C, S) l:C:={Z} 2: S:=0 3: for each / £ F do 4: 5: 6: 7: if Scope(f) C d for some d € C then Continue else if d C Scope(f) for some d € C then Replace Cj with Scope(f) 8: else 9: 10: 11: Cluster Ci := Scope(f) C := C U C* end if 12: end for 13: for each Cj, Cj £ C do 14: if Cj n Cj / 0 then 15: 16: 17: Sij := Ci O Cj S-.-SUSij end if 18: end for A dual graph can be revised to be a junction graph with less clusters and a larger maximum cluster size. The basic operation of the revision is the merging of neighboring clusters. Given two clusters Ci and C2 in neighbors, merging is defined as the following sequence of actions: (1) introducing a new cluster 66 Chapter 5. Approximate Inference Algorithms C = C i UC2, (2) connecting all neighbors of C\ and C2 to C , and (3) removing C i and C2 from the graph. The size of the merged cluster obviously increases. It is a tradeoff between the number of clusters and the maximum cluster size. 5.4.2 Generalized Loopy Message Propagation A l g o r i t h m The generalized loopy message propagation for a junction graph J = (C,S) of a C B I problem P = (X,D,R,F) is shown in Algorithm 5.7. The basic idea of the generalized loopy message propagation is the same as message-passing in the generalized junction tree algorithm, except that messages at iteration t are updated by incoming messages at iteration t — 1. The initial messages are produced based on local constraints and do not depend on the incoming messages from neighboring clusters. A l g o r i t h m 5.7 Generalized Loopy Message Propagation Algorithm (GLoopy) I n p u t : A junction graph J = (C,S) of a C B I problem P = (X,D,R,F)\ Z: an variable set of interests; t : maximumiterationtimes O u t p u t : g Bi{Z) « ( $ \ z <8>/_F/ l: Attach each cluster C,with a constraint rpd = 1 2: for each / G F do 3: Find a cluster Cj such that Scope(f) C Cj 4: (Pd •= <Pd <8> / 5: end for 6: for each Ci G C do 7: for each Cj G Neighbors(Cj) do <• 8: m^(Ci,Cj) ~© . ©C; 9: e n d for 10: e n d for 11: for t = 1 to t do 12: for each Ci G C do 13: for each C j € Neighbors(Ci) do 14: m ^ ^ C j ) := © . <P ® < 8 > c y . / . 6 o r . ( c ) - ^ " ( ^ , C ) 15: end for 16: end for 17: e n d for 18: Find a cluster C G C s.t. Z C C 19: Return __•_/(£) « 0 0 ® <g>_,_jv« /.i»r.(c ) d) max C X c X 5 u max fn(t C A S i C X Z C ie Ci ff Vei i J 1) i ( If a C B I problem is defined on a semiring with the combination invertible property, the algorithm can be revised slightly to save computational cost. Details of this revised version of the generalized loopy message propagation algorithm for semirings with the combination invertible property are shown in Algorithm 5.8. 67 Chapter 5. Approximate Inference Algorithms A l g o r i t h m 5.8 Generalized Loopy Message Propagation Algorithm with Inverses (GLoopy-Inv) Input: A junction graph J of a C B I problem P = (X,D,R,F); = (C,S) an variable set of interests; t : max Z: maximumiterationtim.es « 0 \z<8>/ef/ O u t p u t : 9CBI{Z) x l: Attach each cluster Cjwith a constraint 4>d = 1 2: for each / £ F 3: do Find a cluster Cj such that Scope(f) C Cj 4: 4> •- 4>d ® / Ci end for 6: for each Cj £ C do 5: 7: for each Cj G Neighbor s(d) m<°>(c7 ,Cj):=e AS *c 8: 9: 10: 12: for each d 13: ^ 14: 18: do i m a x do G C do : = 0 C ® O c ^ / V e . g W o r ^ C O m * ' " ^ , Ci) 1 for each Cj G Neighbors(Ci) 15: 17: w end for end for l i : for t — 1 to i 16: C i do m W ^ . C j O - ^ O m ^ ^ ^ i ) end for end for end for 19: Find a cluster C G C s.t. Z Q C 20: Return g Bl{Z) « © \ 0 C ® (SceJVeisfcfcoraCC) " » ( C j , C) ( 0 C 5.4.3 C Z Loopy Message Propagation A s an Exact Inference A special case of applying loopy message propagation is performing inference on CBI problems that are defined on semirings with the combination idempotency property. Since the idempotency of combination implies that repeatedly counted messages will not introduce errors, loopy message propagation returns exact answers after finite iterations. T h e o r e m 5.4 (Loopy Message Propagation As an Exact Inference) Loopy message propagation algorithm (Algorithm 5.7) runs as an exact inference algorithm in CBI problems with combination idempotency property. Maximum iterations to be converged are the diameter d of the graph, which is defined as the number of edges in the longest path (with backtrack, detour, or loop excluded from consideration) between any two vertices of the graph. Proof: Consider any cluster C\ in the junction graph J = (C,S) of a C B I problem. It will receive messages from another cluster C in d iterations at most. Since passing through other clusters will not filter the message from C and repeatedly counting a message will not introduce errors, C\ will receive messages 2 2 Chapter 5. Approximate Inference Algorithms 68 from all of the other |C| — 1 clusters after d iterations at most. Combining these messages with local constraints in C\ will result in the marginal of the joint combination of total constraints. • Classic CSPs can be embedded into our semiring-based unified framework using the semiring R = ({false, true}, V, A). It is easy to show that the combination operation A, logical A N D , is idempotent. According to the theorem, loopy message propagation should be an exact inference algorithm for classic CSPs. In practice, arc-consistency [42], the key technique for CSPs, can be seen as a special case of the generalized loopy message propagation for C B I problems with idempotent semirings. In arc-consistency, the messages are the possible values of the variable in a separator. Combining messages from other clusters will remove illegal values, in other words, revise the outgoing message. The algorithm terminates when all the messages remain the same. 5.4.4 Discussion In the probability inference field, the convergence of loopy message propagation in a junction graph with a single loop is proven. For more complex junction graphs of probability inference, little theoretic work has been done. For classic CSPs, loopy message propagation is proven as an exact inference algorithm. Chapter 7 will present the empirical results of applying loopy message propagation in classic CSPs. Loopy message propagation is natively a distributed and parallel algorithm. At the iteration t, each cluster can compute its outgoing messages by collecting all the incoming messages from other neighbor clusters at iteration t — 1 in parallel. The number of needed processors is the same as the number of clusters. We may resort to junction graph building approaches to reduce the number of clusters, while increasing the maximal cluster size at the same time. 5.5 Hybrid Junction Tree Algorithm The last approach for approximate inference of C B I problems is the hybrid junction tree (HybridJT) algorithm. Generally, it passes messages exactly to/from clusters with tractable sizes. When a cluster with intractable size is encountered, the hybrid junction tree algorithm builds a local junction graph based on local constraints and incoming messages. Loopy message propagation is used in this local junction graph. After several iterations of loopy message propagation, the approximate messages of such a cluster are generated and ready to pass to its neighbors. Formally, the hybrid junction tree algorithm is shown in Algorithm 5.9. The hybrid junction tree algorithm here is analogous to the generalized approximate junction tree algorithm, except in the method of dealing with large clusters. In ApproxJT, large clusters are split into small sub-clusters, whereas in HybridJT, large clusters are treated as subproblems, which are approximately solved using loopy message propagation. In addition to the local constraints of a subprob- Chapter 5. Approximate Inference Algorithms 69 lem, all the incoming messages are seen as constraints of the subproblem. A l l outgoing messages are initially seen as unitary constraints. After performing loopy message propagation over the junction graph of the subproblem, we can approximately produce outgoing messages. 5.6 Summary The exponential complexities of exact inference algorithms for C B I problems make approximate approaches not only desirable but mandatory in practical scenarios. A l l generalized approximate inference algorithms presented in this chapter aim to restrict the maximum cluster size or induced width and to reduce the exponentially large factor in the complexities expression. The cluster sizes actually represent sizes of subproblems. We can achieve this purpose through either relaxing some constraints or re-organizing subproblems in a more complex way. With a bounded subproblem size, all exact inference approaches can be modified to cope approximately with scale problems. So far, approximate inference approaches are mainly studied empirically. Little theoretical work has been done to analyze the bounds of approximation errors or give directions for choosing approximate parameters. These are definitely interesting topics for future study. Chapter 5. Approximate Inference Algorithms 70 A l g o r i t h m 5.9 Hybrid Junction Tree Algorithm (HybridJT)-, Input: A junction tree T = (C,S) of a CBI problem P = (X,D,R,F); Z: an variable set of interests; w x- upper bound of cluster size; b: number of partial messages O u t p u t : (Approximate) Consistent junction tree that is ready to answer queries l : Attach each cluster Ci with a constraint set$Ci = $ ma 1 for each / _ F do Find a cluster d such that Scope(f) C Cj $C< : = $ U {/} Ci end for for each Edge Sij which is from clusters Cj to C , do M(Ci,Cj) := 0 if Cj has received messages from all its neighbors other than Cj then Q •= *c, for each C, € (Neighbor(Cj) \ {Cj}) do Q:=QuM(Q,Ci) • end for if |Q| < w then M(Cj,Cj):={© max C A S i .(g) / 6 Q /} else if \Sij\ < w then Define : 5< j -» 1 Q:=QU{ (°)} max 0 else Split S j into 6 parts, \J for k = 0 to b — 1 do Define o< ) : S $ -» 1 Q:=QU{ W} b it k=l S\j = 5j,j fc 9 end for end i f Building junction graph J of constraints set Q and do loopy message propagation over J let Q be the cluster in J to represent for k' = 0 to b — 1 do k Af (Ci, C,) := Af (Ci, Cj) U ® end for end i f end i f end for Q l £ N e i g b o r s { Q k ) m(Qu Qk) Chapter 6. Generalized CBI Toolkit in Java - GCBIJ 71 Chapter 6 Generalized C B I Toolkit in Java - G C B I J 6.1 G C B I J Motivation In a constraint-based inference problem, the two basic constraint handling operations, combination and marginalization, are expressive enough to generalize various inference approaches. The semiring-based unified framework for C B I problems introduced in Chapter 3 is based on this observation. The proposed framework defines the generalized representation of C B I problems in terms of discrete variables and constraints on local variables. It also uses the semiring, an importation concept in abstract algebra, to generalize operations of the constraint combination and marginalization. Any C B I problem, then, can be theoretically represented by our semiring-based framework. Any discrete inference approaches based on combination and marginalization operations can be theoretically interpreted by the framework as well. In previous chapters, we have embedded into this framework many concrete C B I problems, such as probabilistic inference, decision-making under uncertainty, constraint satisfaction problems, propositional satisfiability, decoding problems, and possibility inference. We also show that many concrete exact and approximate inference algorithms can be generalized within the framework. In other words, we show that the semiring-based unified framework is powerful enough to represent both CBI problems and various concrete inference approaches. To prove the representation power of our proposed semiring-based unified framework in practice, we implement a software toolkit, named Generalized Constraint-Based Inference Toolkit in Java (GCBIJ). It is a concrete system for implementing the ideas and concepts of our semiring-based unified framework. Besides demonstrating the representation power of the framework, there are also two reasons for implementing the framework: (1) G C B I J provides a series of generalized inference approaches, either exact or approximate, to solve practical C B I problems. Users can tackle problems in their own domains simply through calling procedures provided by this toolkit. They only need to specify or design their task-specific semirings. (2) Researchers can design, ver^ ify, compare, and analyze different inference approaches for C B I problems on GCBIJ's platform. Discovering these general ideas for performing inference is more straightforward by dealing with abstract versions of inference algorithms. The process of abstracting inference approaches from other disciplines provides Chapter 6. Generalized CBI Toolkit in Java - GCBIJ 72 a good opportunity to improve the algorithm design in one discipline. 6.2 G C B I J System Architecture The main body of G C B I J consists of 7 packages. Figure 6.1 shows the relations among these packages. We will discuss the details of G C B I J implementations in the following sections. 1. Semiring: Various semirings are defined and implemented in this package. Public interfaces are shared among semirings, which provide access to combination and marginalization operations, as well as to the identity and absorbing elements of each operation; 2. C B I T a s k : This package provides the objects (variable, domain, constraint) to specify a C B I problem. Together with an instance of a specified semiring, a C B I problem can be fully characterized; 3. G r a p h : Two sub-packages are included in the package Graph: • Components: This package provides the primal graph, junction tree, junction graph, and related graphical components . The basic graphical elements, such as vertices and edges, are implemented through wrapping OpenJGraph [31], an open source Java Graph and Graph Drawing library under G N U Lesser General Public License; • Algorithms: This package provides the basic algorithms of the graph theory, including triangulation, cliques identification, and spanning tree finding; 4. Inference: This package provides various inference algorithms. These algorithms are categorized into exact inference and approximate inference. A l l of these algorithms solve specified C B I tasks by accessing the operations of the abstract Semiring interface; 5. Parser: To solve concrete C B I problems, different parsers are implemented to translate various types of practical problem representations into the internal format of our framework; 6. Utilities: For research purposes, we need to collect information about inference approaches, either in terms of C P U time or in terms of the number of operations. Some statistic utilities are implemented in this package to assist with our algorithm analyses; 7. G C B I J J V I a i n : This example package includes routines on how to use G C B I J integrally. Example C B I problems and inference algorithms are tested. Statistic utilities are used here to analyze various inference algorithms. There are also some test units included in this package. i ,-Pfcgr Graph.Components 1. Graph 1 -v Edge Vertex - r "PKg: i : Graph.Algorithms I ShortestPath MinFillln r-i "7ST JGraph MaxCard SmallWorldParam Separator HeuristicSearch TriangulationAlg Cluster JTree 1 - _ _pkg; GCBIJ_Ma in pm 1 11 Domain ^Variable 1 * 1 _, i 1 1 Pkg: Parser •valueList i GCBIJ i+query(): Constrainti +findOneAssign() +findAIIAssign() r Semiring j+product(): object +sum(): object i+getSumldentity(): object i+getProdldentity(): object i+isCommutativeO : bool «interface» 4 - Maih •task : CBITask •graph : Graph •algor: InferAlg Constraint Semiring Parser +parse() CBITask •varList •valueTable +getOrdenng() • - BHParser^XmlbifParser DimacsParseri +parseCmdLine()i - ~ Pkg: ~ ~ StatisticUtilities TestUnit 1 «interface» llnversiable lldempotent +divide(in valueList) : object\ +normalize(): object +compares(in valueList): boon Inference InferAlg GenericlnferAlg •params SumProdSeminng MaxProdSeminng OrAndSemirmg +query(): Constraint| +findOneAssign() +findAIIAssign() — E ApproxInferAlgl ExactlnferAlgl i-params 1 VEExactl nferAlg VEApproxInferAlg J T E x a c t l n f e r A Ig JTApproxInferAlg LoopyApproxInferAlg HUGIN_JTApproxlnferAlg HybndApproxInferAlg | 2T LS_JTExactlnferAlgi HUGIN JTExactlnferAlgl ldempotent_JTExactlnferAlg |LS_JTApproxlnferAlg ldempotent_JTApproxlnferAlg ThinJTApproxInferAlg _ Figure 6.1: System Architecture of GCBIJ i _ _ _ _ _ _ J Chapter 6. Generalized CBI Toolkit in Java - GCBIJ Table 6.1: Public Methods of the Class Semiring Method Name Comments Return Semiring(int) Object Object Object product(Object, Object) sum(Object, Object) getSumIdentity() Object getProdldentityQ Object getSumAbsorb() Object getProdAbsorb() boolean isCommutative() 6.3 74 Constructor with specified semiring type ID. Binary combination operation. Binary marginalization operation. Get the identity element of marginalization, if exists. Get the identity element of combination, if exists. Get the absorbtion element of marginalization, if exists. Get the absorbtion element of combination, if exists. Determine if it is a commutative semiring. G C B I J Implementation and Interfaces G C B I J is capable of representing a set of C B I problems with discrete variables, which include probabilistic inference, decision making under uncertainty, constraint satisfaction, prepositional satisfiability, decoding problems, and possibility inference. A l l the constraints are represented in the table form. Several generalized inference algorithms, either exact or approximate, are implemented. By manipulating various semirings, these generalized algorithms can solve concrete C B I problems within the semiring-based unified framework. 6.3.1 Semiring Semiring is the key concept in the proposed framework. The implementation of semirings should capture their common computational properties. The abstract class Semiring is the base class for all concrete semirings, which provides basic operations (combination and marginalization), as well as provides access to other semiring properties. The public methods provided by class Semiring are listed in Table 6.1. Different semirings are derived from it, which implement concrete methods. In addition, there are two public interfaces, IReversible and Ildempotent, which specify the behaviors of semirings with combination idempotency and invertible properties. The methods of these two interfaces are listed in Table 6.2 and Table 6.3, respectively. Concrete semirings with these two properties can implement these interfaces, besides inheriting the base class Semiring. Implemented concrete semirings in G C B I J include: • SumProdSemiring; Chapter 6. Generalized CBI Toolkit in Java - GCBIJ Return Table 6.2: Public Methods of the Interface Ildempotent Method Name Comments int compares(Object, Object) boolean isSumldempotent () boolean isProdldempotentQ Return Table 6.3: Public Methods of the Interface IReversible Method Name Comments Object Collection reverses(Object) normalize(Collection) • MaxSumSemiring; • OrAndSemiring; • MaxMinSemiring; • MinMaxSemiring; • MaxProdSemiring; • MinSumSemiring; 75 Return a partial ordering of the two given objects. Determine if marginalization operation is idempotent. Determine if combination operation is idempotent. Return the reverse of a given object. Normalize a collection of objects. . • MinProdSemir-ing.. Provided with the base class and common interfaces, users can easily design and implement their own task-specific semirings. 6.3.2 Parser To tackle practical C B I problems from different disciplines, G C B I J provides several parser classes to construct C B I problems from input files. A l l parsers are derived from the base abstract class Parser. The public methods provided by the base class are listed in Table 6.4. • Implemented concrete parsers in G C B I J include: • XmlbifParser : This parser corresponds to parse X M L files in the Interchange Format for Bayesian Networks [11]. • BifParser : This parser corresponds to parse B I F files in older versions of the Interchange Format for Bayesian Networks [10]. Together with XmlbifParser and a 3rd-party Bayesian Networks Formats Convertor [28], G C B I J can parse and perform inference on most Bayesian networks in the Bayesian Network Repository [23]. Chapter 6. Generalized CBI Toolkit in Java - GCBIJ Return Table 6.4: Public Methods of the Class Parser Method Name Comments CBITask Parser(Semiring) parse (String) Semiring setSemiring(Semiring) 76 Constructor with specified semiring. Given a filename, parse it into CBITask, the internal C B I problem representation of G C B I J . Specify the semiring used during parsing a file. • CspifParser : This parser corresponds to parse CSPs in CSPIF format that is used in the C S P representation of CISpace [41]. Currently, the parser can parse scheduling problems with custom constraints, n-queens problems, and crossword problems. Given more CSPIF specifications, we can extend this parser to incorporate G C B I J into CIspace in the future. • DimacsParser : This parser corresponds to parse SAT problems in DIM A C S C N F format. All benchmark problems in S A T L I B [26] are in this format. • DimacsColorParser : This parser corresponds to parse coloring problems in C O L format. C O L format is an older version of the D I M A C S C N F format used to characterize clique and coloring problems in graphs. 6.3.3 CBITask This package consists of several elements used to characterize a C B I problem. Class Domain wraps a List in Java as domain values. Class Variable specifies the variable name. It contains an instance of class Domain as the domain of the variable. Class Constraint is the implementation of the constraint, it consists of a list of variables and a (logically) multi-dimensional table to specify the constraint value (in Object in Java) under every value assignment of variables. Internally the table is implemented as a List. A series of accessing methods are provided. The class CBITask is the container of variables and constraints, which corresponds to the basic elements of a concrete C B I problem. It also has an instance of Semiring as its member. The combination and marginalization operations for both exact and approximate approaches are implemented for the CBITask through wrapping corresponding methods of this semiring. 6.3.4 Inference The Inference package is a collection of generalized exact and approximate inference algorithms for C B I inference tasks, which queries a new constraint over a variable, a set of variables, or the whole problem (no particular variable is Chapter 6. Generalized CBI Toolkit in Java - GCBIJ Return 77 Table 6.5: Public Methods of the Class InferAlg Method Name Comments InferAlg () InferAlg(CBITask) CBITask void setCBITask(CBITask) setOrderingType(int) CBITask int Constraint getCBiTask() getOrderingType() query () Constraint query(Variable) Constraint query(Collection) Map Collection queryOneAssignmentQ query All AssignmentsQ Default constructor, no C B I problem specified. Constructor with specifying a C B I problem to solve. Specify a C B I problem to solve. Set the heuristic search type to determine a variables elimination ordering. Return a C B I problem to be solved. Return current heuristic search type. Query the property of the whole C B I problem. Query the constraint on specified variable. Query the constraint on a set of specified variables. Find one assignment of all variables. Find all assignments (collection of Maps) of all variables. interested). Another task for C B I problems is the assignment task with combination idempotent semirings. Inference algorithms need to find one valid value assignment or all valid value assignments to uninterested variables. The basic methods of inference algorithms are defined in the abstract base class InferAlg, as shown in Table 6.5. A l l concrete inference algorithms inherit it and follow the same interface. Generic Inference The class GenericInferAlg is the only concrete class, derived directly from the abstract class InferAlg. It directly implements the Eq. 3.1 of C B I problems. Allocation tasks are solved by looking up the value table of the total combined constraint. The generic inference can be classified as an exact inference. However, we decided to implement it independently. Exact Inference The abstract class ExactlnferAlg is derived from the abstract class InferAlg, without more public methods specified. Based on it, several exact inference algorithms introduced in Chapter 4 are implemented. These algorithms include: • VEExactlnferAlg : The generalized variable elimination algorithm (Algorithm 4.1) with various elimination orderings is implemented in this class. Backtracking is used to answer assignment tasks. Chapter 6. Generalized CBI Toolkit in Java - GCBIJ 78 • JTExactlnferAlg : The generalized junction tree algorithm (Algorithm 4.4) for Shenoy-Shafer architectures is implemented in this class. Backtracking in the final cluster and constraints-passing are used to answer assignment tasks. There is no special requirement for semirings except commutativity. Three classes derived from JTExactlnferAlg aim to cope with semirings with combination invertible and idempotent properties. - LS JTExactlnferAlg : The generalized J T algorithm for LauritzenSpiegelhalter (LS) architectures (Algorithm 4.8) is implemented in this class, which is designed to solve C B I problems with combination invertible semirings. - HUGIN JTExactlnferAlg : The generalized J T algorithm for H U G I N architectures (Algorithm 4.9) is implemented in this class, which is designed to solve C B I problems with combination invertible semirings. - IdempotentJTExactlnferAlg : The generalized J T algorithm for C B I problems with combination idempotent properties (Algorithm 4.6) is implemented in this class. Approximate Inference The abstract class ApproxInferAlg is derived from the abstract class InferAlg, with additional public methods to specify approximation parameters. These methods are listed in Table 6.6. Based on the abstract class ApproxInferAlg, several approximate inference algorithms introduced in Chapter 5 are implemented. These algorithms include: • VEApproxInferAlg : The generalized approximate variable elimination algorithm (Algorithm 5.2) with various elimination orderings is implemented in this class. • Loopy ApproxInferAlg : The generalized loopy message propagation algorithm (Algorithm 5.7) is implemented in this class. • JT ApproxInferAlg : The generalized approximate junction tree algorithm (Algorithm 5.3) for Shenoy-Shafer architectures is implemented in this class. There is no special requirement for semirings except commutativity. Two classes derived from JT ApproxInferAlg aim to cope with semirings with the combination invertible property. Both of them are implemented based on the parent abstract class JT ApproxInferAlg. The implementation of the inverses approximation follows Eq. 5.2. - LSJTApproxInferAlg : The generalized approximate J T algorithm for Lauritzen-Spiegelhalter (LS) architectures. - HUGIN JT ApproxInferAlg : The generalized approximate J T algorithm for H U G I N architectures. Chapter 6. Generalized CBI Toolkit in Java - GCBIJ Return 79 Table 6.6: Public Methods of the Class ApproxInferAlg Method Name Comments void setMaxWidth(int) void setMaxBucket (int) void setLoopNum(int) int getMaxWidth() int getMaxBucket() int getLoopNumQ Specify the upper bound of the size of combined constraint (the maximum cluster size in a junction tree/graph). Specify the number of buckets to divide constraints. Specify the maximum loops in loopy message propagation. Get the upper bound of the size of combined constraint (the maximum cluster size in a junction tree/graph). Get the number of buckets to divide constraints. Get the maximum loops in loopy message propagation. The other two approximation inference algorithms derived from JTApproxInferAlg are: — ThinJIApproxInferAlg : The generalized thin junction tree algorithm (Algorithm 5.5). — HybridJT ApproxInferAlg rithm (Algorithm 5.9). 6.3.5 : The generalized hybrid inference algo- Graph This package includes two sub-packages: C o m p o n e n t and A l g o r i t h m . Graph.Component The basic graph element classes Vertex and Edge are implemented through wrapping classes Vertexlmp and Edgelmpl in OpenJGraph [31]. The primal graph class Graph wraps OpenJGraph's class Graphlmpl. More specifically, the class Vertex has a Variable instance as its member. Class Cluster is derived from Vertex with a collection of vertices as its member. Additional public methods (cf. Table 6.7) for Cluster are implemented. Class Separator is derived from Edge with a cluster as its member. The separator set and messages are stored in that cluster. Accessing methods for the class Separator are implemented as well. Class JGraph is derived from Graph, which consists of clusters and separators. Class JTree is derived from JGraph with the tree property enforcement. Chapter 6. Generalized CBI Toolkit in Java - GCBIJ 80 Graph. Algorithm A collection of graph algorithms are implemented in this package. Class Triangulation A Ig is the implementation of Algorithm 2.1 by feeding in the adjacent matrix of a given primal graph. Also, it provides a method for getting a variable ordering if specifying a heuristic search algorithm. The abstract class Heuristics Search provides a common interface for various heuristic search algorithms in Appendix A. These concrete heuristic search algorithms include: • RandomS'earch; • MinWidthS earch; • MinFillinS earch; • MinlnducedFillinS earch; • MaxCardinalitySearc; • Lex MinimalS earch; • LexP erfectS earch; Another class SmallWorldParam specifies the methods for computing the graphical parameters of small world models. Class SmallWorldUtilities acts as the helper to parameter computing procedures. 6.3.6 Utilities Several utility classes are implemented in this package, which include StatisticsUtility, SetUtility and CSPGenerator. The class Statistics Utility implements a collection of static methods for statistical purposes, which includes a start/stop C P U timer, counting numbers of operations, and basic data post-processing methods. The class SetUtility implements set operations such as interaction, difference, and union. Basically, it wraps methods of Collection in Java. The class CSPGenerator is an abstract class that specifies the interface of generating binary CSPs in CSPIF format. The CSPIF format can be parsed either by G C B I J or by CIspace. In G C B I J we implement three concrete binary CSP generators, inherited from CSPGenerator: • The class RandomCSPGenerator generates random binary CSPs; • The class SmallWorldCSPGenerator generates binary CSPs with Small World topologies, which implements Algorithm 2.2; • The class ScaleFreeCSPGenerator generates binary CSPs with Scale-Free topologies, which implements Algorithm 2.3. Chapter 6. Generalized CBI Toolkit in Java - GCBIJ 81 To generate binary CSPs, we need to specify the number of variables (vertices), the number of constraints (edges), the size of the variable domain, the forbidden rate of constraints, and the number of generated instances. Furthermore, we need to specify the re-written probability of SmallWorldCSP'Generator. Generated binary C S P instances with various topologies are used in our experiments (cf. Chapter 7). 6.3.7 G C B I J _Main The package GCBIJJVlain contains a main class GCBIJ as the entry point and an example for using G C B I J . Also, several test unit classes are included in this package. For test purposes, a command line mode is implemented. 6.4 Summary The proposed semiring-based unified framework for C B I problems in Chapter 3 is implemented as the Generalized Constraint-Based Inference Toolkit in Java (GCBIJ). The toolkit provides a way to represent various concrete C B I problems and a series of generalized inference algorithms, from exact to approximate inference. By specifying various semirings, these generalized inference algorithms can be instantiated as concrete algorithms that solve C B I problems from different disciplines. G C B I J also provides a collection of parsers to translate concrete problems from various fields with domain-specified formats. The architecture of G C B I J is flexible, making it easy to extend. Users can implement their own task-specific semirings to fulfill their purposes. A l l the implemented inference algorithms use the abstract class Semiring to access basic operations, without relating to the properties of concrete semirings. On the contrary, to design a new inference algorithm, users do not need knowledge of specific semirings, which ensures the generality of the algorithm. Given the common parser interface, users can also design a new parser to translate their domain problems into our internal C B I problem representation. The graph package is flexible as well. Users can later add more graphical features and algorithms. G C B I J is a proof of the representation power of the proposed semiringbased unified framework. It is a concrete system for implementing the ideas and concepts of the framework. The flexibility of its open architecture makes it a suitable toolkit for both research and application purposes. Chapter 6. Generalized CBI Toolkit in Java - GCBIJ Return int Collection void void void void boolean boolean void void void void void void 82 Table 6.7: Public Methods of the Class Cluster Method Name Comments Return the number of variables getSize() in this cluster. getVertexSet() Return vertices in this cluster. addVertex( Vertex) Add a vertex to this cluster. addVertices(Collection) Add a collection of vertices to this cluster. remove Vertex( Vertex) Remove a vertex from this cluster. remove Vertices(Collection) Remove a collection of vertices (if any) from this cluster. containVertex( Vertex) Determine if the vertex is in this cluster. contain Vertices (Collection) Determine if a collection of vertices is a subset of this cluster. addToLocalPotential(Collection) Add a collection of constraints to the local constraint set of this cluster. Clear the local constraint set of clear LocalPotential(Collection) this cluster. addToIncomingMsg(Collection) Add a collection of constraints to the incoming message set of this cluster. Clear the incoming message set clear IncomingMsg (Collection) of this cluster. addToOutgoingMsg(Collection) Add a collection of constraints to the outgoing message set of this cluster. clearOutgoingMsg(Collection) Clear the outgoing message set of this cluster. Chapter 7. GCBIJ Applications: Case Studies 83 Chapter 7 G C B I J Applications: Studies Case In this chapter, we use the Generalized Constraint-Based Inference toolkit in Java (GCBIJ), an implementation of the proposed semiring-based unified framework and generalized inference algorithms, to do a series of experiments for C B I problems from different disciplines. Based on these experiments, we show that (1) the proposed semiring-based unified framework and generalized inference algorithms are suitable for representing and solving various concrete C B I problems; (2) G C B I J is an good platform for studying C B I problems; (3) G C B I J has the potential to tackle practical applications. 7.1 Test Suites In this chapter, we study a series of concrete C B I problems based on G C B I J . These problems belong to probability inference, CSP, and SAT respectively. See Table 7.1 for basic information on these problems. 7.2 Studies of Heuristic Triangulation Algorithms Analyses in previous chapters show that both the time and space complexities of exact inference algorithms are exponential in the induced width of a given elimination ordering, or equivalently, exponential in the width of a tree decomposition of the original C B I problem. Ideally, the complexity will reach minimum if we can find the treewidth of the primal graph of the given C B I problem. However, computing treewidth is known as A/"P-hard • [2]. To get a near-optimal tree decomposition, we sometimes resort to heuristic triangulation algorithms. Basically, heuristic triangulation algorithms produce an elimination ordering of variables, which leads to an upper bound of the induced width or the treewidth. In this chapter, we study several heuristic triangulation algorithms, which include Minimum Induced Width (MIW), Minimum Induced Fill-in (MIF), Maximum Cardinality (MC), Perfect Lexicographic B F S ( L E X P ) , and Minimum Lexicographic B F S ( L E X M ) . Details of these algorithms are listed in Appendix 84 Chapter 7. GCBIJ Applications: Case Studies Fields Bayesian Networks CSP Coloring Table 7,1: The Basic Information of Test Suites Adopted From No. Problem Name |V| 1 Diagnosis 37 Bayesian Net- 2 Insurance 27 work Repository [23] 3 Pigs 441 4 Link 724 fpsol2.i 5 496 6 mulsol.i 197 DIMACS Dataset [20] zeroin.i 211 7 SAT S A T L I B [26] CSP CSPGenerator of GCBIJ \£\ 65 70 806 1738 11654 3925 4100 . 8 9 10 11 12 uf20 flatlOO BlocksWorld Random Binary C S P Scale-Free Binary C S P 20 300 116 35 100 147 1017 777 70 400 13 Small World C S P 100 400 A. To compare the results of different algorithms, we studied the triangulation algorithm with a random ordering. Table 7.2 shows the upper bounds of treewidth for various C B I problems. Here w i„ is the best reported result so far, adopted from [24]. The number in the parentheses is the maximum separator size of the generated junction tree. For heuristic search algorithms ( L E X P , L E X M , M C ) with a random initial vertex, the upper bound of treewidth is averaged from 0.1 x |V| runs, each with a random initial vertex. For straightforward comparisons, we normalize the upper bounds of treewidth from various heuristics by the best reported results. The normalized upper bounds are shown in Figure 7.1. According to it, the Minimum Induced Fillin heuristic results in the best approximation of treewidth among all of these heuristics. The second best is the Minimum Induced Width heuristic, which outperforms to the Maximum Cardinality heuristic in most problems. We also notice that heuristics are sensitive in different problems. Problems No. 3 (Pigs), No. 4 (Link), and No. 7 (zeroin.i) are hard to approximate by many heuristic techniques. Following this experiment, we use the Minimum Induced Fill-in as the default heuristic triangulation algorithm in G C B I J , though users can specify other heuristics by changing the default parameters. m Further information and discussion about heuristic triangulation algorithms and their empirical evaluations can be found in [33]. Table 7.2: Upper Bounds of Treewidth for Graphs, Different Heuristic Searches MIF LEXM MC LEXP MIW RAND Problem 5(3) 5(3) 5(3) 5(3) 6(4) 13(8) 5 Diagnosis 10(8) 9(6) 8(7) 10(8) 8(6) 14(13) 8 Insurance 31(30) 21(20) 11(9) 17(14) 26(26) 137(131) 10 Pigs 71(70) 121(120) 16(14) 36(29) 68(66) 254(250) 13 Link 72(67) 79(73) 72(68) 67(66) 247(202) 71(66) 66 fpsol2.i 71(64) 77(73) 62(45) 51(50) 130(121) 51(50) 50 mulsol.i 102(99) 63(48) 51(49) 97(96) 51(50) 110(106) zeroin.i 50 18(16) 17(16) 17(14) 17(16) 17(16) 18(17) 17 uf20 126(124) 114(111) 101(89) 107(105) 120(119) 175(173) 100 flatlOO 53(50) 49(48) 52(44) 74(72) 50(46) 80(77) 49 BlocksWorld 9(8) 9(8) 9(8) 9(8) 9(8) 9(8) Rnd. Binary C S P 9 86 Chapter 7. GCBIJ Applications: Case Studies I I I ! 1 1 -m• «• • MIF MIW ~ MC ' i i ;i ' t ' i < t i i \...[./, A j •...»...",...; ( / / 4;' ' ».,..•: \ \\ .« , . ••. ! >.'..; »i \l „< \ . 2 1 1 1 k ! 1 5 6' 7 N o . of P r o b l e m s Figure 7.1: Upper Bounds of Treewidth, Based on Various Heuristics 7.3 E m p i r i c a l S t u d i e s of E x a c t I n f e r e n c e Algorithms 7.3.1 Exact Inference Algorithms w i t h Invertible Semiring In this experiment, we solve the p r o b a b i l i t y inference p r o b l e m in G C B I J , by specifying the regular sum-product semiring (Semiring No. 6 in Table 2.2). It is obvious that the sum-product semiring has the combination (product) invertible property. So, besides generalized V E and J T algorithms, both J T algorithms in LS and H U G I N architectures can be used to tackle inference tasks. The probability inference problem to be tested in this experiment is the Diagnosis network (cf. Figure 7.2, which is taken from CIspace [41]). The Diagnosis problem has 37 variables. The primal graph, the moralized graph of the original Bayesian network, has 65 edges. To test the performances of various generalized exact inference algorithms ( G V E , G J T , G J T - L S and G J T - H U G I N ) in G C B I J , we query the marginal probability for every variable (37 queries in total). Validated by CIspace, all generalized exact inference algorithms return correct results for all queries. The running time and the number of binary operations used are shown in Table 7.3. Results here empirically support the theoretical analyses conveyed by Table 4.2 and Table 4.3 in Chapter 4. We can conclude that: (1) J T algorithms are Figure 7.2: Bayesian Network of the Diagnosis Problem Chapter 7. GCBIJ Applications: Case Studies 88 Table 7.3: Running Times of Exact Inference Algorithms with the Combination Invertible Property for the Diagnosis Problem. (37 Independent Queries) GVE G V E ( perquery) G J T G J T - L S G J T - H U G I N Run. Time (ms) ^Operation x #Operation + ^Operation / 60068 77946 43910 0 1623 2107 1187 0 7922 6768 3694 0 4307 2881 3694 966 4217 2881 3694 250 ideal for multiple queries and V E algorithms are ideal for single queries; (2) for C B I problems defined on invertible semirings, both LS and H U G I N architectures are superior to the generalized J T algorithm in the Shenoy-Shafter architecture; and (3) the H U G I N architecture is slightly better than the LS architecture in using fewer / operations. 7.3.2 Exact Inference Algorithms w i t h Idempotent Semiring In this experiment, we solve the inference task (decision problem) of a random b i n a r y C S P based on G C B I J , by specifying the logical or-and semiring (Semiring No. 1 in Table 2.2). It is obvious that the or-and semiring has the combination (A) idempotent property. So besides generalized V E and J T algorithms, the idempotent J T algorithm (Algorithm 4.6) can also solve this inference task. The test problem in this experiment is a random generated binary C S P with 35 variables and 70 constraints. The domain size of each variable is 3. The forbidden rate for each constraint is set at 0.5, in other words, each value assignment of a constraint has a probability 0.5 of not being allowable. In G C B I J we compose a binary CSP generator to produce binary CSPs in CIspace's C S P I F format. To test the performances of various generalized exact inference algorithms ( G V E , G J T , GJT-Idemp) in G C B I J , we query the marginal constraint on every variable (35 queries in total). The marginal constraint of each variable indicates valid values for this variable. Validated by CIspace, all generalized exact inference algorithms return correct results for all queries. The running time and the number of binary operations used are shown in Table 7.4. These results empirically support the theoretical analysis conveyed by Table 4.2 and Table 4.3 in Chapter 4. We can conclude that: (1) J T algorithms are ideal for multiple queries. For CSPs, the query for a variable returns valid values for that variable; (2) for a C B I problem defined on idempotent semirings, the idempotent J T algorithm is superior to the generalized J T algorithm. 89 Chapter 7. GCBIJ Applications: Case Studies Table 7.4: Running Times of Exact Inference Algorithms with the Combination Idempotent Property for a Random Binary CSP. (35 Independent Queries) GVE G V E (per query) GJT GJT-Idemp Running Time (ms) #Operation A ^Operation V 7.3.3 182945 2428510 1407875 5227 69386 40225 117749 1204947 243093 41360 416556 306022 Exact Inference Algorithms for General CBI Problems The last problem to test for exact inference algorithms in G C B I J is the M a x C S P for previous random binary C S P (35 variables and 70 constraints). According to the solution of the inference task, the test problem is satisfiable. So the inference result of the corresponding Max C S P should be 70. Both generalized V E and J T algorithms return this correct result. See Table 7.5 for details. We also conclude the maximum number of satisfiable constraints for each variable, given different domain values (cf. Table 7.6 ). From Table 7.5 we can clearly see that the V E algorithm is ideal for single queries. Compared to corresponding columns in Table 7.4, we notice that different semirings cause different running times even when we use the same generalized exact inference algorithm. 7.4 7.4.1 E m p i r i c a l S t u d i e s of A p p r o x i m a t e Inference A l g o r i t h m s Approximate V E and J T for CSPs In this experiment, we apply both the generalized approximate variable elimination (GVE-Approx, Algorithm 5.2) and the approximate junction tree (GJTApprox, Algorithm 5.3) to solve a random binary C S P with 35 variables and 70 constraints with G C B I J . The exact inference results of the problem was given in previous sections. We query the valid values for each variable. Each valid value corresponds to its Table 7.5: Running Times of General Exact Inference Algorithms for a Max CSP (Single Query) GVE GJT Running Time (ms) ^Operation + ^Operation max 6670 70761 37801 274975 1204947 237737 Chapter 7. GCBIJ Applications: Case Studies 90 Table 7.6: The Maximum Number of Satisfiable Constraints Given Different Domain Values for a Random Binary C S P with 35 Variables and 70 Constraints Variable DI D2 D3 64 VI 70 63 V2 70 66 67 V3 66 70 66 V4 70 63 69 64 V5 70 70 V6 70 67 70 67 V7 70 70 V8 70 68 70 64 V9 70 70 V10 70 63 70 Vll 70 63 69 V12 70 67 69 V13 70 68 70 V14 70 67 70 64 V15 70 69 67 V16 70 69 67 V17 70 68 V18 70 70 70 V19 70 70 70 V20 70 69 69 V21 70 68 67 V22 70 68 68 V23 70 70 70 V24 70 68 69 70 70 V25 70 V26 70 68 68 V27 70 68 '68 70 70 V28 68 V29 70 69 69 V30 70 69 69 ' V31 70 68 69 V32 70 70 70 68 69 V33 70 V34 70 67 70 V35 70 69 70 Chapter 7. GCBIJ Applications: Case Studies 91 appearance in a valid assignment. If there is no valid value for a variable, the problem is not satisfiable. Both GVE-Approx and GJT-Approx return the approximate marginal of each variable. Compared to exact results, all the errors made by approximate algorithms are false-positive. In other words, only introduced errors falsely report some invalid values as valid. In general, both GVE-Approx and G J T Approx trade time and accuracy to save the space of the corresponding exact inference algorithms. Two parameters control the performance of approximate inference algorithms. The maximum width w is the upper bound of the acceptable width during computing, which controls the accuracy of the approximation. A larger acceptable width, at the same time, will take more space. If w is the same as or larger than the treewidth or induced width of the given CBI problem, the approximate V E and J T algorithms return exact results. Another parameter b, the number of buckets to partition constraints, controls the speed of the approximation. A larger b will reduce the running time. However, more errors will be introduced at the same time. We can change the settings of w and d to achieve balance between accuracy and speed. Figure 7.3 shows the accuracy and running time of the approximate V E algorithm for a random binary CSP under different parameter settings. Analogous results for the approximate J T algorithm are shown in Figure 7.4. A Bounded Width: w B Bounded Width: w Figure 7.3: Accuracy and Running Time of the Approx. V E for a C S P Chapter 7. GCBIJ Applications: Case Studies A B B o u n d e d Width: w B o u n d e d Width: w 92 Figure 7.4: Accuracy and Running Time of the Approx. J T for a C S P 7.4.2 A p p r o x i m a t e V E and J T for P r o b a b i l i t y Inference The approximate V E and J T algorithms in G C B I J can be applied to solve probability inference problems without any modifications. The only thing needed is to specify a regular sum-product semiring to replace the logical or-and semiring for CSPs. In this experiment, we use approximate V E and J T algorithms to solve the Insurance problem from B N Repository [23]. The Insurance problem contains 27 variables and 27 conditional probability tables (CPTs). Figure 7.5 shows the Bayesian network of the Insurance problem. After moralization, we have 27 vertices and 70 edges in GCBIJ's internal primal graph representation. In each run of the experiment, we randomly choose 4 variables as observed. Observed values are chose randomly as well. Then we compute the marginal of the rest variables. For both GVE-Approx and GJT-Approx, we fix the number of buckets at b = 2. The maximum width w changes from 4 to 7 (treewidth of the Insurance problem is 8). The correspondence between the exact and approximate marginal for the Insurance problem are shown in Figure 7.6 and Figure 7.7, corresponding to approximate V E and J T algorithms respectively. We find that the accuracy of the approximate V E and J T algorithms are increase with the maximum width w, the same as the results for solving CSPs. Chapter 7. GCBIJ Applications: Case Studies 93 Chapter 7. GCBIJ Applications: Case Studies 94 Figure 7.6: Marginal Correspondence of the Approx. V E for a Probability Inference Problem. 95 Chapter 7. GCBIJ Applications: Case Studies w = 4;b = 2 0 0.2 0.4 0.6 Exact Marginal w = 5;b = 2 0.8 1 0 0.2 0.4 0.6 Exact Marginal 0.8 1 Figure 7.7: Marginal Correspondence of the Approx. J T for a Probability Inference Problem. 7.5 Loopy Message Propagation As Exact Inference Algorithm As discussed in Chapter 5, loopy message propagation (Algorithm 5.7) is in general an approximate inference algorithm for C B I problems. However, when a C B I problem is defined on semirings with the combination idempotency property, loopy message propagation returns exact results within at most d iterations, where d is the diameter of the corresponding dual graph. In this section, we apply loopy message propagation algorithm to solve a random binary CSP. The random binary CSP is the same as the one used in Section 7.3.2, which has 35 variables with 3 domain values each and 70 constraints with the forbidden rate at 0.5. In the experiment, we examine the different upper bounds of the maximum cluster size in the join graph. We gradually increase the maximum allowable iterations. Within the allowable iterations, the program exits when all messages are unchanged (stable), returning the actual number of iterations. Chapter 7. GCBIJ Applications: Case Studies 96 Figure 7.8: Accuracy (A) and Time Complexity (B)of loopy Message Propagation for a C S P Figure 7.8(A) and Figure 7.8(B) report the accuracy and time complexity (in terms of binary operations) respectively. Just like using the approximate V E and J T algorithms for CSPs, loopy message propagation returns only falsepositives (in other words, reporting some invalid values as valid). Errors are reduced when the iterations of the message propagation increase. Eventually all messages are stable and loopy message propagation returns exact inference results. Roughly speaking, a junction graph with a larger cluster size consumes more binary operations (combination and marginalization) for random CSPs. 7.6 7.6.1 Impact of Small World Topology Small W o r l d Topology of C B I problems The Small World topology [59] appears widely in social graphs (e.g., the collaboration graph of actors in feature files), biological graphs (e.g., the neural network of the nematode worm C.elegans), and man-made graphs (e.g., the electrical power grid of the western United States). In a graph with the Small World topology, vertices are highly clustered, yet the path length between them is small. In this thesis, we are interested in whether or not such a topology exists in primal graph representations of real world C B I problems and how this characteristic impacts the performance of inference algorithms. 97 Chapter 7. GCBIJ Applications: Case Studies \ 10 5 6 15 35 40 45 0.5 0.1 0.2 0.3 ' •I I JS— 0 0 50 - Ill D 25 30 Path Length; 1 0.8 • 0 g 20 10 — 20 0.4 0.5 0.6 Clustering Coefficient: c 1 1 0.7 ' , 30 0.8 0.9 j r- , , , 40 50 60 1 70 Degree: d Figure 7.9: Small World Parameter Distributions for the Pigs Network The Small World characteristic parameters for various C B I problems, including characteristic path length L, clustering coefficient C, and normalized scalar H (cf. Chapter 2), are computed and shown in Table 7.7, and compared to some practical networks [59]. For comparison, we also generate three scale-free networks based on Algorithm 2.3. All of them exhibit characteristic of small world topology. That observation shows that the scale-free network is probably a subset of small world topology, though no formal proof is available so far. In this table, we are particularly interested in the Pigs and the Link networks, which are taken from a real-world application dataset. Figure 7.9 and Figure 7.10 show the characteristic parameter distributions of these two problems respectively. In both of the Pigs and the Link networks, characteristic path lengths are relatively small, which implies that the message from one cluster will quickly reach another cluster in the junction graph. Although degree distributions show that most vertices in the graph have small degrees, some large degrees, in addition to a large amount of medium clustering coefficients, imply that exact inference algorithms are infeasible. Combining these observations, loopy message propagation is probably an efficient approach to coping with these two C B I problems. Fields Practical Table 7.7: Parameters of the Small World Topology for Graphs C L Problem Grand |V| Lrand \£\ 0.00027 2.99 0.79 3.65 Film Actors 225,226 6,869393 4942 18.7 12.4 0.08 0.005 Power Grid 6819 1974 2.25 0.28 0.05 282 ' 2.65 C.elegans M 2396 10.61 4.755 Bayesian Networks Diagnosis Insurance Pigs Link 37 27 441 724 65 70 806 1738 3.82 2.23 4.97 6.37 2.92 2.12 4.77 4.35 0.79 0.70 0.79 0.68 0.56 0.48 0.50 0.40 1.08 1.37 1.56 1.17 Coloring fpsol2.i mulsol.i zeroin.i 496 197 211 11654 3925 4100 1.69 1.81 1.66 1.92 1.80 1.82 0.45 0.59 0.48 0.13 0.24 0.23 3.83 2.42 2.33 Unf20 FlatlOO BlocksWorld 20 300 116 147 1017 777 1.22 3.72 2.56 1.22 3.18 2.07 0.80 0.33 0.52 0.80 0.31 0.25 1.00 0.92 1.72 sflOO Link-Like C.elegans-Like 100 724 282 400 1738 1974 1.92 2.56 1.95 2.42 4.37 2.42 0.88 0.388 0.926 0.08 0.049 0.049 13.87 13.66 23.58 SAT Scale-Free Chapter 7. GCBIJ Applications: Case Studies 99 Figure 7.10: Small World Parameter Distributions for the Link Network 7.6.2 Inference in the Small W o r l d To analyze the impacts of the small world topology for C B I problems, we generates a series of binary CSPs with different re-written probabilities, according to [59]. In general, a small re-written probability corresponds to ordered graphs, whereas a large re-written probability corresponds to random graphs. Visually small world graphs are those between ordered and random graphs. Figure 7.11 is adopted from [59]. It illustrates the relations between small world parameters and the re-written probabilities of graphs with 100 vertices and 400 edges. In this section, we use loopy message propagation to perform inference tasks for binary CSPs with the Small World topology. These CSPs are generated based on Algorithm 2.2. Each of them has 100 binary variables and 400 binary constraints, with a 0.5 forbidden rate. The re-written probabilities range from 0 to 1. We collect the number of binary operations under each re-written probability (with 5 instances). The results are generalized in Figure 7.12. Compared to Figure 7.11, we find that loopy message propagation works well in CSPs with the small world topology. A relatively small characteristic path length implies quick convergence. A relatively large clustering suggests that we can resort to a large maximum cluster size in loopy message propagation if the computation power permits. The results also suggest that loopy message propagation is a suitable inference approach for CSPs with ordered graphical representations, but not suitable for random CSPs. 100 Chapter 7. GCBIJ Applications: Case Studies • C(p)/C(0) 0.8 0.6 h 0.4 , • 0 0.0001 0.001 0.01 0.1 p Figure 7.11: Characteristic Path Length L(p) and Clustering Coefficient C(p) for Different Re-written Probabilities (Figure is taken from [59])' O x • Bounded Widlh: w = 3 Bounded Width:w = 4 B o u n d e d Width: w = 5 10"* 10" J 10" Rewriting Probability: p J 10"' 10° Figure 7.12: Time Complexity of loopy M P in the Small World Chapter 7. GCBIJ Applications: Case Studies 7.7 101 Summary Based on the Generalized Constraint-Based Inference toolkit in Java (GCBIJ), this chapter presents a series of experiments of C B I problems from different disciplines. The results of these experiments are not totally new to research communities. However, these results as a whole verify the feasibility of using semirings to generalize representations of various C B I problems. Generalized exact and approximate inference algorithms, based on the proposed semiring-based unified framework, are proven to be suitable to apply in concrete application domains. Case studies also show that G C B I J is a flexible toolkit of constraint-based inference research. Various exact and approximate inference algorithms can solve different C B I problems when different semirings within G C B I J are specified. New algorithms can be added to the toolkit by either generalizing existing concrete approaches or designing new approaches on the abstract level. Although more optimization and implementation work is required, the extendibility and flexibility of G C B I J make it a suitable toolkit for both C B I research and applications. Chapter 8. Conclusions and Future Work 102 Chapter 8 Conclusions and Future Work 8.1 Conclusions As the first contribution of this paper, we propose a semiring-based unified framework, a single formal representation framework with a broader coverage of the constraint-based inference problem space based on the synthesis of existing generalized frameworks. Our framework explicitly subsumes and unifies many concrete C B I problems, such as probabilistic inference, decision making under uncertainty, constraint satisfaction problems, propositional satisfiability, decoding problems, and possibility inference, in an abstract representation. The unified framework is also a single formal algorithmic framework that provides a broader coverage of both exact and approximate inference algorithms. This is the second contribution of this paper. We unify various widely used inference algorithms, such as the exact and approximate variable elimination algorithms, the exact and approximate junction tree algorithms, and loopy message propagation, based on the framework. Many of these algorithms depend only on the basic properties of the commutative semirings. Based on other special properties of different commutative semirings, we also generalize the variants of these algorithms that arise in different application scenarios. Abstract representations of C B I problems, as well as abstract inference algorithms, provide several opportunities for researchers from various fields: (1) they can study the most important common characteristics of various C B I problems without representation barriers; (2) they can analyze and compare different inference approaches; (3) they can borrow design ideas from other fields and improve the inference approaches' efficiency in their own domains; and (4) implementations at the abstract level significantly reduce the amount of work targetted previously at the individual problems. In other words, researchers from different fields may reinterpret many familiar approaches in their domains at a higher level. The algorithm discussions and the complexity analyses in this thesis are examples of applying the abstract knowledge to the concrete application domains. The unified representation for C B I problems and inference algorithms is not, of course, a totally novel idea. Much research has been conducted in various disciplines through using different tools or notions. Here we have significantly broadened the scope of the problems and the coverage of the algorithms. The Chapter 8. Conclusions and Future Work 103 final contribution of this paper is a software toolkit, the Generalized ConstraintBase Inference Toolkit in Java (GCBIJ). G C B I J is the first concrete toolkit to implement ideas for unifying the representations of C B I problems using semirings and to implement various inference algorithms on the abstract level. The generalization and extensibility of G C B I J make it a good platform for both C B I research and practical problem solving. 8.2 Possible Improvements to Inference Approaches Here we briefly discuss some possible improvements to generalized inference algorithms for C B I problems. These preliminary discussions are not complete but suggest some interesting research topics for the future. 8.2.1 F r o m Variables to Values The inference approaches generalized in Chapter 4 and Chapter 5 do not depend on possible values or the domains of variables in C B I representations. If values are taken into account, we can improve these approaches by manipulating values. As a restriction of our proposed semiring-based unified framework, the domain of a variable is discrete. Considering Values in Heuristic Search Heuristic searches are used in both V E and J T algorithms to find a near-optimal elimination ordering or tree decomposition before performing the inference. The most straightforward improvement to heuristic-search-based approaches is to consider the actual domain size of the variables instead of treating them uniformly. For example, we use heuristic searches to find the upper bounds of the treewidth or the induced width. For either Minimum Induced Fill-in or Minimum Induced Width heuristics, we break ties by randomly choosing a vertex. Alternatively we can use the domain size as the weight of a vertex (variable) to break ties with largest weight first. In approximate inference algorithms, the key idea of the approximation is to restrict the size of the maximum sub-problem to a tractable level. In our framework, the sub-problem size is measured by the cluster size or the number of variables participating in the sub-problem. More specifically, the sub-problem size could be measured as the product of the domain sizes of all participating variables. Using such a measurement as a heuristic function or an upper bound of approximation procedures will bring computational benefits such as giving a more accurate approximation of the treewidth or introducing less approximation errors. At the same time, using values in heuristics will increase computational costs. Chapter 8. Conclusions and Future Work 104 Instantiating and D o m a i n Splitting In general, when a vertex of a graph is removed, the graph's structure will be simplified. The treewidth of a simplified graph will possibly be reduced. As shown in previous chapters, the time and space complexities of exact inference algorithms in this thesis are proven to be exponential in the width of a tree decomposition of the corresponding primal graph representation. One way to remove a vertex is to assign a value to its corresponding variable. These naive observations suggest that instantiating variables will bring computational benefits if carefully designed. For the completeness of inference, we need to store instantiations of a variable for further use. Once the inference of a conditioning value is finished, we may need to backtrack to other instantiated values. Basically, it is a backtracking process. The D P L L algorithm [12] of SAT is a typical example of implementing such an idea. Given a normal inference approach, when to start instantiating, which variable to be instantiated, and what value to assign are the key problems of instantiation. A related improvement is to split the domain of a variable. Instead of assigning one value to a variable, we could assign a subset of the original domain values to the variable. For context-specific constraint representations, a variable with splitting domains will simplify the graph since some edges are not presented in the current context. Domain Shrinking The last possible improvement to inference approaches we would like to mention is domain shrinking. This idea comes from the Arc Consistency of CSPs. Given a constraint, the value of a participating variable can be discarded once all the assignments induced by this value are false. After checking out all-false values for a variable, we shrink its domain size. The reduced domain size leads to computational benefits for regular inference approaches. The same idea can be generalized to abstract C B I problems. In CSPs, false can be used to detect shrinking values since false is the absorbing element of the combination operation (logical A N D ) in the O R - A N D semiring. The absorbing element of the combination operation means that it is a trap for further combinations. At the same time, false is also the identity element of the marginalization operation (logical OR), which means that no more new information is contained. Based on the previous observations, we can apply domain shrinking to probability inference problems. Since zero is both the absorbing element of the combination (product) and the identity element of the marginalization (plus), we can remove a value from a variable domain if all the assignments in which this value participates are mapped to zero. If approximation is permitted, near zero values with a user-specified threshold can be used to detect the values to remove. Chapter 8. Conclusions and Future Work 105 Table 8.1: Detecting Elements of Some Commutative Semirings Detecting Element No. R © ® 1 {true, false} V A false 2 min [0,1] or [0, oo) max 0 X 3 max 0 [0,1] 4 max N/A (-oo,0] + max N/A 5 [O.oo) + X 6 0 [0,oo) + X 7 [0,oo) max 0 8 (—oo, oo) min oo + X 9 min ( - o o , 0] 0 In general, given a semiring R = (S, ©, ®), a e R can be used to detect values to remove from the. domain if the following two statements hold at the same time: (1) a is the absorbing element of ®; (2) a is the identity element of ©. We call a the detecting element of the semiring R, Table 8.1 concludes the detecting elements of the some semirings. 8.2.2 From Systematic to Stochastic Basically, all the inference algorithms generalized in our framework are systematic approaches to C B I problems. In concrete application domains, stochastic approaches, such as sampling techniques in probability inference and stochastic local searches in SAT, are very successful. Given the fact that underlying representations of these problems are highly analogous, an abstract representation of stochastic or hybrid (stochastic and systematic) inference approaches should be a reasonable goal for future research. 8.3 Future Work In Section 8.2, we briefly discussed some possible improvements to existing inference approaches. Some of these improvements have already achieved success in many fields, which suggests generalizing the characteristics of these approaches, adding them into our semiring-based unified framework, and improving the design of other concrete approaches. One limitation of our framework is that the domain of variables must be discrete. However, both the semiring itself and the combination and marginalization operations do not impose any restriction on variable types. It would definitely exciting to extend our framework to tackle continuous variables in the future. In this thesis, we propose some naive inference algorithms for C B I problems, which can be abstracted with a fc-semiring. For k = 2, we show that the decision making under uncertainty is one example. For k > 2, the application of k- Chapter 8. Conclusions and Future Work 106 semiring is far from clear. If we have some input from experts in other fields to support the usefulness of fc-semirings, we would like to investigate more efficient inference algorithms for fc-semirings. As an open and extendable toolkit, the implementation and optimization of G C B I J is far from complete. To maintain generality, we sacrificed some running efficiency in our current implementation. To attract people to using this toolkit, more comprehensive documents and clear interfaces are needed. In general, much coding and documenting work for G C B I J remains. Bibliography 107 Bibliography S. M . Aji and R. J . McEliece. The generalized distributive law. Transactions on Information Theory, 46:325-343, 2000. IEEE Stefan Arnborg, D. G. Corneil, and A. Proskurowski. Complexity of finding embeddings in a fc-tree. SIAM J. Algebraic and Discrete Methods, 8:277284, 1987. Albert-Laszlo Barabasi and Reka Albert. Emergence of scaling in random networks. Science, 286:509-512, October 1999. Umberto Bertele and Francesco Brioschi. Nonserial Dynamic Programming. Academic Press, Inc., 1972. ISBN 0120934507. Stefano Bistarelli. Semirings for Soft Constraint Solving and Programming. Springer-Verlag, 2004. Stefano Bistarelli, Ugo Montanari, and Francesca Rossi. Semiring-based constraint satisfaction and optimization. J. ACM, 44(2):201-236, 1997. ISSN 0004-5411. Hans L. Bodlaender. A tourist guide through treewidth. Acta Cybernetica, 11:1-21, 1993. URL http://citeseer.ist.psu.edu/ bodlaender93tourist.html. Hans L. Bodlaender. A partial k-arboretum of graphs with bounded treewidth. Theor. Comput. Sci., 209(1-2): 1-45, 1998. ISSN 0304-3975. Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. M I T Press/McGraw-Hill, 1990. Fabio Cozman. The interchange format for bayesian networks: Bif, 1996. U R L http://www-2.cs.cmu.edu/~fgcozman/Research/ InterchangeFormat/Old/xmlbif02.html. Fabio Cozman. The interchange format for bayesian networks: Xmlbif, 1998. U R L http://www-2.cs.cmu.edu/afs/cs/user/fgcozman/www/ Research/InterchangeFormat/. Martin Davis, George Logemann, and Donald Loveland. A machine program for theorem-proving. Commun. ACM, 5(7):394-397, 1962. ISSN 0001-0782. 108 Bibliography [13] Thomas Dean and Keiji Kanazawa. A model for reasoning about persistence and causation. Comput. Inteli, 5(3): 142-150, 1990. ISSN 0824-7935. [14] Rina Dechter. Bucket elimination: A unifying framework for probabilistic inference. In 12th Conf. on Uncertainty in Artificial Intelligence, pages 211— 219, 1996. U R L h t t p : / / c i t e s e e r . i s t . p s u . e d u / d e c h t e r 9 6 b u c k e t . h t m l . [15] Rina Dechter. Constraint Processing. Morgan Kaufmann, 2003. [16] Rina Dechter and Judea Pearl. Network-based heuristics for constraint satisfaction problems. Artificial Intelligence, 34:1-38, 1987. [17] Rina Dechter and Judea Pearl. Tree clustering for constraint networks (research note). Artif. Inteli, 38(3):353-366, 1989. ISSN 0004-3702. [18] Rina Dechter and Irina Rish. Directional resolution: The Davis-Putnam procedure, revisited. In Principles of Knowledge Representation and Reasoning, pages 134-145. 1994. U R L h t t p : / / c i t e s e e r . i s t . p s u . e d u / article/dechter94directional.html. [19] Rina Dechter and Irina Rish. A scheme for approximating probabilistic inference. In Uncertainty in Artificial Intelligence (UAI97), pages 22-44, 1997. [20] Rutgers University D I M A C S Center. rutgers.edu/. Dimacs. URL http://dimacs. [21] Amir Eyal. Efficient approximation for triangulation of minimum treewidth. In Proceedings of the 11th Annual Conference on Uncertainty in Artificial Intelligence (UAI-01), pages 7-15, San Francisco, C A , 2001. Morgan Kaufmann Publishers. [22] Eugene C. Freuder. Complexity of k-tree structured constraint satisfaction problems. In AAAI, pages 4-9, 1990. [23] N. Friedman, M . Goldszmidt, D. Heckerman, and S. Russell. Bayesian network repository. U R L h t t p : / / w w w . c s . h u j i . a c . i l / l a b s / c o m p b i o / Repository/. [24] Vibhav Gogate and Rina Dechter. A complete anytime algorithm for treewidth. UAI04, 2004. U R L http://www. i c s . u c i . e d u / ~ c s p / r l l 2 .pdf. [25] C. Guestrin, D. Roller, and R. Parr. Multiagent planning with factored mdps. In T. G . Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 1523-1530, Cambridge, M A , 2002. M I T Press. [26] Holger H. Hoos and Thomas Sttzle. Satlib: A n online resource for research on sat. In I. P. Gent, H. v. Maaren, and T. Walsh, editors, SAT2000, pages 283 - 292. IOS Press, 2000. U R L h t t p : / / w w w . s a t l i b . o r g . Bibliography 109 [27] R. Howard and J . Matheson. Influence diagrams. In R. Howard and J. Matheson, editors, Readings on the Principles and Applications of Decision Analysis, volume II, pages 721-762. Strategic Decisions Group, Menlo Park, C A , 1981. [28] William H. Hsu. Bayesian networks formats convertor. U R L http: //www.kddresearch.org/KDD/Groups/Probabilistic-Reasoning/ convertor.html. [29] F. V . Jensen, S. L. Lauritzen, and K. G. Olesen. Bayesian updating in causal probabilistic networks by local computations. Computational Statistics Quarterly, 4:269-282, 1990. [30] Frank Jensen, Finn V . Jensen, and S0ren L. Dittmer. From influence diagrams to junction trees. In Proceedings of the Tenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-94), pages 367-373, San Francisco, C A , 1994. Morgan Kaufmann Publishers. [31] Jesus M Salvo Jr. Openjgraph - Java graph and graph drawing project, 2002. U R L http://openjgraph.sourceforge.net/. [32] Kalev Kask, Rina Dechter, and Javier Larrosa. Unifying cluster-tree decompositions for automated reasoning. Submitted to the A I J , June 2003. [33] Uffe Kjasrulff. Aspects of Efficiency Improvement in Bayesian Networks. PhD thesis, Aalborg University, 1993. [34] Uffe Kjaerulff. Reduction of computational complexity in bayesian networks through removal of weak dependences. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence,, pages 374 - 382, San Francisco, C A , 1994. Morgan Kaufmann Publishers. [35] J . Kohlas and P.P. Shenoy. Computation in valuation algebras. In Handbook of Defeasible Reasoning and Uncertainty Management Systems, Volume 5: Algorithms for Uncertainty and Defeasible Reasoning, pages 5-40. Kluwer, Dordrecht, 2000. [36] Jelle R. Kok and Nikos Vlassis! Distributed decision making of robotic agents. In S. Vassiliadis, L.M.J. Florack, J . W . J . Heijnsdijk, and A. van der Steen, editors, Proc. 8th Ann. Conf. of the Advanced School for Computing and Imaging, Heijen, The Netherlands, pages 318-325. Advanced School for Computing and Imaging (ASCI), June 2003. [37] A. M . C. A . Koster, H. L. Bodlaender, and C. P. M . van Hoesel. Treewidth: Computational experiments. Technical Report 01-38, ZIB, Berlin, Germany, 2001. U R L http://www.zib.de/PaperWeb/abstracts/ZR-01-38/. [38] Kschischang, Frey, and Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47:498-519, 2001. U R L http://citeseer.ist.psu.edu/article/frey98factor.html. Bibliography 110 [39] S. L. Lauritzen and D. J . Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B, 50:157-224, 1988. [40] A . K . Mackworth. Constraint satisfaction. In S.C. Shapiro, editor, Encyclopedia of Artificial Intelligence, volume 1, pages 285-293. Wiley Interscience, 1992. [41] Alan Mackworth and David Poole. CIspace: Tools for learning computational intelligence. Laboratory for Computational Intelligence, Computer Science, University of British Columbia. U R L h t t p : / / w w w . c s . u b c . c a / labs/lci/CIspace/index.html. [42] Alan K. Mackworth. Consistency in networks of relations. Artificial Intelligence, 8:99-118, 1977. [43] Robert Mateescu, Rina Dechter, and Kalev Kask. Tree approximation for belief updating. In Eighteenth national conference on Artificial intelligence, pages 553-559. A A A I , 2002. ISBN 0-262-51129-0. [44] R. J . McEliece, D. J . C. MacKay, and J.-F. Cheng. Turbo decoding as an instance of Pearl's belief propagation algorithm. Journal on Selected Areas in Communications, 16(2):150-161, Feb 1998. [45] Kevin P. Murphy. Dynamic Bayesian Networks: Representation, Inference and Learning. PhD thesis, University of California, Berkeley, 2002. [46] Kevin P. Murphy, Yair Weiss, and Michael I. Jordan. Loopy belief propagation for approximate inference: A n empirical study. In Conf. on UAI, pages 467-475, 1999. U R L h t t p : / / c i t e s e e r . i s t . p s u . e d u / m u r p h y 9 9 1 o o p y . html. [47] Richard E. Neapolitan. Probabilistic Reasoning in Expert Systems: Theory and Algorithms. John Wiley and Sons, New York, N Y , 1990. [48] Mark A . Paskin. Thin junction tree filters for simultaneous localization and mapping. Computer Science Division Technical Report CSD02-1198, University of California, Berkeley, September 2002. U R L h t t p : //citeseer.ist.psu.edu/article/paskin03thin.html. [49] Judea Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann Publishers Inc., 1988. ISBN 0934613-73-7. [50] N. Robertson and P. Seymour. Graph minors. II. Algorithmic aspects of tree-width. Journal of Algorithms, 7(3):309-322, 1986. [51] Donald J . Rose, Robert Endre Tarjan, and George S. Lueker. Algorithmic aspects of vertex elimination on graphs. SIAM J. Comput., 5(2):266-283, 1976. Bibliography 111 [52] Z. Ruttkay. Fuzzy constraint satisfaction. In Proceedings 3rd IEEE International Conference on Fuzzy Systems, pages 1263-1268, 1994. [53] Thomas Schiex, Helene Fargier, and Gerard Verfaillie. Valued constraint satisfaction problems: Hard and easy problems. In IJCAI95, pages 631-637, Montreal, 1995. U R L h t t p : / / c i t e s e e r . i s t . p s u . e d u / s c h i e x 9 5 v a l u e d . html. [54] P. Seidel. A new method for solving constraint satisfaction problems. In Seventh national conference on Artificial intelligence, pages 338-342. A A A I , 1981. [55] P. P. Shenoy and G . Shafer. Axioms for probability and belief-function propagation. In Uncertainty in Artificial Intelligence 4, pages 169-198. North-Holland, 1990: [56] Robert Endre Tarjan and Mihalis Yannakakis. Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM J. Comput., 13(3):566-579, 1984. [57] A . J . Viterbi. Error bounds for convolution codes and an asymptotically optimal decoding algorithm. IEEE Transactions in Information Theory, 13:260-269, 1967. [58] Toby Walsh. Search in a small world. In IJCAI, pages 1172-1177, 1999. URL http://citeseer.ist.psu.edu/walsh99search.html. [59] Duncan J . Watts and Steven H. Strogatz. Collective dynamics of 'smallworld' networks. Nature, 393(6684):440-442, June 4 1998. [60] Yair Weiss and William T. Freeman. Correctness of belief propagation in gaussian graphical models of arbitrary topology. Neural Computation, 13(10):2173-2200, 2001. U R L h t t p : / / c i t e s e e r . i s t .psu. edu/ weiss99correctness.html. [61] L. A . Zadeh. Fuzzy sets as a basis for a theory of possibility. Fuzzy sets and systems, l(l):3-28, 1978. [62] Nevin Lianwen Zhang. A Computational Theory of Decision Networks. PhD thesis, University of British Columbia, 1994. [63] Nevin Lianwen Zhang. Probabilistic inference in influence diagrams. Computational Intelligence, 14:475-497, 1998. U R L h t t p : / / c i t e s e e r . i s t . psu.edu/zhang98probabilistic.html. [64] Nevin Lianwen Zhang and David Poole. A simple approach to Bayesian network computations. In Proc. of the Tenth Canadian Conference on Artificial Intelligence, pages 171-178, 1994. Bibliography 112 [65] Ying Zhang and Alan K. Mackworth. Parallel and distributed finite constraint satisfaction: Complexity, algorithms and experiments. In Parallel Processing for Artificial Intelligence, chapter 1, pages 305-334. Elsevier Science Publishers, 1994. Appendix A. Heuristics Search Algorithms for Triangulation 113 Appendix A Heuristics Search Algorithms for Triangulation A.l Minimal Width Heuristic Triangulation Algorithm A . l Minimal Width Heuristic Triangulation (MW) [47] Input: A connected graph Q = (V, £) Output: Triangulated graph Q = (V,£ U £ ) with w*{Q) < w*(Q ) and an elimination ordering a = (v\, • • •, v ) 1: St := 0 2: for v £ V do 3: factor(v) := [Neighbor(v)\ 4: M(v) := Neighbor(v) t t n 5: end for 6: S:=V 7: for i = 8: 9: 10-. 11: 12:13: 14:. 15: 16: 17: 18: |V| to 1 do u := arg min^gs factor(v) a(i) := u for x,y £ M(u) and (x y) £ £ U £ ) do £ :=£ U{x,y) M(x) := M(x) Uy and M(y) := M(y) Ua; t t t end for for u ; £ M ( u ) do M(tu) := M(w) \ u end for 5:=5\{u} end for t t Appendix A.2 A. Heuristics Search Algorithms for Triangulation 114 Minimal Induced Width Heuristic Triangulation Algorithm A.2 Minimal Induced Width Heuristic Triangulation (MIW) [37] Input: A connected graph Q = (V,£) Output: Triangulated graph Q = ( V , £ u £ t ) with w*{Q) elimination ordering a — (vi, • • •, v ) t n := 0 for v € V do 1: S t 2: 3: 4: M(v) : = Neighbor(v) end for V 5: S := 6: for = |V| to 1 do % 7: 8: u := argmin^gs\M(v)\ 9: for a(i) := u x,y e M(u) and (x, y) <£ SU S ) S := S U [x,y) 10: 13: end for for w G M ( u ) do 14: M(w) 15: end for 16: 17: do t M ( z ) := M ( x ) U y and M ( y ) := M(j/) Ux 11: 12: t t S end for := M ( w ) \ u :=S\{u} < w*{Qt) and an Appendix A.3 A. Heuristics Search Algorithms for Triangulation 115 Minimal Fill-in Heuristic Triangulation Algorithm A . 3 Minimal Fill-in Heuristic Triangulation (MF) [47] Input: A connected graph Q = (V, £) Output: Triangulated graph Q = {V,£u£ ) elimination ordering a = • • •, v ) t t with w*{Q) < w*(G ) and an n 1: £ := 0 t 2. for v G V do 3: 4: fillin(v) := 0 M(v) := Neighbor(v) 5: for each x, y G M(u) and (x, y) ^ f do 6: fillin(v) := fillin(v) 7: end for 8: end for 9: 5 : = V 10: for i = |V| to 1 do 11: e 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: +1 it := a r g m i n „ s fillin(v) <r(i) := u • for x,y € M(u) and (a;, y) <£ £ U £ ) do f := 5 U t t £ M ( i ) := M ( s ) U y and M(i/) := JW(i/) U i end for for w G M(u) do M(w) := M{w) \ u end for S:=S\{u} end for t 116 Appendix A. Heuristics Search Algorithms for Triangulation A.4 Minimal Induced Fill-in Heuristic Triangulation Algorithm A.4 Minimal Induced Fill-in Heuristic Triangulation (MIF) [37] Input: A connected graph Q = (V, £) Output: Triangulated graph Q = (V,£u£ ) elimination ordering a = (vi, • • •, v ) 1: S := 0 2: for v £ V do 3: fillin(v) := 0 4: M(v) := Neighbor(v) t t with w"{Q) < w*{g ) and an t n t 5: 6: 7: 8: for each x,y £ M(u) and := fillin(v) fillin(v) ^ if do + 1 end for end for 9: 5 : = V 10: for i = |V| to 1 do 11: u :— arg m i n „ s fillin(v) 12: a(i) := u 13: for I , J 6 M ( u ) and (x, y) £ £ U £ ) do 14: S := £ U (a, j/) 15: M ( x ) := M ( x ) U 2/ and M(y) :- M(y) U x 16: end for 17: for W £ M(u) do 18: M(w) := M(w) \ u 19: fillin(v) = 0 e t t 20: t for each x,y £ M(io) and [x,y) £ £ LI £ ) do 21: fillin(v) := fillin(v) 22: end for 23: end for 24: S:=S\{u} 25: end for t +1 Appendix A. Heuristics Search Algorithms for Triangulation A.5 117 Lexicographic B F S Heuristic Triangulation, variant Perfect A l g o r i t h m A . 5 Lexicographic B F S Heuristic Triangulation, variant Perfect ( L E X P ) [51] Input: A connected graph Q — (V, f ) , arbitrary vertex v* € V O u t p u t : Triangulated graph Gt = (V, £ U ft) with w*(G) < w*{Gt) and an elimination ordering u = (v\, • • •, v ) 1: ft := 0 2: for v £ V do 3: label(v) := 0 n 4: end for 5: 5 ~ - V 6: for i = |V| to 1 do 7: if i == j Vj t h e n 8: u := V* 9: else 10: 11: 12: u := arg m a x s label(v} u e e n d if 13: 14: 15: o-(i) := u fbri,fce{t-rl,-",|V|},jVfcdo if (a(i), a(j)), (a(i), a(k)) e f and (a(j), a(fc)) £ f U ft then ft : = f t U ( a ( j ) , a ( / e ) ) 16: 17: e n d if e n d for 18: 19: 20: S:=S\{u} for u> £ Neighbor(u) fl 5 do label(iv) := label(w) U {i} 21: end for 22: end for Appendix A.6 A. Heuristics Search Algorithms for Triangulation 118 Lexicographic BFS Heuristic Triangulation, variant Minimal A l g o r i t h m A . 6 Lexicographic B F S Heuristic Triangulation, variant Minimal ( L E X M ) [51] Input: A connected graph Q = (V, £), arbitrary vertex v* € V O u t p u t : Triangulated graph Q = (V, £ U £ ) with w*(Q) < w*(Q ) and an elimination ordering a = (i>i, • • •, v ) 1: £t := 0 2: for » £ V d o 3: label(v) := 0 4: end for 5: 5 := V 6: for i = |V| to 1 do 7: if i == |V| then 8: u := V* ' 9: else 10: u :— argmax-ugs label(v) 11: end if 12: a(i) := u t t t n 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: for j,k G {i + 1,- • • , | V | } , J V k do if (a(i),Q(j)),(a(i),a(fc)) £ £ and (a(j), a(fc)) ^ £u£t then f :=f U(a(j').a(fc)) end if end for S:=S\{u} for w e S s.t. 3 path {u = vi, • • • ,Vk+i = w} in Q with G 5 and label(vj) < label(w) for j = 2,3, • • •, /c do label(w) := label(w) U {i} end for end for t t Appendix A.7 119 A. Heuristics Search Algorithms for Triangulation Maximum Cardinality Heuristic Triangulation A l g o r i t h m A . 7 Maximum Cardinality Heuristic Triangulation (MC) [56] Input: A connected graph Q = (V, £) O u t p u t : Triangulated graph Q = ( V , £ u £ ) with w*{Q) < w*{Q ) and an elimination ordering a = (i>i, • • • ,v ) 1: E := 0 2: for v & V do 3: counter(v) := 0 4: end for 5: S - V 6: for i = |V| to 1 do 7: w := arg max^gs counter(v) 8: a{i) : = u 9: forj,fce{i + l ---,|V|},i^fcdo 10: if (a{i), a(j)), (a(i), a(k)) G £ and (a(j), a{k)) £ £ U £ then t t t n t ) t 11: 12: 13: 14: 15: 16: 17: 18: £ t : = £ t U ( a ( j ) , a ( k ) ) end if end for 5:=S\{u} for w G Neighbor(u) fl S do counter(w) := counter(w) + 1 end for end for
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Generalized constraint-based inference
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Generalized constraint-based inference Chang, Le 2005
pdf
Page Metadata
Item Metadata
Title | Generalized constraint-based inference |
Creator |
Chang, Le |
Date Issued | 2005 |
Description | Constraint-Based Inference (CBI) is a unified framework that subsumes many practical problems in different research communities. These problems include probabilistic inference, decision-making under uncertainty, constraint satisfaction, propositional satisfiability, decoding problems, and possibility inference. Solving them efficiently is important for both research and practical applications. Along with the development of inference approaches for concrete CBI problems, researchers are increasingly aware that these problems share common features in representation and essentially identical inference approaches. As the first contribution of this thesis, we explicitly use the semiring concept to generalize various CBI problems into a single formal representation framework with a broader coverage of the problem space based on the synthesis of existing generalized frameworks. Second, the proposed semiring-based unified framework is also a single formal algorithmic framework that provides a broader coverage of both exact and approximate inference algorithms, including variable elimination, junction tree, and loopy message propagation methods. Third, we discuss inference algorithm design and complexity issues based on the abstract representations of CBI problems and inference algorithms. These discussions are examples of applying the abstract knowledge to the concrete application domains. Researchers from various fields can (1) study the most important common characteristics of various CBI problems without representation barriers; (2) analyze and compare different inference approaches; (3) borrow design ideas from other fields and improve the inference approaches' efficiency in their own domains; and (4) significantly reduce the amount of implementation work target ted previously at the individual problems. Finally, we present a software toolkit named the Generalized Constraint- Based Inference Toolkit in Java (GCBIJ) as the last contribution of this thesis. GCBIJ is the first concrete software toolkit that implements the abstract semiring approach to unify the CBI problem representations and the inference algorithms. Users can design their own task-specific semirings or simply use ones provided by the toolkit to solve their own concrete CBI problems through instantiating various already provided abstract inference algorithms. Users can also design their own inference algorithms on the abstract level and apply them to solve different problems. Furthermore, researchers can test, verify, compare, and analyze inference approaches based on this toolkit. The the experimental results based on GCBIJ show that the generalized CBI framework is a useful tool for both research and practical problem-solving. |
Extent | 6004839 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Date Available | 2009-12-03 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0051112 |
URI | http://hdl.handle.net/2429/16241 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2005-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-ubc_2005-0178.pdf [ 5.73MB ]
- Metadata
- JSON: 831-1.0051112.json
- JSON-LD: 831-1.0051112-ld.json
- RDF/XML (Pretty): 831-1.0051112-rdf.xml
- RDF/JSON: 831-1.0051112-rdf.json
- Turtle: 831-1.0051112-turtle.txt
- N-Triples: 831-1.0051112-rdf-ntriples.txt
- Original Record: 831-1.0051112-source.json
- Full Text
- 831-1.0051112-fulltext.txt
- Citation
- 831-1.0051112.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0051112/manifest