Generalized Constraint-Based Inference by Le Chang B . E . , Tsinghua University, P.R.China, 2000 M . E . , Tsinghua University, P.R.China, 2002 A THESIS S U B M I T T E D IN P A R T I A L F U L F I L M E N T O F T H E R E Q U I R E M E N T S F O R T H E D E G R E E O F M A S T E R O F S C I E N C E in The Faculty of Graduate Studies (Computer Science) T H E U N I V E R S I T Y O F B R I T I S H C O L U M B I A Apr i l 20, 2005 © L e Chang, 2005 Abstract ii A b s t r a c t Constraint-Based Inference (CBI) is a unified framework that subsumes many practical problems in different research communities. These problems include probabilistic inference, decision-making under uncertainty, constraint satisfac-tion, propositional satisfiability, decoding problems, and possibility inference. Solving them efficiently is important -for both research and practical applica-tions. Along with the development of inference approaches for concrete CBI prob-lems, researchers are increasingly aware that these problems share common fea-tures in representation and essentially identical inference approaches. As the first contribution of this thesis, we explicitly use the semiring concept to gener-alize various CBI problems into a single formal representation framework with a broader coverage of the problem space based on the synthesis of existing gen-eralized frameworks. Second, the proposed semiring-based unified framework is also a single formal algorithmic framework that provides a broader coverage of both exact and approximate inference algorithms, including variable elim-ination, junction tree, and.loopy message propagation methods. Third, we discuss inference algorithm design and complexity issues based on the abstract representations of CBI problems and inference algorithms. These discussions are examples of applying the abstract knowledge to the concrete application domains. Researchers from various fields can (1) study the most important common characteristics of various CBI problems without representation barri-ers; (2) analyze and compare different inference approaches; (3) borrow design ideas from other fields and improve the inference approaches' efficiency in their own domains; and (4) significantly reduce the amount of implementation work target ted previously at the individual problems. Finally, we present a software toolkit named the Generalized Constraint-Based Inference Toolkit in Java (GCBIJ) as the last contribution of this the-sis. G C B I J is the first concrete software toolkit that implements the abstract semiring approach to unify the CBI problem representations and the inference algorithms. Users can design their own task-specific semirings or simply use ones provided by the toolkit to solve their own concrete CBI problems through instantiating various already provided abstract inference algorithms. Users can also design their own inference algorithms on the abstract level and apply them to solve different problems. Furthermore, researchers can test, verify, compare, and analyze inference approaches based on this toolkit. The the experimental results based on G C B I J show that the generalized CBI framework is a useful tool for both research and practical problem-solving. Contents iii C o n t e n t s Abst rac t ii Contents iii L is t of Tables vi L is t of Figures vii L is t of A lgor i thms viii Acknowledgements ix 1 In t roduct ion 1 1.1 Motivation 1 1.2 Related Work 2 1.2.1 Generalized CBI Representations 2 1.2.2 Generalized Inference Algorithms 3 1.3 Thesis Outline 3 2 Background 5 2.1 Graphical Models for Constraint-Based Inference 5 2.1.1 Hypergraph Representations 5 2.1.2 Primal Graph Representations- 6 2.1.3 Junction Tree Representations 6 2.1.4 Factor Graph Representations 7 2.2 Topological Parameters to Characterize Graphs 8 2.2.1 Treewidth and Induced Width 8 2.2.2 Triangulated Graph and Treewidth Computing 9 2.2.3 Small World and Scale-Free Topology 10 2.3 The Basis of Abstract Algebra 12 2.3.1 Set, Group, and Semigroup 13 2.3.2 Ring, Semiring, and k-Semiring 13 2.4 Summary 16 3 A Semir ing-Based General ized Framework for C B I 18 3.1 A Semiring-Based Generalized Framework 18 3.2 Instances of the Semiring-Based Generalized Framework 20 Contents iv 3.2.1 Constraint Satisfaction Problems 20 3.2.2 Propositional Satisfiability Problems 22 3.2.3 Probability Inference Problems 23 3.2.4 Dynamic Bayesian Networks 25 3.2.5 Decision-Making Problems 25 3.2.6 Possibility Inference Problems 26 3.2.7 Coordination Graph 27 3.2.8 Maximum Likelihood Decoding 28 3.3 Summary 29 4 Exac t Inference A lgor i thms 31 4.1 Variable Elimination Algorithm 31 4.1.1 Generalized Variable Elimination Algorithm 31 4.1.2 Generalized V E Algorithm for k-Semiring 34 4.1.3 Instances of Variable Elimination Algorithm 34 4.1.4 Space and Time Complexity 35 4.1.5 Discussion of Applying the V E Algorithm 36 4.2 Junction Tree Algorithm 36 4.2.1 Generalized JT Algorithm 36 4.2.2 Generalized JT Algorithm for fc-Semiring 39 4.2.3 Generalized JT Algorithm for Idempotent Semirings . . . 39 4.2.4 Generalized JT Algorithm for Invertible Semirings . . . . 41 4.2.5 Instances of Junction Tree Algorithm 42 4.2.6 Space and Time Complexity 46 4.2.7 Discussion of Applying the JT Algorithm 52 4.3 Summary 54 5 Approx ima te Inference A lgor i thms 57 5.1 Motivation for the Design of Approximate Algorithms 57 5.2 Algebraic Approximation for V E and JT 57 5.2.1 Approximate Variable Elimination Algorithm 58 5.2.2 Approximate Junction Tree Algorithm 60 5.2.3 Discussion • 60 5.3 Thin Junction Tree Algorithm 61 5.3.1 Thinning through Variable Contraction 62 5.3.2 Generalized Thin Junction Tree Algorithm . 63 5.3.3 Discussion 64 5.4 Loopy Message Propagation 64 5.4.1 Building a Junction Graph 65 5.4.2 Generalized Loopy Message Propagation Algorithm . . . 66 5.4.3 Loopy Message Propagation As an Exact Inference . . . . 67 5.4.4 Discussion 68 5.5 Hybrid Junction Tree Algorithm 68 5.6 Summary 69 Contents v 6 Genera l ized C B I Toolk i t in Java - G C B I J 71 6.1 GCBIJ Motivation 71 6.2 GCBIJ System Architecture 72 6.3 GCBIJ Implementation and Interfaces 74 6.3.1 Semiring 74 6.3.2 Parser 75 6.3.3 CBITask 76 6.3.4 Inference 76 6.3.5 Graph 79 6.3.6 Utilities 80 6.3.7 GCBLLMa in . 81 6.4 Summary 81 7 G C B I J App l ica t ions : Case Studies 83 7.1 Test Suites 83 7.2 Studies of Heuristic Triangulation Algorithms 83 7.3 Empirical Studies of Exact Inference Algorithms 86 7.3.1 Exact Inference Algorithms with Invertible Semiring . . . 86 7.3.2 Exact Inference Algorithms with Idempotent Semiring . 88 7.3.3 Exact Inference Algorithms for General CBI Problems . . 89 7.4 Empirical Studies of Approximate Inference Algorithms 89 7.4.1 Approximate V E and JT for CSPs 89 7.4.2 Approximate V E and JT for Probability Inference . . . . 92 7.5 Loopy Message Propagation As Exact Inference Algorithm . . . 95 7.6 Impact of Small World Topology . 96 7.6.1 Small World Topology of CBI problems . . , 96 7.6.2 Inference in the Small World 99 7.7 Summary 101 8 Conclusions and Future W o r k 102 8.1 Conclusions . . . 102 8.2 Possible Improvements to Inference Approaches 103 8.2.1 From Variables to Values 103 8.2.2 From Systematic to Stochastic 105 8.3 Future Work 105 B ib l iography 107 A Heur is t ics Search A lgor i thms for Tr iangulat ion 113 A . l Minimal Width Heuristic Triangulation 113 A.2 Minimal Induced Width Heuristic Triangulation .114 A.3 Minimal Fill-in Heuristic Triangulation . . 115 A.4 Minimal Induced Fill-in Heuristic Triangulation 116 A.5 Lexicographic BFS Heuristic Triangulation, variant Perfect . . . 117 A.6 Lexicographic BFS Heuristic Triangulation, variant Minimal . . . 118 A.7 Maximum Cardinality Heuristic Triangulation 119 List of Tables vi L i s t o f Tab l e s 2.1 Heuristic Triangulation Algorithms 11 2.2 Summary of Different Commutative Semirings 16 4.1 Computation Costs of Two Equivalent Tasks . 31 4.2 . V E and JT Complexity Comparisons 53 4.3 Upper Bounds of V E and JT Running Times 54 4.4 Running Times for Different Operations in Java 54 4.5 Concrete V E and JT Algorithms in Different Fields 56 6.1 Public Methods of the Class Semiring 74 6.2 Public Methods of the Interface Ildempotent 75 6.3 Public Methods of the Interface IReversible . . 75 6.4 Public Methods of the Class Parser 76 6.5 Public Methods of the Class InferAlg 77 6.6 Public Methods of the Class ApproxInferAlg 79 6.7 Public Methods of the Class Cluster 82 7.1 The Basic Information of Test Suites 84 7.2 Upper Bounds of Treewidth for Graphs, Different Heuristic Searches 85 7.3 Running Times of Exact Inference Algors. with Inverses 88 7.4 Running Times of Exact Inference Algors. with Idempotency . . 89 7.5 Running Times of General Exact Inference Algorithms 89 7.6 The Maximum Number of Satisfiable Constraints for a CSP . . . 90 7.7 Parameters of the Small World Topology for Graphs 98 8.1 Detecting Elements of Some Commutative Semirings 105 List of Figures vii List of Figures 2.1 Graphical Representations of an Example CBI Problem 7 2.2 Axioms in the Definitions of Group and Related Concepts . . . 14 2.3 Axioms in the Definitions of Ring and Related Concepts 15 5.1 Graphical Interpretation of Approximate V E 59 5.2 Graphical Interpretation of Approximate JT 60 6.1 System Architecture of GCBIJ 73 7.1 Upper Bounds of Treewidth, Based on Various Heuristics . . . . 86 7.2 Bayesian Network of the Diagnosis Problem 87 7.3 Accuracy and Running Time of the Approx. V E for a CSP . . . 91 7.4 Accuracy and Running Time of the Approx. JT for a CSP . . . 92 7.5 Bayesian Network of the Insurance Problem 93 7.6 Marginal,of the Approx. V E for a Probability Inference Problem 94 7.7 Marginal of the Approx. JT for a Probability Inference Problem 95 7.8 Accuracy and Time Complexity of loopy M P for a CSP . . . . . 96 7.9 Small World Parameter Distributions for the Pigs Network . . . 97 7.10 Small World Parameter Distributions for the Link Network . . . 99 7.11 Small World Parameters for Different Re-written Probabilities . . 100 7.12 Time Complexity of loopy M P in the Small World 100 List of Algorithms vi i i L i s t o f A l g o r i t h m s 2.1 Triangulating a Graph, Given an arbitrary Ordering ( R A N D ) . . 10 2.2 Small World Modelling [59] ' . . 12 2.3 Scale-Free Modelling 12 4.1 Generalized Variable Elimination Algorithm (GVE) 33 4.2 Generalized Bucket Elimination Algorithm (GBE) 33 4.3 Generalized V E Algorithm for k-Semiring (kGVE) . 34 4.4 Generalized J T Algorithm (GJT) . 38 4.5 Generalized J T Algorithm for k-Semiring (kGJT) . 40 4.6 Generalized J T Algorithm for Idempotent Semirings (GJT-Idemp) 41 4.7 Generalized J T Algorithm for Invertible Semirings (GJT-Inv) . . 43 4.8 Generalized J T Algorithm for LS Architecture (GJT-LS) 44 4.9 Generalized J T Algorithm for H U G I N Architecture (GJT-HUGIN) 45 5.1 The Approximate Marginal of Combination Algorithm (ApproxMC) 58 5.2 Generalized Approximate V E Algorithm (GVE-Approx) 59 5.3 Generalized Approximate J T Algorithm (GJT-Approx) 61 5.4 Variable Contraction (Contract(v, C\)) 62 5.5 Generalized Thin Junction Tree Algorithm (GThinJT) 63 5.6 Building a Junction Graph for a C B I Problem 65 5.7 Generalized Loopy Message Propagation Algorithm (GLoopy) . . 66 5.8 Generalized Loopy Message Propagation Algorithm with Inverses (GLoopy-Inv) 67 5.9 Hybrid Junction Tree Algorithm (HybridJT) 70 A . l Minimal Width Heuristic Triangulation (MW) [47] 113 A.2 Minimal Induced Width Heuristic Triangulation (MIW) [37] . . 114 A.3 Minimal Fi l l - in Heuristic Triangulation (MF) [47] 115 A.4 Minimal Induced Fi l l - in Heuristic Triangulation (MIF) [37] . . '. 116 A.5 Lexicographic B F S Heuristic Triangulation, variant Perfect ( L E X P ) [51] . . . . . . . . 117 A.6 Lexicographic B F S Heuristic Triangulation, variant Minimal ( L E X M ) [51] 118 A.7 Maximum Cardinality Heuristic Triangulation (MC) [56] 119 A ckn owledgem en ts A c k n o w l e d g e m e n t s I would like to thank my supervisor, Professor Alan Mackworth, for his guidance and support in this work. I would not be here without his inspiration and encouragement. Many thanks also to Professor David Poole and Professor Kevin Leyton-Brown for their valuable comments. I would also like to thank Ben D'Andrea for his hard work on my thesis proof-reading. Especially, I would like to give my special thanks to my wife Ming Yan whose patient love enabled me to complete this work. Chapter 1. Introduction 1 Chapter 1 I n t r o d u c t i o n 1.1 Motivation Constraint-Based Inference (CBI) is a general term covering many different problems in several research communities. It consists of discovering new con-straints from a set of given constraints over individual entities. These new constraints reveal previously undiscovered properties of those entities. Practical problems from many different fields can be seen as constraint-based inference problems, including probabilistic inference, decision-making under un-certainty, constraint satisfaction problems, propositional satisfiability, decoding problems, and possibility inference. A constraint here can be seen as a function that maps possible value assignments to a specific value domain. Along with the development of inference approaches for concrete CBI prob-lems, researchers are increasingly aware that these problems share common ab-stract representation features. Many inference algorithms, described differently, implicitly have essentially identical ideas underlying them. Understanding the common features and characteristics of CBI representations helps research com-munities exchange ideas and design more efficient inference algorithms. The purpose of this thesis is to use the algebraic semiring concept to general-ize a wide range of CBI problems and different exact and approximate inference algorithms into a single formal framework based on the synthesis of existing generalized representation and algorithmic framework from different fields. We aim to analyze the common characteristics of inference algorithms based on the abstract problem representation, then apply the abstract knowledge learnt from the unified framework to improve the concrete inference algorithm design in specific application domains. To demonstrate the representation power of the proposed semiring-based unified framework, a toolkit named Generalized Constraint-Based Inference. Toolkit in Java (GCBIJ) is implemented. It can be used to verify, test, compare, and analyze generalized approaches for CBI problems. The flexible architecture of GCBIJ enables implemented generalized inference algorithms to be applied to solve concrete problems simply through specifying different semirings. GCBIJ 's extensibility enables users to design their own task-specific semirings and use available generalized inference algorithms. This not only demonstrates the fea-sibility of using semirings to unify the CBI problem representations and the various inference algorithms, but also shows that GCBIJ is a suitable platform for both research and practical problem-solving. Chapter 1. Introduction 2 1.2 Related Work Generalized representation and inference algorithms for CBI problems have been studied for the past ten years. Al l of these studies are based on the following observation: there are two essential operations in constraint-based inference: (1) combinat ion, which corresponds to an aggregation of constraints, and (2) marginal izat ion, which corresponds to the projection of a specified constraint onto a narrower domain. Different generalized representations express these two operations through various tools or notions. Given the distributivity property of the two operations, generalized inference algorithms can be abstracted that exploit that property. The semiring-based unified framework proposed in this thesis is developed based on the synthesis of these generalized representation and algorithmic frameworks. 1.2.1 Generalized C B I Representations Semiring-based CSP (SCSP) [6] is a generalized representation framework for soft constraint programming, which includes max CSPs, fuzzy CSPs, weighted CSPs, and probabilistic CSPs. It generalizes these soft constraint programming problems, as well as classic CSPs, into a single abstract framework. The c-semiring, a commutative semiring with the additive idempotency property, is the key notion in SCSP that represents both the combination and marginalization operations. The additive idempotency property excludes general CBI problems like probabilistic inference from the SCSP framework. Given that the CSP is an instance of the generalized CBI problem, the SCSP framework inspired us to propose the semiring-based unified framework in this thesis. In addition to the generalized representation of hard and soft CSPs, SCSP also generalizes the soft local consistency approaches. Extended from local consistency techniques for classic CSPs, generalized soft local consistency can be instantiated to solve a wide range of soft constraint programming problems. The success of the SCSP framework [5] demonstrates that a generalized problem representation provides opportunities to migrate an existing approach for one problem to solve another problem, if the two problems have the same abstract representation. That is our motivation for proposing a semiring-based unified framework to represent CBI problems from other disciplines as well as constraint programming. The valuation algebra framework [35] is an abstract framework for describing probabilistic inference problems in expert systems. It is inspired by the formu-lation of simple axioms that solve global problems through local computing. Knowledge or information in the framework is called valuation. Inferences are made by computing the marginal of a combination of several valuations. The framework of the valuation algebra is abstracted to include many different for-malisms. Bayesian networks, Dempster-Shafer belief function models, Spohn's epistemic belief models, and possibility theory are shown [35] to fit into the valuation algebra framework. The authors also suggested that constraint satis-faction problems, propositional logic, and discrete optimization problems could be expressed in the valuation algebra framework. Chapter 1. Introduction 3 1.2.2 Generalized Inference Algori thms Many algorithms have been proposed for probabilistic inference. The variable elimination algorithm [64] and the junction tree algorithms [55] [39] [29] were among the first inference algorithms for probability inference in Bayesian net-works. Originating in dynamic programming, the variable elimination and junc-tion tree algorithms explicitly use the distributivity of arithmetic additive and multiplicative operations. The generalization of the bucket elimination algo-rithm [14], a variant of variable elimination, can be used in belief assessment, most probable explanation, maximum a posteriori hypothesis, and maximum expected utility problems. The bucket elimination approach was also applied in constraint satisfaction problems. It was proved that the generalized bucket elim-ination algorithm is applicable to both probabilistic and deterministic inference [15]. The Generalized Distributive Law (GDL) [1] is a general message-passing al-gorithm, a synthesis of work in the information theory, signal processing, statis-tics, and artificial intelligence communities. GDL generalizes the computational problems as "Marginalize a Product Function" (MPF) problems through using commutative semirings to represent the combination and marginalization oper-ations. G D L can be seen, in the A l community, as a junction tree algorithm. As a semiring-based generalized inference algorithm, G D L can represent a wide range of inference algorithms. More recently,'a unified algorithmic framework was proposed [32] to rep-resent tree-decomposition algorithms as a generalized bucket-tree elimination algorithm for automated reasoning tasks. Different from GDL, the general-ized bucket-tree elimination does not rely on semirings. Abstract concepts are used to represent combination and marginalization operations, though the dis-tributivity is also the key idea behind the algorithm. Using those concepts, the message-passing within the tree structure can be abstractly represented. The authors also suggested using their generalized algorithmic framework for constraint satisfaction problems. 1.3 Thesis Outline This thesis is organized as follows: related definitions and background for CBI problems, graphical models, and abstract algebra are introduced in Chapter 2. In Chapter 3 we present a semiring-based unified framework for CBI prob-lems. We also show that a series of problems from different fields can be seen as instances of our proposed framework. In Chapter 4, two popular exact in-ference algorithms, variable elimination algorithm and junction tree algorithm, are incorporated into the semiring-based unified framework. Instances of these algorithms are discussed. Theoretical and empirical complexities of these algo-rithms are also discussed. It is well known that complexities of exact inference algorithms are exponential in the parameters of their underling graphical repre-sentations. For many practical CBI problems with intractable parameters, we Chapter 1. Introduction 4 will resort to approximate techniques. We discuss approximate inference algo-rithms within our abstract framework in Chapter 5. To put the ideas of the semiring-based unified framework into concrete form, a Generalized Constraint-Based Inference Toolkit in Java (GCBIJ) is designed and implemented. We present the design specification of GCBIJ in Chapter 6. We also discuss the use and limitations of GCBIJ in this chapter. In Chapter 7, a series of exper-iments are conducted based on GCBIJ platform, which include problems from probability inference, CSP, SAT, and MaxSAT. Although our experimental re-sults are not totally new for research communities, as a whole they verify the feasibility of using the semiring to generalize the representations of various CBI problems. Generalized exact and approximate inference algorithms, based on the proposed semiring-based unified framework, are proven suitable for solving concrete applications. Finally, conclusions and future work appear in Chapter 8. Chapter 2. Background 5 Chapter 2 B a c k g r o u n d 2.1 Graphical Models for Constraint-Based Inference Generally speaking, inference means discovering previously unknown facts from a set of given facts (or knowledge). In an inference problem, knowledge can be represented by a set of (hard or soft) constraints on individual entities. Usually individual entities are represented by variables. In this thesis, a constraint-based inference problem is characterized in terms of a set of variables with values in a finite domain and a set of constraints on these variables. The inference task for a CBI problem is then to make explicit new constraints over some variables, based on the given constraints. Here the new constraints could specify the property of one variable (unary), several variables, or the whole problem. An example of CBI problem is given in Example 2.1. The formal definitions of a CBI problem and related tasks appear in Chapter 3. Examp le 2.1 Consider a constraint-based inference problem with 5 variables Vi,---,Vs, Vi € {0,1}. There are 3 constraints defined over these variables: fi(Vi, V2, V3), / 2 ( V 2 , V 3 , Vi) and fo(Vz, V 5 ) , which specify the set of tuples per-mitted by these constraints respectively. An inference task in this example is to discover tuples permitted by the derived constraint over V2 and V3. In the following chapters of this thesis, we use uppercase letters to denote variables and lowercase letters to denote values in the domain of a variable. Constraints over variables, usually represented as functions, are denoted by lowercase letters as well. Given a function / , Scope(f) denotes the subset of variables that appear in the domain of / : For convenience, we also use uppercase letters to denote subsets of variables. The meaning of uppercase letters can be easily distinguished given the context. 2.1.1 Hypergraph Representations In many cases, we can use graphs to represent CBI problems. The most straight-forward graphical representation is the hypergraph representation, where nodes represent variables and hyperarcs represent constraints. Given a CBI problem as described in Example 2.1, the corresponding hy-pergraph representation is shown in Figure 2.1(a). Chapter 2. Background 6 2.1.2 Primal Graph Representations Although the hyper-graph representation is straightforward, it is hard to repre-sent practical CBI problems using hyperarcs in graphical representations. In a primal graph the hyperarcs in a hypergraph are replaced by cliques. The primal graph representation of a CBI problem is an undirected graph Q = (V, £), where V = {Vi, • • •, Vn} is a set of vertices, each vertex Vj corresponding to a variable Xi of the CBI problem; and £ = {(Vi, Vj)\Vi, Vj G V} is a set of edges between Vi and Vj. There exists an edge (Vi, Vj) if and only if corresponding variables Xi and Xj appear in the scope of the same constraint. A moralized graph of the Bayesian Network (BN), which is obtained by adding edges among vertices with the common child vertex in the corresponding B N , is an example of a primal graph representation of probabilistic inference problems. A constraint graph of binary CSP is another example of a primal graph representation. In this thesis, we use Neighbors(Vi) to denote the set of vertices that have edges to Vj and Family{Vi) = Neighbors(Vi) U {Vi}. Given a CBI problem as described in Example 2.1, the corresponding primal graph is shown in Figure 2.1(b). 2.1.3 Junction Tree Representations Sometimes the primal graph of a CBI problem is re-organized as a secondary structure to achieve better computational efficiency. The junction tree is a widely used secondary structure in graphical models. A junction tree is an undirected graph T = (C,S). C = {C i , • • •, C„ } is a set of clusters, where each cluster d corresponds to an aggregation of a subset of vertices Vd Q V in the primal graph Q = (V, £). S = {Sij, •, Sim} is a set of separators between clusters, where Sij is the separator of clusters Ci and Cj, corresponding to the vertices of Vct fl Vct • In addition, the following junction tree properties have to be satisfied: 1. Singly connected property: T = (C,S) is a tree; 2. Running intersection property: VC j ,C j £ C , Vct D VCJ Q Vck holds for any cluster Ck on the path between Cj and Cj\ 3. Constraint allocation property: For any constraint / of the CBI problem, 3d € C s.t. Scope(f) C d. Typically a junction tree is undirected. In some computational schemes, we can pick one cluster as the root of the tree and assign directions to all separators. A separator Sij = (Ci,Cj) has a direction from Ci to Cj if Ci is in the path from the root to Cj. For each cluster C j , Parent(Ci) denotes the cluster that points to d; Children(d) denotes the set of clusters that Cj points to. Also, we use Root(T) to denote the root of junction tree T. Given a CBI problem as described in Example 2.1, a corresponding junction tree representation is shown in Figure 2.1(c). Chapter 2. Background 7 (c) (d) Figure 2.1: Graphical Representations of an Example CBI Problem, (a) Hy-pergraph (b) Primal graph (c) Junction tree (d) Factor graph 2.1.4 Factor Graph Representations The factor graph [38] is another graphical representation for CBI problems, widely used in information theory research. A factor graph is a bipartite graph that expresses the structure of the factorization. A factor graph has a variable node for each variable, a factor node for each constraint, and an edge connecting a variable node to a factor node if and only if the variable is in the scope of the constraint. Given a CBI problem as described in Example 2.1, the corresponding factor graph representation is shown in Figure 2.1(d). Chapter 2. Background 8 2.2 Topological Parameters to Characterize Graphs In the following chapters we will find that the success of different inference algorithms for the inference task for a CBI problem largely depends on the characteristics of its corresponding graphical representation. In this section, related topological parameters to characterize a graph are introduced. 2.2.1 Treewidth and Induced Width Treewidth, or equivalently, induced width [22], has been proved to be an important notion in computational complexity theory. Many AfV-h&rd graph problems, such as the maximum independent set problem and the Hamiltonian cycle problem, can be solved in polynomial time for graphs with treewidth bounded by a constant. We will show in the following chapters that the time and space complexities of generalized inference algorithms for CBI problems are polynomial in the size of graphical representations with bounded treewidth and have a constant factor which is exponential in the treewidth of primal graphs. Induced Width Definition 2.2 (Induced Width) Given an undirected graph Q = (V, £) and an ordering <r = (V\, - • • ,Vn) of vertices in V, the ordered graph is a pair {G,cr). The width of a vertex in an ordered graph (Q, a) is the number of its lower-indexed neighbors in a. The width of an ordered graph, denoted by w(a), is the maximal width of all vertices. The induced graph of an ordered graph (G,o~), denoted by {Q*,a), is ob-tained by processing the vertices in V recursively, from the vertex with the largest index to the vertex with the smallest index; when vertex Vi is processed, all its lower-indexed neighbors are connected to each other. The induced width w*(a) of an ordered graph (G,a) is the width of the induced graph {Q*,o). • The induced width of a graph Q, denoted by w*(Q), is the minimum in-duced width for all possible orderings of Q. According to the definition, computing the induced width of a graph Q is to find an ordering a with a minimum induced width of the ordered graph {Q,cr). In this thesis, we also call such an ordering an elimination ordering, because it corresponds to eliminating a vertex after connecting all its lower-indexed neighbors. Treewidth Treewidth [50] is the measure for the study of tree decomposition in the context of graph minor research, it is proved to be an equivalent concept of induced width [22]. Chapter 2. Background 9 Definition 2.3 (Treewidth, Robertson and SeymourfSO]) A tree decomposition of graph Q = ( V , £) is a pair ( T , X), where T = ( C , S) is a tree with vertex set C and edge set S, and X = {Xc,c € C} is a family of subsets of V, one for each vertex of T, such that 1- U c 6 C * c = V , 2. for every edge (v^, vj) € £, there is a c 6 C with V{ '€ Xc and Vj € Xc, and 3. for all i, j, k S C, if k is on the path from i to j in T, then Xi D Xj C X^ • The width of a tree decomposition is maxcSc\Xc\ — 1. The treewidth of a graph Q = (V,£), denoted by w*{Q), is the minimum width over all possible tree decompositions of Q. Once these definitions are compared, it is obvious that the junction tree rep-resentation is a tree decomposition of the primal graph of a given CBI problem. 2.2.2 Tr iangulated G r a p h and Treewidth C o m p u t i n g A graph is triangulated if every cycle of length at least four contains a chord, that is, two non-consecutive vertices on the cycle are adjacent. Triangulated graphs are also called chordal graphs due to the existence of a chord in every cycle. The triangulated graph of graph Q = (V, £) is obtained by adding edges St to Q such that Qt = (V, £ U £t) is a triangulated graph. There exists an equivalent relation between treewidth of a graph and maxi-mum clique size in its triangulated graph [8]: Theorem 2.4 (Bodlaender [8]) Let Q = (V, £) be a graph, and let k > 0. The following statements are equivalent: 1. The treewidth of Q is at most k. 2. The maximum clique size of an triangulated graph of Q is at least k + 1. Corollary 2.5 Computing the treewidth of graph Q is equivalent to finding an optimal triangulated graph of Q, with minimum max-clique size. Computing the treewidth or the induced width of a graph, as well as find-ing an optimal triangulated graph, is Af'P-hard [2] in general. For fixed k, there exists some linear time algorithms [7] to determine whether a graph has treewidth k. However, these algorithms are very limited because the running time contains a large constant that is exponential in k. There also exist some approximate algorithms with factors 4, 4~ and 0(log(n)) [21] which can be cal-culated in polynomial time, even though they contain a large constant which is exponential in k as well. The maximum clique size (minus one) of any optimal triangulated graph of Q provides an upper bound of the treewidth of Q. Although having no guarantee of Chapter 2. Background 10 Algorithm 2.1 Triangulating a Graph, Given an arbitrary Ordering (RAND) Input: A connected graph Q — (V, £), an ordering a = (v\, • • • ,vn) Output: Triangulated graph Qt = (V,£ U£ t ) with w*(5) < w*(£t) = w 1: for each v G V do 2: M ( D ) : = Neighbor(v) 3: end for 4: £ t := 0, w := 0 5: for i = 1 to i = |V| do 6: if \M(vi)\ > w then 7: W := | M ( v ; ) | 8: end if 9: for each x,y £ M(vi) and (x, y) ^ £ U £t do 10: . £t := £tU [x,y) 11: M{x) M(x) Uy and M{y) := M{y)Ux 12: end for 13: for each u G M ( V J ) do 14: M(u) := M(u) \Vi 15: end for 16: end for returning an optimal triangulation, we can use heuristic triangulation algorithms (which are linear and do not contain exponentially large constans) to get the upper bounds of the treewidth. Given an arbitrary elimination ordering a = (v\, • • •, vn), a graph Q = (V, £) can be triangulated according to the procedure in Algorithm 2.1 [47]: Appendix A lists several heuristic triangulation algorithms. Basically they use various heuristics to find an elimination ordering, then triangulate the graph according to Algorithm 2.1. The time complexities of these algorithms are compared in Table 2.1. n = |V| is the number of vertices and m' = |£| + |£t| is the number of edges in the triangulated graph. Chapter 7 presents empirical comparison of different heuristic triangulation algorithms. Details of heuristic triangulation algorithms and their empirical evaluations can be found in [33], 2.2.3 Small World and Scale-Free Topology Small world topology [59] exists widely in graphical representations of many real world problems. In a graph with small world topology, vertices are highly clustered yet the path lengths between them are small. By comparison, random graphs with a similar number of vertices and edges have short path lengths but little clustering, while regular graphs like lattices tend to have high clustering but large path lengths. Watts and Strogatz [59] shows that small world topologies exist widely in social graphs (e.g., collaboration graph of actors in feature files), biological graphs (e.g., the neural network of the nematode worm C.elegans), and man-made graphs (e.g., the electrical power grid of the western United States). A Chapter 2. Background 11 Table 2.1: Heuristic Triangulation Algorithms Alg. Heuristic Short Time 2.1 Random Ordering [47] R A N D 0(n + m') A . l Minimal Width [47] M W 0(n + m') A.2 Minimal Induced Width [37] MIW 0(n + m') A.3 Minimal Fill-in [47] M F O(nm') A.4 Minimal Induced Fill-in [37] MIF 0 (m' 2 ) A.5 Lexicographic BFS , var. Perfect [51] L E X P 0 (n + m') A.6 Lexicographic BFS , var. Minimal [51] L E X M O(nm') A.7 Maximum Cardinality [56] M C 0(n+m') small world topology may have a significant impact on the behavior of dynamical systems. Walsh [58] shows that the cost of solving search problems with small world topologies has a heavy-tailed distribution. In this thesis, we are interested in how the small world topology affects the cost of inference in different CBI problems. Given an undirected graph Q = (V, £), the parameters to characterize the small world topology include: • L: The averaged path length over all pairs of vertices; • Lrand'- The averaged path length over all pairs of vertices in a random graph Qrand = ( V r o n ^ £ r o n d ) w i t h |V| = | V r o n d | and |£| = |£ r a „d | ; • C: The averaged clustering coefficient of all vertices, where the clustering coefficient of a vertex is defined as the fraction of the number of existing 'edges among the neighbors of the vertex, and maximum possible edges among them (a vertex with k neighbors will at most have k(k — l ) /2 edges among its neighbors); • Crand'- The averaged clustering coefficient of all vertices in a random graph Grand = ( V r a n d , £ r a n d ) w i t h |V| = | V r a n d | and |£| = | £ r 0 n d | ; . • (j,: ratio oiC/L normalized by CrandlLrand-Watts and Strogatz [59] propose a procedure to generate graphs with small world topologies. It can be described as Algorithm 2.2. Chapter 7 presents empirical results of small world topology parameters of different CBI problems and their impacts on inference algorithms. Another typical real world topology of the graphical representation of CBI problems is the scale-free network [3], where the degree distribution has a power-law form. The distribution is not related to the scale of the network. In other words, scale-free networks have a large number of vertices with a few connecting edges and a small number of vertices with many connecting edges. It is shown [3] that scale-free networks exists widely in the real world. The Internet, W W W , and metabolic networks are all examples of scale-free networks. Chapter 2. Background 12 Algorithm 2.2 Small World Modelling [59] Input: Number of vertices n, average degree k, rewritten probability p Output: A graph with Small World topology 1: Generate vertices V i , • • •, V n ; 2: for i = 1 to n do 3: for j = 1 to k do 4: Uniformly generate a random number p0 6 [0,1); 5: if po < V then 6: k := i + -V1 • \j/2]{modn) 7: else 8: Uniformly generate a random number k £ {1, • • •, n) \ {i} 9: end if 10: Add an edge between Vi and Vk\ 11: end for 12: end for Algorithm 2.3 generates a graph with scale-free topology. This algorithm is based on the definition of scale-free topology in [3], Empirical results in Chapter 7 show that many scale-free networks are also in small world topology. In other words, scale-free networks usually exhibit high clustered vertices and small path lengths. In contrary, a graph with the small world topology does.not necessarily have a power-law form degree distri-bution. For example, the degree distribution of a small world graph generated by Algorithm 2.3 is uniform. 2.3 The Basis of Abstract Algebra As mentioned in Chapter 1, there are two essential operations in real world CBI problems: (1) combination, which corresponds to an aggregation of con-Algorithm 2.3 Scale-Free Modelling Input: Number of vertices n, number of edges m Output: A graph with Scale-Free topology i: Generate vertices V i , • • •, V n ; 2: Let di be degree of the vertex Vi 3: for i — 2 to n do 4: Connect Vj to Vj according probability pj = dj/J2k dk, k = {I, - • • 5: end for 6: for i = l to m — n + 1 do 7: Uniformly choose i € {1, • • •, n} 8: Connect Vj to Vj according probability pj = dj/J2k^k, k = {1, •••,«•} \ W ; 9: end for Chapter 2. Background 13 straints, and (2) marginalization, which corresponds to focusing of a specified constraint to a narrow domain. These two operations provide possibilities of us-ing algebraic structures to generalize existing inference algorithms from different fields. Algebraic structures are also useful tools for exploring the commonality across algorithms. More specifically, the proposed unified framework for CBI problems is based on the semiring, an important notion in abstract algebra. This section introduces basic algebraic structures and related variants. 2.3.1 Set, Group, and Semigroup Definition 2.6 (Set) A set is defined as an arbitrary collection of objects or elements. There are no predefined operations between elements in a set. The number of elements in the set is referred to as the cardinality of the set. Definition 2.7 (Group) Let G be a set and • be a binary operation defined on G. (G, •) is a group if the operation satisfies the following four conditions: • Closure: a • b = c G G, Va, b G G; • Associativity: (a • b) • c = a • (b • c), Va, o, c G G; • Identity: 31 G G s.t. a • 1 = 1 • a = a, Va G G; • Inverses: Va 6 G, 3 a - 1 € G, s.t. a • a - 1 = 1 A group (G, •) is said to be commutative, or Abelian, if the operation • satisfies one more condition: • Commutativity: a • b = b • a, Va, b € G; A semigroup is a group that does not need an identity element and whose elements do not need inverses within the semigroup. Only closure and associa-tivity hold for a semigroup. A semigroup with an identity is called a monoid. A semigroup is commutative if the associated binary operation is commutative. Figure 2.2 shows relations among axioms in group and related notions. 2.3.2 Ring, Semiring, and k-Semiring Definition 2.8 (Ring) Let R be a set. Let © and <g> be two closed binary operations defined on R. Here we assume operation ® has precedence over operation ©. (R, ©, <g>) is a ring if the operations satisfy the following axioms: • Additive associativity: Va, b,c € R, (a © b) © c = a © (b © c); • Additive commutativity: Va, b G R, a © b = b © a; • Additive Identity: 30 G R, s.t. Va G R, a © 0 = 0 © a = a; • Additive Inverses: Va G R, 3—a G R, s.t. a © — a = —a © a = 0; • Multiplicative associativity: Va, b,c G R, (a ® b) ® c = a ® (b <S> c); Chapter 2. Background 14 C o m m u t a t i v e G r o u p C o m m u t a t i v e M o n o i d C o m m u t a t i v e S e m i g r o u p <S#AssoGiatiyity>«;g) S e m i g r o u p • \ I M o n o i d J >- ' - % i : : A ! t " -^ ' G r o u p Figure 2.2: Axioms in the Definitions of Group and Related Concepts • Left and right distributivity: Va, 6, c 6 R, a ® (6 © c) — a ® 6 © a (8> c and (&ffic)<g>a = 6 ® a © c ® a . Therefore, a ring is a commutative group under addition and a semigroup under multiplication. 0 is called the additive identity element (of operation ©), and 1 is called the multiplicative identity element (of operation ®). Def in i t ion 2.9 (Semiring) Let S be a set. Let © and <g> be two closed binary operations defined on S. (5,©,®) is a semir ing if the operations satisfy the following axioms: • Additive associativity: Va, b,c € S, (a © 6) © c = a © (b © c); • Additive commutativity: Va, b £ S, a © b = b © a; • Multiplicative, associativity: Va, 6, c € 5, ( a ® 6 ) ® c = fl®(!)®c); • Le/t and ria/i£ distributivity: Va, i , c e S , a ® ( i i ® c ) = o 0 i ) $ a ® c and (b f f ic )®a = &<g>a©c®a. A semiring (5, ffi, ®) is a ring that need not have either additive (©) inverses or an additive identity element. In other words, a semiring is a commutative semigroup under addition and a semigroup under multiplication. Chapter 2. Background 15 Commutative Semiring { Semiring d 2 L e f t a n d Right'Distributivrty, Ring . . . ^Inverses.. ! s Addition v- Multiplication Figure 2.3: Axioms in the Definitions of Ring, Semiring, and Commutative Semiring Definition 2.10 (Commutative Semiring) A commutative semiring is a semiring that satisfies the following additional conditions: • Multiplicative commutativity: Va, b G S, a ® b = b ® a; • Multiplicative identity: there exists a multiplicative identity element 1 G S, such that a ® l = l ® a = a for any a G S; • Additive identity: there exists an additive identity element 0 G S, such that a © 0 = 0 © a = a for any a G S; A commutative semiring (5,©,®) is a commutative monoid under both ad-dition and multiplication. Figure 2.3 shows the relations among axioms of ring, semiring, and commu-nicative semiring. Here we define fc-semiring as a generalization of semiring with a ./-semiring corresponding to the definition of a semiring. Chapter 2. Background 16 Table 2.2: Summary of Different Commutative Semirings No. R © Add. Identity ® Multi. Identity 1 {true, false} V false A true 2 [0,1] max 0 min 1 3 [0,1] max 0 X 1 4 (-co, 0] max —oo + 0 5 [0,oo) max 0 + 0 6 [0,oo) + 0 X 1 7 [0, oo) max 0 X 1 8 (—oo, oo) min oo + 0 Def in i t ion 2.11 ( (Commutat ive) fc-Semiring) A (commutative) k-semiring is a tuple (S, opo,opi, • • •,opk) such that for each i 6 {1, • • •, k}, (S, opj,opi) is a (commutative) semiring for every j < i. The definition of (commutative) fc-semiring is sometimes too strong in prac-tice. For example, we may need only (S, opi-\,opi) to satisfy the (commutative) semiring properties. We do not necessarily need (S,opj,opi) to be a (com-mutative) semiring for V? < i always, although the strong definition provides computational flexibility in algorithm designs. ([0, oo),+, x) and ({0,1}, V,A) are all examples of commutative semirings. Table 2.2 shows further examples of commutative semirings. ([0, oo), max, +, x) is an example of a commutative ^-semiring. Furthermore, we say that 0 is a multiplicative absorbing element if a <8> 0 = 0 ® a = 0, and 1 is an additive absorbing element if a © 1 = 1 © a = 1, for any a £ S. We say that © is idempotent if a © a = a, and <8> is idempotent if a (g> a = a for any a € S. We notice that the idempotency of © defines a partial ordering < over the set S, i.e., a < b a © b = b for any a,b € S. Semiring and fc-semiring are enough to generalize the representation of a wide range of CBI problems. However, to characterize inference algorithms for these problems, we will use commutative semiring and commutative ^-semiring in the following chapters. The commutativity and identity elements provide many computational benefits for solving these problems. 2.4 Summary This chapter introduced the basic definitions and notions used in this thesis. Constraint-based inference problems can be graphically represented as primal graphs. Primal graphs can be reorganized as secondary structures, to achieve computational efficiency. The success of inference algorithms for specific problems is largely decided by the parameters of the underlying graphs. One of these graphical parameters is treewidth, or induced width. Computing treewidth, or finding the optimal triangulated graph, is known as AfV-hard [2]. Though some polynomial time Chapter 2. Background 17 approximation algorithms have been designed to compute treewidth with con-stant factors, they are not practical since the time complexities always contain some constant that is exponentially large. For practical scenarios, heuristic searches are used to estimate the upper bound of treewidth. In this chapter we briefly discussed several heuristic triangulation algorithms. Two other im-portant notions to characterize the graphical representation of CBI problems are the small world topology and scale-free networks. This chapter also defined and explained the parameters used to characterize the small world topology and scale-free networks. As powerful tools to generalize CBI problems, this chapter introduced alge-braic structures, and more specifically, semirings and commutative semirings. Basic definitions and axioms of abstract algebra were given. We also discussed the relations among these definitions. Chapter 3. A Semiring-Based Generalized Framework for CBI 18 Chapter 3 A S e m i r i n g - B a s e d G e n e r a l i z e d F r a m e w o r k for C o n s t r a i n t - B a s e d Inference 3.1 A Semiring-Based Generalized Framework A constraint-based inference problem is defined in terms of a set of variables with values in finite domains and a set of constraints on these variables. We use commutative semirings to unify the representation of constraint-based inference problems from various disciplines. Formally: Definition 3.1 (Constraint-Based Inference (CBI) Problem) A constraint-based inference (CBI) problem P is a tuple (X, D, R, F) where: • X = {X\, • • • ,Xn} is a set of variables; • D = {£>!, • • •, Dn} is a collection of finite domains, one for each variable; • R = (S, ffi, ® ) is a commutative semiring; • F = {fi, • • •, fr) is a set of constraints. Each constraint is a function that maps value assignments of a subset of variables, its scope, to values in S. Before defining various tasks for CBI problems, we define the two basic constraint operations as follows, using the two binary operation ® and © of the commutative semiring R. Please note that we use the same two symbols <g> and © to represent the two constraint level operations. The meaning of them can be easily distinguished given the context. Definition 3.2 (The Combination of Two Constraints) The combination of two constraints f\ and fa is a new constraint g = f\® fa, where Scope(g) — Scope(fa) U Scope(fa) and g(w) = / i (w 1 S c o p e ( / 1 ) ) ® / 2 ( w i 5 c o p e ( / 2 ) ) for every value assignment w of variables in the scope of the constraint g. Definition 3.3 (The Marginalization of a Constraint) The marginalization of X{ from a constraint f, where Xi € Scope(f), is a new constraint g = @x. /, where Scope(g) = Scope(f)\Xi andg{w) = ® X i € D o m a i n { X i ) / ( x f , w ) for every value assignment w of variables in the scope of the constraint g. Chapter 3. A Semiring-Based Generalized Framework for CBI 19 Definition 3.4 (The Inference Task for a CBI problem) Given a variable subset of interest Z C X, let Y = X \ Z, then the inference task for a CBI problem P = (X, D, R, F) is defined as computing: 5cB/ (^ ) = 0(g)/ (3.1) Y feF Given a CBI problem P = (X,D,R,F), if © is idempotent, we can define the assignment task for a CBI problem. Definition 3.5 (The Assignment Task for a CBI problem) Given a variable subset of interest Z C X, letY = X\Z, the assignment task for a CBI problem P = (X, D, R, F) is to find a value assignment for the marginalized variables Y, which gives the result of the corresponding inference task gcBi{Z). Formally, a solution to the assignment task is: y = arg0(g)/ (3.2) Y feF where arg is a prefix of operation ©. In other words, arg ffix/(x) returns x$ s.t. fix®) = ®xf(x)-If a total ordering of S exists, we can define the optimization task for a CBI problem as maximizing (or minimizing) the computed result of the correspond-ing CBI inference task by finding a value assignment to variables Z. Formally Definition 3.6 (The Optimization Task for a CBI problem) Given a variable subset of interest Z C X, letY = X\Z and R = (S, max, ©, ®) be a commuta-tive 2-semiring, then the optimization task for a CBI problem P = (X, D, R, F) is defined as computing: goPT = max I 0 (g) / I (3.3) V y f^F ) -The assignment task for an optimization task is then to compute: z = arg max 0(g)/ (3.4) \ Y feF J In general, ® is a combination operation for CBI problems that combines a set of constraints into a constraint with a larger scope; ©y = @x\z is a marginalization operation that projects a constraint over the scope X onto its subset Z, through enumerating all possible value assignments of Y = X \ Z. As discussed in Chapter 2, a CBI problem can be graphically represented by a primal graph, which is an undirected graph with variables as its vertices, Chapter 3. A Semiring-Based Generalized Framework for CBI 20 and there exists an edge between two vertices if the two corresponding variables appear in the scope of a same constraint. Also it can be re-organized as a junction tree or a factor graph to achieve computational benefits. Practically, sometimes there are observations of, or evidence about, the val-ues of some variables. In such cases, constraints with observed variables in their scopes can be modified through instantiation before performing the inference. Although there are many observation-handling techniques to accelerate the in-ference process, we will not incorporate them into our framework here. Another issue concerning practical CBI tasks is the normalization after the computation. In this thesis, we omit normalization constants since they are simple to define and compute in specific application scenarios. 3.2 Instances of the Semiring-Based Generalized Framework Many CBI problems from different disciplines can be embedded into the pro-posed generalized CBI framework. In this section we will briefly introduce these problems as instances of the framework. 3.2.1 Constraint Satisfaction Problems A constraint satisfaction problem (CSP) is denned over a set of variables X = {Xi, • • •, Xn}, a collection of domains D = {D\, • • •, Dn} for variables, and a set of constraints or evaluation functions F = {/ i, • • •, fr}, where each is denned on a subset of variables Si = Scope(fi) C X. The inference task for a constraint satisfaction problem is to find if there exists a value assignment that satisfies all the constraints. The assignment task for a constraint satisfaction problem- is to find such a value assignment or all assignments according to some criteria. Classic CSPs Classic CSP [40] is a class of constraint satisfaction problems such that all constraints have to be satisfied by a valid value assignment. Constraint fi of classic CSP maps possible value assignments of variables in Scope(ji) into logical true if the assignment satisfies the constraint, or into logical false if not. In terms of our semiring-based framework, a classic CSP is a tuple (X,D,R,F), where X, D, and F are defined before, and R = ({false, true}, V, A) is a commutative semiring, where V is the binary OR operation with false as the identity element and A is the binary AND operation with true as the identity element. Given Z = 0 , the inference task (decision problem) of a classic CSP is to compute: 9CSP = V A f Chapter 3. A Semiring-Based Generalized Framework for CBI 21 9CSP = true means there exists a value assignment that satisfies all con-straints. In such a case, the assignment task for classic CSP is to compute: x = arg \f f\ f X f€F Soft C S P s Constraints in classic CSPs are sometimes called hard constraints, which means that all constraints have to be satisfied simultaneously. However, in practice many problems are over-constrained. Several frameworks have been proposed to handle over-constrained problems, mostly by introducing soft con-straints that may be (partially) violated. These frameworks include: (1) the Max CSP framework that tries to maximize the number of satisfied constraints, which is equivalent to minimizing the number of violations; (2) the Weighted CSP framework that associates a degree of violation to each constraint and min-imizing the sum of all weighted violations; and (3) the Fuzzy CSP framework [52] that associates a preference to each value assignment of a constraint. We generalize these soft CSPs into our semiring-based unified framework using var-ious semirings, following the results of the Semiring CSP [6] and Valued CSP [53] frameworks. M a x C S P s and Weighted C S P s Sometimes we do not need all the con-straints to be satisfied at the same time. In Max CSPs, the assignment task is to find a value assignment that satisfies the maximum number of constraints. A Max CSP is same as a Classic CSP except that it is denned on a different commutative semiring R = ([0, oo), max, +). The inference task for a Max CSP is to compute: 9MaxCSP = max f And the assignment task for a Max CSP is to compute: x = arg max f A Weighted CSP is slightly different from a Max CSP in that satisfying different constraints has different weights, whereas weights are always the same for all constraints in a Max CSP. The same, semiring used in the Max CSP applies in the representation of the Weighted CSP in our semiring-based unified framework. Fuzzy C S P s Fuzzy CSP [52] extends Classic CSP by mapping all possible value assignments of each constraint into preference levels. The levels of prefer-ence are defined on some discrete numbers from 0 to 1, where 0 stands for not allowing such an assignment. A solution of Fuzzy CSP is a value assignment that has the maximum value of the minimum preferences of all constraints (max-min Chapter 3. A Semiring-Based Generalized Framework for CBI 22 value). In terms of the semiring-based unified framework, a fuzzy CSP is a tuple (X,D, R,F), where X, D and F are defined beforehand R = ([0,1],max,min) is a commutative semiring with 0 as the identity element of max, and 1 as the identity element of min. Given Z = 0 , the inference task for a Fuzzy CSP is to compute: 9FuzzyCSP = max min / X feF And the assignment task for a Fuzzy CSP is to compute: x = are max min f 6 X feFJ 3.2.2 P r o p o s i t i o n a l Sa t i s f iab i l i ty P r o b l e m s The propositional satisfiability problem (SAT) is a central problem in logic and artificial intelligence, which consists of a logical propositional formula in n variables. For fc-SAT, the formula consists of a conjunction of clauses, and each clause is a disjunction of at most k variables, any of which may be negated. For k > 3, these problems are NP-complete. SAT Like the constraint satisfaction problem, SAT can be abstracted in the semiring-based unified framework as a tuple (X, D, R,F): • X = {Xi, • • •, Xn} is a set of n variables; • D = {true, false} is the domain for each variable; • R = ({false, true}, V, A) is a commutative semiring, V is the binary OR operation with false as its identity element ; A is the binary AND oper-ation with true as its identity element. • F = {fi, • • •, fr} is a set of r clauses, : Dk —» {false, true}. Given Z = 0 , the inference task for SAT is to compute: 9SAT = V / \ / And the assignment task for SAT is to compute: x = arg \J f\ f x feF Chapter 3. A Semiring-Based Generalized Framework for CBI 23 M a x S A T Similarly, MaxSAT is defined as finding a value assignment for each variable that allows the maximum number of clauses to be true. A MaxSAT is the same with SAT except the definitions of R and F: • R = [0, oo), max, +) is a commutative semiring, 0 is the identity elements of both max and +• • F = {/ i , • • •, fr} is a set of r clauses, /* : {true, false}k —> {0,1}, k = \Scope{fi)\. Given Z = 0 , the inference task for MaxSAT is to compute: 9MaxSAT = max ] P / feF And the assignment task for MaxSAT is to compute: x = arg max ^ / feF. 3.2.3 Probabil i ty Inference Problems Probability inference problems can be seen as constraint-based inference by treating conditional probability distributions (CPDs) as soft constraints over variables. A belief network or a Bayesian network (BN) [49] is a graphical representation for probability inference under conditions of uncertainty. B N is defined as a directed acyclic graph (DAG) where vertices X = {Xi, • • • ,Xn} denote n random variables and directed edges denote causal influences be-tween variables. D = {Di,--,Dn} is a collection of finite domains for vari-ables. A set of conditional probability distributions F = {/ i , • • •, / „ } , where fi = P(Xi\Parents(Xi)) is attached to each variable (vertex) Xi. Then the probability distribution over X is given by P(X) = JliLi P{Xi\Parents(Xi)). The moralized graph of a B N is obtained by adding edges among the vertices with a common child vertex and removing directions of all edges. The moralized graph of a B N is the primal graph of the corresponding probability inference problem. Generally, there are three types of tasks in belief inference: probability assessment, most probable explanation, and maximum a posteriori hypothesis (maximum likelihood). Al l of these tasks can be generalized into our semiring-based framework. Probab i l i t y Assessment Probability assessment is a CBI problem over a B N , which computes the pos-terior marginal probability of a subset of variables, given values for some vari-ables as known evidence. Formally, a probability assessment problem is a tuple (X, D, R, F), where X, D, and F are defined in B N , and R = ([0, oo),+, x) is Chapter 3. A Semiring-Based Generalized Framework for CBI 24 a commutative semiring, where (©,0) = (+,0) and (®, 1) = (x, 1). Given Z as the variable subset of interests, the inference task for a probability assessment problem is to compute: 9BA(Z) = a ^ Y [ f x\zfeF where a is a normalization constant. Obviously ([0, oo), x) has the invertible property, making normalization feasible. M o s t Probab le Exp lana t ion ( M P E ) Most probable explanation is a CBI problem over a B N , which is to find a com-plete value assignment of variables that agrees with the available evidence and has highest probability among all such assignments. Formally, the most proba-ble explanation problem is a tuple (X, D, R, F), where X, D, and F are defined in the corresponding belief network, and R = ([0, oo), max, x) is a commutative semiring, where (©,0) = (max,0) and (®, 1) = ( x , l ) . Given Z = 0 , the inference task for a M P E is to compute: 9MPE = max JJ f feF The assignment task for a M P E is to compute: x = arg max JJ / X feF M a x i m u m A Poster ior i Hypothes is ( M A P ) A maximum a posteriori hypothesis problem is an optimization task over a B N , which finds a value assignment of a set of hypothesized variables to agree with the available evidence and has highest probability among all such assignments. Formally, a maximum a posteriori hypothesis problem is a tuple {X, D, R, F), where X, D, and F are defined in B N , and R = ([0,oo),max,+, x) is a com-mutative 2-semiring, where (max, 0) is defined over non-negative real values, (©,0) = (+,0), and = ( x , l ) . Given Z = {Z\, - • •, Zt} C X as a subset of hypothesized variables, the inference task for a M A P is to compute: 9MAP = max( ] T JJ /) x\z feF The assignment task for a M A P is to compute: z = argmax(]T JJ /) x\zfeF Chapter 3. A Semiring-Based Generalized Framework for CBI 25 3.2.4 Dynamic Bayesian Networks Many discrete-time stochastic processes can be graphically represented as dy-namic Bayesian networks (DBN) [13]. D B N is a powerful tool to model dynamic systems with N random variables X = {X\,; • • ,Xn}. Let X1 be the state vec-tor of X at time t. As an extension of Bayesian networks (BN), inference tasks can be performed over a DBN. Formally, a dynamic Bayesian network is defined [45] to be a pair (B\, B_.), where B\ is a B N which defines the prior P{Xl) , and B _ is a two-slice temporal Bayesian net (2TBN) which defines PiX^X1'1) as n PiX^X1-1) = rjP(X?|Parents(X/)) t=i Like representing B N , a D B N can be abstracted in the semiring-based frame-work as a tuple (X, D, R, F): • X1 = {X\, • • •, Xn} is a set of random variables at time t = 1, • • •, T; • D = {D\, • • •, Dn} is a collection of finite domains for each item in X\ • R = ([0, oo), +, x) is a commutative semiring, (©, 0) = (+, 0) and (®, 1) = ( x , l ) . • = {/{, • • • , /£ } where / / = P{X\\Parents(X\)) Given ZT = {Zj, • • •, ZT} as a subset of random variables of interests at time t — T, the inference task in a D B N is to compute the marginal of joint distribution: 9 D B N = E n n /* X\-,XT-\XT\ZT t=l / l 6 F ' 3.2.5 Decision-Making Problems The Decision Network (DN), or the influence diagram [27], is a graphical repre-sentation for decision-making problems. A decision network is a directed acyclic graph with three types of nodes: random nodes, decision nodes, and value nodes. Each random node represents a random variable, associating with a conditional probability table; each decision node represents a decision variable with a finite set of possible actions (or decisions); each value node associates with a utility function. The edge from a random node to a decision node is called an in-formation edge, and so a decision node Ai can be viewed as a random node. Given a value assignment of its parents nodes, Ai has a conditional probability P(Ai = a\Parents(Ai)) — 1 for an decisions G domain(Ai), and 0 for other decisions. The goal of the one-off decision-making problem in decision networks is to find such policies for each decision node that maximize the summation of ex-pected utilities. We do not integrate sequential decision-making problems [62] in decision networks into our framework and leave it in our future work. Chapter 3. A Semiring-Based Generalized Framework for CBI 26 In terms of our semiring-based framework, a one-off decision-making problem is a tuple (X,D,R,F): • X = CU A, where C — {Xi, • • • s X^} is a set of random (chance) variables and A = {Ak+\, • • •, An} = {X^+i, • • •, Xn} is a set of decision (action) variables; • D = {D i , • • •, Dn} is a collection of finite domains for each item in X; • R = ((—oo, oo), max,+, x) is a commutative 2-semiring, where —oo is the identity element of max, 0 is the identity element of +, and 1 is the identity element of x; • F = {u} where u = Yli=i ui 1S the summation of t utility functions, each Ui associating with a value node, u : Xn —» 7c. Given Z = A = {Ak+i, • • •, An} as a subset of variables of interest, the inference task for a one-off decision-making problem is to compute: n gDN = mzx^X\P{Xi\Parents{Xi))-u{A) (3.5) C i=l The assignment task for a one-off decision making problem is to compute: n a = arg max Yl I I P{Xi\Parents{Xi)) • u{A) C i=l 3.2.6 Possibility Inference Problems Possibility theory [61] is another approach to characterizing uncertainties. A possibility inference problem is defined over a set of variables X = {Xi, • • •, Xn}, a collection of discrete domains D = {D\, - • • ,Dn} for variables, and a set of constraints or possibility distribution functions F = •••,/»•}• Each /j specifies the degree of possibility of the configuration of variables in Scope(fi). The degree of possibility, or possibility potential, is a value in [0,1], representing non-possibility to full certainty. In possibility theory, a binary operation T is a function T : [0,1] x [0,1] —» [0,1], which satisfies the following conditions: • Boundary: Va G [0,1], ITa = a and OTa = 0; • Monotonicity: Va i , 0 2 , 6 1 , 6 2 € [0,1] s.t. ai < a 2 and 61 < 62, a i T 6 i < a2Tf)2 holds; • Commutativity: Va ,6 ,c £ [0,1], aTb = bTa and (aTb)Tc = a T ( 6 T c ) . Such a binary operation is also called a i-norm. There are many i-norms in possibility theory research. The most frequently used i-norms are: Chapter 3. A Semiring-Based Generalized Framework for CBI 27 • Product i-norm: aTb — a x b; • Godel's i-norm: aTb = min(a,b); • Lukasziewcz's i-norm: aTb = max(0, a + b — 1). The inference task for possibility inference is to find the maximum possibility of some variables and the corresponding configuration of these variables. In terms of the semiring-based unified framework, a possibility inference problem is a tuple (X, D, R, F) if the corresponding i-norm satisfies the commutative semiring properties: • X = {Xi, • • •,X n] is a set of random variables; • D = {£>!, • • •, Dn} is a collection of finite domains for each item in X; • R = ([0,1], max, T) is a commutative semiring, (©,0) = (raox, 0) and (®, 1) = (T, 1) where T is a i-norm and 1 is the identity element for the i-norm; • F = {fi, • • •, fr) is a set of possibility distribution functions. Given Z C X a s a subset of variables of interests and let Y = X \ Z, the inference task for a possibility inference problem {X, D, R, F) is to compute the marginal of joint possibility distribution: gposs = m a x T / s F / The assignment task for a possibility inference problem is to compute: y = a r g m a x T / e F / 3.2.7 Coordination Graph In decision-making problems, we sometimes assume that utility functions are additive, i.e., u(A) = Yl\=iui(Aj), then Eq. 3.5 can be re-written as: n t gDN = max^2Y[P(Xi\Parents(Xi) • (^2UJ(AJ)) c i=l j=l t n = m a x ^ ^ Y \ P { X i \ P a r e n t s { X i ) ) • UJ(AJ) c j=l i=l t n = max^^YlPiXilParentsiXifi-UjiAj) A j = l c t=l t = m a x ^ Q ^ ) (3.6) Chapter 3. A Semiring-Based Generalized Framework for CBI 28 where Qj(Aj) = 2~2c 117=1 P{Xi\Parents(Xi)) • Uj is a utility function that depends only on a subset actions Aj C A, but does not depend on random variables. After computing Qj(Aj), the task is transformed to find an equilib-rium of a coordination game, which requires finding a decision or action from a finite set of possible actions for each player, to maximize the summation of players' utilities. Such a reduced problem is formalized by Guestrin et al. [25] and represented by a Coordination Graph. Def in i t ion 3.7 A Coord ina t ion G r a p h for a set of players with local utili-ties {Qi , • • • ,Qn} is i directed graph whose nodes are the actions for the play-ers {Ai, • • •, An), and there exists an edge from Ai to Aj if and only if Ai £ Scope(Qj). The task for finding an optimal joint action in a coordination graph can be easily embedded in our semiring-based framework as an assignment task for a CBI problem (X, D, R, F) where: • X = A = {Ai, • • •, At} is a set of decision (or action) variables; • D = {Di, • • •, Dt} is a collection of finite domains, each for a variable; • R = ([0, oo), max,+) is a commutative semiring, (©,0) = (max, 0) and (®,1) = (+,0); • F = {Qi, • • •, Qt} is a set of utility functions, Qi : Dtl x • • • x D i k —> S, where Scope(Qi) = {A^, • • •, Aik}. The inference task for computing optimal joint action in coordination graph is just to compute Eq. (3.6). The assignment task for a coordination graph is ' then to compute: t a = a r g m a x £ Q j ( A i ) 3.2.8 M a x i m u m Likelihood Decoding In digital communication, decoding is an important step in recovering the orig-inal information transmitted through a discrete memoryless channel. How to decode information efficiently and exactly is always a central problem in coding theory. Maximum Likelihood Decoding (MLD) is one widely used approach. Semiring-based general massage-passing approach for M L D were first proposed in [1]. Suppose a codeword Z = { Z i , • • •, Zn} is transmitted over a discrete mem-oryless channel. A vector Y = {Yi, • • • ,Yn} is received. The likelihood of the codeword Z is then: n P{Z\Y) = \{P{Zi\Yi) Chapter 3. A Semiring-Based Generalized Framework for CBI 29 where P(Zi\Yi) is the transition probability of the channel. For computational convenience, sometimes we compute the log-likelihood: • iog(P(z|y)) = ^ - i o g ( P ( z i | y i ) ) It is equivalent to minimizing the log-likelihood decoding. We use hi to denote -logfPCZilYO). The M L D problem is to find the codeword Z that maximizes the likelihood. In addition, different linear block codes define a set of indicator functions x = { x i i ' ' • , Xfc} to specify if a subset of Z is a valid part of the codeword. An indicator function Xi Is defined as: , _ 7 \ _ / 0 : {Zn, Zik) is a part of the codeword; ^ • - ^ - ( o o : otherwise In terms of the semiring-based unified framework, a maximum likelihood decoding problem is a tuple (X, D, R, F), where: • X = Z U Y is a set of 2n variables; • D = {Di, • • •, Dn) is the domains for each Zt and Y*; • R = ([0, oo), min, +) is a commutative semiring, (©,0) = (min, oo) and (® , l ) = (+,0); • F = // U x is a s e t of constraints where = {/ii, • • •, hn} and x = {Xi, • • - ,Xk}. The inference task for a M L D is to compute: 9MLD = nun ^ And the assignment task for a M L D is to compute: z = arg mm ^ hi Y x i hiGH XjGx 3.3 Summary This chapter presented a semiring-based unified framework to generalize the representation of CBI problems from different fields. The general multiplication operation ® denotes the combination of constraints over local variables. The general addition operation © denotes the marginalization of a large constraint into a smaller scope. Through specifying variables and local constraints, CBI problems are generalized as computing a general sum-of-products in semirings. Chapter 3. A Semiring-Based Generalized Framework for CBI 30 We discussed many concrete CBI problems in this chapter as instances of the proposed semiring-based unified framework. These problems cover a wide range of research fields like probabilistic inference, decision-making under uncertainty, constraint satisfaction problems, propositional satisfiability, decoding problems, and possibility inference. We will show in following chapters that the abstract representation provides a powerful tool for studying and analyzing inference algorithms for these problems. The generalized representation, together with abstract level algorithm analyses, will eventually improve the concrete inference approach design for specified CBI problems. Chapter 4. Exact Inference Algorithms 31 Chapter 4 E x a c t Inference A l g o r i t h m s The distributivity and the commutativity properties of a commutative semiring R = (S, ffi, ®) provide computational opportunities for solving constraint-based inference problems more efficiently. Comparing the computation costs (in terms of the number of operations) of the two tasks listed in Table 4.1, we will find that the two computation tasks are equivalent if distributivity holds. Both tasks need n — 1 © operations, however, Task 2 only needs one ® operation, compared to n ® operations in computing Task 1. Table 4.1: Computation Costs of Two Equivalent Tasks Task Formula To Compute # of © Operations # of ® Operations 1 ( a ® O x ) © , • • •, © ( a ® 6 n) n - 1 n 2 a ® (6i®, • • •,©&„) n - 1 1 The basic idea behind these two computation tasks is: different computation sequences lead to different costs, given the same computation task defined by a commutative semiring. The distributivity of the two operations of a semiring provides possibilities to change the computation sequences equivalently, which makes finding a lower computation cost, for the task possible. Many inference algorithms have been developed independently, explicitly or implicitly exploiting the distributivity of commutative semirings. The key idea of these algorithms is to rearrange the computation sequences of the task solutions. In Chapter 3 we generalize these algorithms into two categories: variable elimination algorithms and junction tree algorithms. 4.1 Variable Elimination Algorithm 4.1.1 G e n e r a l i z e d V a r i a b l e E l i m i n a t i o n A l g o r i t h m The basic idea behind variable elimination (VE) algorithms, or bucket elimi-nation algorithms, comes from non-serial dynamic programming [4]. Since dy-namic programming algorithms work by eliminating variables one by one while computing the effect of each eliminated variable on the remaining problem, they can be viewed as variable elimination algorithms. Formally, given a CBI problem (X, D, R, F) and a variable subset of interest ZQX,\etY = X\ Z and a =< Y\, • • •, Yfc > be a permutation of elements in Y. The ordering a is decided by the variable elimination algorithm. Let Fyk Chapter 4. Exact Inference Algorithms 32 be the subset of constraints in F whose scopes contain VV Let Fyk be F \ Fyk. Then Eq. 3.1 can be re-written as: 9CBI{Z) = 0(g)/ X\Z f€F = ©0/ Y f€F = © © (<gj / <8> /) = © <8> / ( © <8 /) Y\{Yk}f€FVk \{Yk}feFyk J = © (8) / ® / ' (41) y\{y fc}/eFv fc where / ' = ©y. fe ®fepY f is a new constraint that does not contain variable Ffc in its scope. In Eq. 4.1, variable is eliminated. In another words, now the inference task for the CBI problem does not depend on Yk-The generalized variable elimination algorithm follows the same procedure, eliminating variables recursively one by one, according to some ordering of these variables. See Fig. 4.1 for the description of our generalized variable elimination algorithm for the CBI inference task, abstracted from the variable elimination algorithm [64] in probability inference. Another revision of the generalized-VE algorithm is the generalized bucket elimination algorithm, which is abstracted from the bucket elimination algorithm [14] in the probabilistic inference field. See Algorithm 4.2 for details. The difference of these two algorithms is in the implementation, where the bucket elimination algorithm requires sorting constraints before the elimination of variables. A common concern of applying V E algorithms is how to find an optimal elimination ordering to minimize the computation cost. Finding an optimal elimination ordering, which is equivalent to finding an optimal triangulated graph or finding the treewidth, the minimum width of all tree decompositions, is MV-hard [2]. Several heuristic search methods discussed in Chapter 2 can be used to generate elimination orderings. We will show their empirical effective-ness in Chapter 7. • For the assignment task for a CBI problem, backtracking approaches can be applied. A l l intermediate constraints during the elimination procedure are cached. The final computation value of gcBi(Z) is used as an index to track valid value assignments in the intermediate constraints. To apply V E algorithms, commutativity of (g> is required; for the CBI in-ference task that enables us to rearrange the combination sequence of local functions. The commutativity of © is a desired property as well; that enables Chapter 4. Exact Inference Algorithms 33 Algor i thm 4.1 Generalized Variable Elimination Algorithm (GVE) for a CBI Inference Task Input: A CBI problem (X, D, R, F) and a variable subset of interest Z Output: gCBi{Z) = 0 X _ Z ® / e F / 1: Let Y = X \ Z 2: Choose an elimination ordering cr =< Y\, • • •, Yt > of Y 3: for i = k to 1 do 4: F' := 0 5: for each / € F do 6: if Yi £ Scope{f) then 7: F' := F ' U {/} 8: F : = F \ { / } 9: end if 10: end for 11: / ' := ®Yi <8W< / 12: F : = F U { / ' } 13: end for 14: Return gcBi(Z) := 0 / 6 F / Algor i thm 4.2 Generalized Bucket Elimination Algorithm (GBE) for a CBI Inference Task ; Input: A CBI problem (X, D, R, F) and a variable subset of interest Z Output: gcBi(Z) = © X - Z 0 / S F / l: Choose an elimination ordering Y =< Y i , • • •, Y& >, where Y = X \ Z\ 2: Initialize k + l empty buckets B = {BQ, B\, • • •, Bk], Bi = 0; 3: for i — 1 to r do 4: if Y n Scope(fi) = 0 then 5: BQ:=BQ\j{fi}; 6: else 7: Find the variable Yt £ Y 0 Scope(fi) with maximum index i; 8: B t := B t U {/<}; 9: end if 10: end for 11: for i = k to 1 do 12: / ' : = 0 y i ( 8 ) / , . . B 4 / i 13: if Y n Scope(f') = 0 then 14: B 0 := J50 U • { / ' } ; 15: else 16: Find a variable Yt £ Y D Scope(f') with highest order; 17: S T : = B T U { / ' } ; 18: end if 19: end for 20: Return gCBi{Z) := (8>/sB0 / Chapter 4. Exact Inference Algorithms 34 us to exploit different elimination orderings. Identity elements for the two opera-tions are not required, so we can relax the requirement of commutative semiring here. Then we can solve the problem using the concrete V E algorithm through instantiating our generalized algorithms. 4.1.2 Generalized V E Algor i thm for k-Semiring An extension of the generalized-VE algorithm works with CBI problems defined on a commutative fc-semiring (S, opo, op\, • • •, opk)- In such cases, we repeatedly use the generalized-VE for the last two operations, then replace F by the new marginalized constraints1. Details of the generalized V E algorithm for CBI problems defined on a commutative fc-semiring are shown in Algorithm 4.3. A l go r i t hm 4.3 Generalized V E Algorithm for k-Semiring (kGVE) for a CBI Inference Task Input: A A-semiring-based CBI problem (X, D, R, F) where R = (S, opi,op2, • • • ,opk+i) is a k-semiring, and Z = {Z\,--,Zk} , Zi C • • • C Zk is a set of variable subsets of interest Outpu t : gkCBi(Z) = opix_Zlop2X-z2- •' °Pkx-zk°Pk+ij=ifj 1: for m = k to 1 do 2: F := GVE(X, D, Zm, (5, oPm, oPk+1), F) 3: end for 4: 9kAR{Z) ••= Opk+lf£Ff 4.1.3 Instances of Variable El iminat ion Algor i thm Many concrete algorithms from different fields can be seen as instances of vari-able elimination. For classic CSPs, Seidel [54] proposed a variable elimination algorithm in the early 80s. Later, the Adaptive Consistency (AC) algorithm [16] was proposed. It works by eliminating variables one by one, while deducing the effect of the eliminated variable on the rest of the problem. Adaptive Consistency algorithm can be easily understood as a V E algorithm through instantiating the semiring {S, ©, ®) by ({false, true}, V , A) in our generalized V E algorithm. For propositional SAT problems, directional resolution is the core of the well-known Davis-Putnam (DP) algorithm [18]. The basic idea of directional resolution is exactly variable elimination. We can get the concrete DP algo-rithm through instantiating the generalized V E algorithm through using semir-ing ({false, true}, V, A) as well. For probabilistic inference, the variable elimination algorithm [64] and the bucket elimination algorithm [14] are the first two V E algorithms that are widely studied in tackling inference tasks for probability assessment, M P E , and interestingly, another variable elimination process eliminates the second last operation repeatedly. Chapter 4. Exact Inference Algorithms 35 M A P problems. The concrete algorithms can be obtained through instanti-ating the generalized V E algorithm through using semiring ([0, oo),+, x) or ([0,oo),max, x). For decision-making problems, Zhang [63] reduced the influence diagrams, the graphical representations of the decision-making problems to Bayesian net-works; the variable elimination algorithm for probabilistic inference in B N is then applied to solve decision-making problems. For maximum likelihood decoding of coding theory, Viterbi decoding [57] is an instance of variable elimination, which is defined on the semiring ([0, oo), max, x In [25], -a variable elimination algorithm was proposed to find an optimal joint action in the coordination graph. The algorithm was applied in RoboSoccer successfully [36], incorporating other techniques like discretizing the continuous domain and introducing role assignments. We can design other concrete V E algorithms for problems that can be ab-stractly represented by the.semiring-based unified framework, through instanti-ating different semirings. In general, variable elimination is a variant of dynamic programming, which is the key idea of these algorithms. 4.1.4 Space and T i m e C o m p l e x i t y Given a CBI problem (X, D, R, F), the space and time complexities of the gen-eralized variable elimination algorithm are discussed in terms of the following: • G = (V, £): primal graph representation of CBI problem; • d = max(\Di\), Di £ D: maximum domain size of variables; • w. width of the tree decomposition of Q along the elimination ordering; • r = |F | : number of constraints; • re number of constraints with Vj in their scopes; Theorem 4.1 (Space Complexity of VE) The space complexity of the general-ized variable elimination algorithm is 0( |V| + dw+1). Proof : To eliminate a variable Vi, all constraints with Vi in their scopes are combined. Because \Neighbors(Vi)\ < w, the size of the combined constraint is at most dw+1. We eliminate variables sequentially and reuse the space. So the total size required of the generalized V E is 0(|V| + dw+1). • Theorem 4.2 (Time Complexity of VE) The time complexity of the generalized variable elimination algorithm is 0(r • dw+l). Proof : To eliminate a variable Yj S Y = X \ Z, we need at most t;((8>) = (rj - 1) • dw+1 ® operations, and at most tj(ffi) = dw+1 © operations. So in total we need at most T((g>) = __ i= i ^ (® ) = (r - |V|) • dw+l, and T(©) = __i=i £,((§>) = | V| • dw+l. If in the semiring R, © and ® have the same processing times, the total time of the generalized V E is 0(r • dw+1). • Chapter 4. Exact Inference Algorithms 36 4.1.5 Discussion of Applying the V E Algorithm The variable elimination algorithm is essentially a dynamic programming ap-proach, which exploits the semiring structure of a constraint-based inference problem. The inference task is finished through repeatedly combining local constraints with a common variable in their scopes, caching the result of the marginalization as a new constraint, and posting the new constraint to refine the solution of the original problem. In theory, all CBI problems with semiring-based structures can apply the scheme of the generalized variable elimination algorithm. The only difference between one problem and another is in the dif-ferent implementations of combination and marginalization. In the application of V E algorithms, the identity elements of combination and marginalization operations are not necessary. The commutativity of combi-nation is not a necessary property but desired. If commutativity does not hold, the only possible elimination ordering is decided by the representation of the problem. Otherwise, we could use heuristic or approximate techniques to find a near-optimal elimination ordering to reduce computation costs. According to the complexity analysis, both the space and time complexities of the generalized V E algorithm are linear in the size of the task, but expo-nential in the width of a tree decomposition of the corresponding primal graph representation. For a large scale CBI problem with a complex graphical rep-resentation, treewidth (the lower bound of widths of all tree decompositions of the problem) is often intractable, which makes the direct application of the V E algorithm infeasible. There are basically two ways to overcome it: the first one is to seek approximation inference solutions in terms of some criteria; the sec-ond is to incorporate value properties of the variables and exploit the structure properties of the problem. The approximation inference will be discussed in Chapter 5. The improvement of V E algorithms from the value point of view will be discussed briefly in Section 8.2. 4.2 Junction Tree Algorithm 4.2.1 Generalized JT Algorithm The major motivation that prompts researchers to use junction tree algorithms to solve CBI problems is handling multiple queries. The structure of a given CBI problem always remains unchanged, whereas the variables of interest may be frequently changed according to the requests of users. Junction tree algorithms can share intermediate computational results among different queries, which is an advantage relative to variable elimination algorithms. In fact, junction tree algorithms can be seen as memorized dynamic programming [9], where solutions for subproblems are memorized for later use. A junction tree is a structure that efficiently divides the original problem into subproblems. Given a CBI problem P = (X,D,R,F), the primal graph Q = (V,£) of P can be re-organized as a secondary structure, junction tree T = (C,S), by triangulating the primal graph and identifying the maximal cliques. A junction Chapter 4. Exact Inference Algorithms 37 tree is a tree decomposition of the original graph Q. A junction tree is optimal if the maximum clique size (minus 1) is equal to the treewidth of Q. Finding an optimal junction tree of the given graph, equivalent to finding the treewidth, is ATP-hard in general. Various heuristics search algorithms or approximation algorithms can be applied to find junction tree decompositions with near-optimal width. Let C = {Ci, • • •, Cp) be a set of clusters, where Cj C V, and <S = {Si , • • •, Sq} is a set of separators. Here we also use Ci and Sj to denote a subset of variables X or vertices V. S, = C; n Cr if Q and C r are connected by separators Sj. Then we say T = (C, S) is a junction tree for a CBI problem P = (X, D, R, F) if the following three properties hold: • Singly connected property: T is a tree; • Constraint allocation property: For any constraint /j £ F, there are some clusters Cj such that Scope(fi) C Cj\ • Running intersection property: If a vertex (or variable) appears in two clusters Cj and Cj, then it also appears in all clusters on the unique path between Ci and Cj. In general, junction tree algorithms assign constraints to clusters and com-bine constraints in the same cluster. The combined constraint is marginalized and passed as a message between clusters. Following a specified message-passing scheme, the junction tree reaches consistency and any variable of interest can be queried through marginalizing out other variables in the cluster that contains that variable. Formally, the generalized junction tree algorithm for a CBI prob-lem is shown in Algorithm 4.4, which is abstracted from the message-passing scheme in the Shenoy-Shafer architecture [55] of probabilistic inference. One interesting case of applying the JT algorithm is that the variable subset of interest Z in query is not contained in a single cluster. One naive way to overcome it is adding a clique of Z before constructing the junction tree. However, such an approach hurts the flexibility of the JT algorithm to handle multiple queries. A more elegant solution is to find a subtree Tz = (Cz,Sz) of T = (C,S), where Z C C = Ucecz ^- The answer to the query is computed as the marginal of the combination of all the local constraints together with all the incoming messages of the subtree Tz. Formally: gcBi(Z) := 0 0 (4>Ci-® 0 m(Cj,Ci)) (4.2) C \ Z d € C z Cj&Neighbors(Ci),CiiCz The basic idea of Eq. 4.2 is to treat the subtree Tz as a virtual cluster and compute the marginal as treating normal concrete clusters. We can modify the generalized JT algorithm to solve the assignment task for a CBI problem. After computing gcBi(Z) in some cluster C j , backtracking is applied to variables contained in C,. After a value assignment for these variables is decided, the assignment is passed to all neighboring clusters of C j . The Chapter 4. Exact Inference Algorithms 38 Algor i thm 4.4 Generalized JT Algorithm (GJT) for a CBI Inference Task Input: A junction tree T = (C, <S) of a CBI problem (X, D, R, F) and a variable subset of interest Z Output: 9CBI{Z) = © x \ z <8>f€Ff 1: Attach to each cluster C i a constraint (fid — 1 2: for each / £ F do 3: Find a cluster Ci such that Scope(f) C C* 4: ^ C i := <fid ® / 5: end for 6: for each Edge which is from clusters d to C j do 7: if Ci has received messages from all its neighbors other than Cj then 8: Ni„j := Neighbor{d) \{Cj} 9: % m(Ci,Cj) is the message sent from C i to CJ; 10: m(Ci, Cj) := 0 C i \ s y ( f o , ® m(Cj, Ci)) 11: end if 12: end for 13: for each d £ C do 14: if Z C C i then 15: 0 C i : = 0 C i ® <8>C,eAfeifl/i6<,r(C() m ^ C i ) 16: Return gcBi(Z) : = © C i \ _ 0 C i 17: end if 18: end for procedure recursively instantiates values of variables according to the messages passing from the instantiated clusters to the un-instantiated ones. Following the result of [55], commutativity of both © and ® is required, which ensures the correctness and completeness of the JT algorithm. The message-passing scheme in the generalized junction tree algorithm can be seen as consisting of two passes. By choosing any cluster as the root of the tree, the first pass is from the root to leaves and the second pass is from leaves to the root. We can use several message-passing scheduling techniques, such as parallel or hybrid computing, to improve the efficiency of message-passing. The message-passing scheme in Algorithm 4.4 is sometimes called Shenoy-Shafer architecture, following the authors' names of [55]. To apply junction tree algorithms, the commutativity of either © or ® is required, which ensures the correctness and completeness of the JT algorithm. The identity elements of the two operations are required as well. Some revised versions of the junction tree algorithm require ® has the combination invertible property, which makes the message-passing scheme more efficient, though in our generalized JT algorithm it is not mandatory. In practice, many problems are graphically represented as disconnected junc-tion trees. We can add an arbitrary edge between two clusters in different subtrees and assign an empty separator set to the new edge. The empty sep-arator set between subtrees separates the message passing, as well as satisfies Chapter 4. Exact Inference Algorithms 39 the junction tree properties. Now the problem can be solved by the proposed generalized junction tree algorithm. Another class of problems can not be ef-ficiently re-organized as junction trees due to the intractable maximum cluster sizes. One revision is to use junction graphs. In such a case, the generalized JT algorithm does not work directly since cycles in the junction graph pro-hibit message-passing being terminated. However, some experimental results [1] show that we can get a high quality approximation through using the itera-tively (loopy) message-passing scheme. 4.2.2 Generalized JT Algorithm for /^Semiring Given a CBI problem defined on the commutative fc-semiring (5, opo, opi, • • •, opt), we can solve the problem by attaching k constraints 4>c. to each cluster Ci, where 1 < j < k. Therefore, we have k messages to pass to neighboring clusters of Ci, each corresponds to marginalize </>^ by opj_j. The constraints are up-dated by combining all received messages through using opj. Each constraint for a cluster is initialized as the identity element of the corresponding operation. At the beginning of the-algorithm, cf>^ = f^opj+i(f>Q+l\ where / ^ denotes the combination of constraints attached to the constraint 4>cr of Cj. It is actually the basic idea behind the junction tree algorithm for solving decision-making problems in influence diagrams by Jensen et al. Details of the generalized JT algorithm for CBI problems defined on fc-semirings are given in Algorithm 4.5. 4.2.3 Generalized JT Algorithm for Idempotent Semirings The setup of the generalized junction tree algorithm in Algorithm 4.4 requires semirings of CBI problems be to commutative under both the combination and marginalization. The identity element of the combination is required though the identity element of marginalization is not necessary. According to these observations, any CBI problem with commutative semiring representation can be solved by instantiating the generalized JT algorithm. In some CBI problems, the corresponding semirings have idempotency properties under combination operations, which provide possibilities for improving computational efficiencies of the generalized JT algorithm. A commu-tative semiring (R, ©, ®) is idempotent under combination if Va G R, a® a = a. We call such a semiring an idempotent semiring, without explicitly mentioning that the idempotent property is actually for the combination operation. Since combination corresponds to information (or constraint) gathering, the idempotency of combination implies that repeatedly combining the same in-formation will not produce new information. Furthermore, considering the marginalization of a set of information m = ©^ m* in an idempotent semir-ing, we get m = m ® m = m® ©^ mi = ©^ (m ® rrii). This induction implies m = ©^m, = ©j (m®mj) , which means that combining the marginaliza-tion with original information will not produce new information after another Chapter 4. Exact Inference Algorithms 40 Algorithm 4.5 Generalized JT Algorithm for k-Semiring (kGJT) for a CBI Inference Task Input: A junction tree T = {C,S) of k-semiring-based CBI problem (X,D,R,F), where R = (S, op\, op2, • • •, opk+i) is a k-semiring. Z = {Zi, • • •, Zk) , Z\ C • • • C Zk are variable subsets of interest and Zk is contained in at least one cluster of T. Output: gkCBi(Z) = opix\Ziop2x\z2-• • opkx\zk°Pk+irj=Jj 1: Build k empty function buckets F j , each corresponds to opi+\ 2: for each / e F do 3: if / is a factor of opi+\ then 4: Allocate / to Ft 5: end if 6: end for 7: for each Cj £ C do 8: Attach Cjwith a constraint 4>d 9: Initialize 4>d to identity element of opk+i 10: end for 11: for m = k to 1 do 12: for each /j £ F m do 13: Find a cluster Cj such that Scope(fi) C Cj 14: 4>Cj ••- 4>CjOPk+lfi 15: end for 16: for each Edge 5^ which is between clusters Ci and Cj do 17: if Ci has received messages from all its neighbors other than Cj then 18: Ni-j := Neighbor(Ci) - {Cj} 19: m(Ci,Cj) = opmCi\Si.opk+iCieNi_jm(C[,Ci)opk+i(f>ci 20: end if 21: end for 22: for each d £ C do 23: := op„ + i C | e N ._ j .m(Q ,Ci)opfc+i(/>c i 24: end for 25: end for 26: Find a cluster Cj such that Z\ QCi 27: Return gkCBi{Zt) := opic.\Zi4>Ci Chapter 4. Exact Inference Algorithms 41 marginalization. The observation above shows that for an idempotent semiring, we can ignore the double-counted1 messages in the generalized junction tree algorithm, without loss of correctness. In Algorithm 4.6, the generalized JT is revised to cope with idempotent semirings. Algorithm 4.6 Generalized JT Algorithm for Idempotent Semirings (GJT-Idemp) for a CBI Inference Task Input: A junction tree T = (C, S) of a CBI problem (X, D, R, F) and a variable subset of interest Z Output: gcBi(Z) = 0X\Z<8>/£F/ l: Attach each cluster C^with a constraint 4>ct = 1 2: for each / € F do 3: Find a cluster Ci such that Scope(f) C d 4: (j>Ci •= 4>d ® / 5: end for 6: Choose an arbitrary cluster R G C as the root of T 7: for each Cluster Cj {Upward Phase} do 8: if A l l messages from d's children clusters are ready then 9: 4>d •= 0 C < ® «8>C f ceC/nldren(C0 m(Ci> Ck)) 10: if Ci 7^ R then 11: Cj := Parent(d) 12: m{d,Cj)~ ®ci\Sa<t>ci 13: end if 14: end if 15: end for 16: for each Non-leaf cluster C, {Downward Phase} do 17: if message from d's parent clusters is ready then 18: if Ci ^ R then 19: </f>Ci := 4>Ci ®m{Ci,Parent{d)) 20: end if 21: for each Cj € Children{Ci) do 22: m(d,Cj) := © C A S < . 0c, 23: end for 24: end if 25: end for 26: Find a cluster Cj such that Z C d 27: Return gCBi{Z) := © c x z ^ c . 4.2.4 Generalized JT Algorithm for Invertible Semirings Some CBI problems are denned on semirings with invertible properties un-der combination operations, which provide possibilities for improving the generalized JT algorithm. We call such a semiring an invertible semiring, with-Chapter 4. Exact Inference Algorithms 42 out explicitly mentioning that the invertible property under the combination operation. A commutative semiring (R, ©, ®) is an invertible semiring if Va £ R, B a _ 1 £ R, s.t. a ® a - 1 = a - 1 ® a = 1. The binary operation 0 of i? is then defined as: Va, b £ i i , a 0 6 = a ® b~l Generalized JT Algorithm for Invertible Semirings The generalized JT algorithm can be modified to use the combination invertible property, as listed in Algorithm 4.7. The basic idea here is caching the combina-tion of all incoming messages and dividing the specific incoming message to get the outgoing message for that separator. Dividing here eliminates duplicated information from the combined message. Lauritzen-Spiegelhalter (LS) architecture and HUGIN architecture There are another two message-passing schemes for semirings with combination invertible properties: the Lauritzen-Spiegelhalter (LS) architecture [39] and the H U G I N architecture [29]. We generalize the JT algorithm for LS architecture in Algorithm 4.8 and the JT algorithm for H U G I N architecture in Algorithm 4.9. The major differences between LS and H U G I N architectures are: (1) The reverses operation is done in the domain of separators in H U G I N but in the do-main of clusters in LS. A small separator domain size enables H U G I N to achieve more time-efficiency than LS architecture; (2) H U G I N architecture requires ad-ditional storage for the constrains attached to separators, whereas LS architec-ture passes messages on the fly (do not need to be stored). A detailed discussion of the time and space complexities of Shenoy-Shafer architecture,' Lauritzen-Spiegelhalter (LS) architecture, and HUGIN architecture can be found in the following sections. 4.2.5 Instances of Junction Tree Algor i thm Many concrete algorithms from different fields can be seen as instances of the generalized junction tree algorithm. Junction tree algorithms have been widely used and studied in the proba-bilistic inference community since the late 1980s. Lauritzen and Spiegelhalter [39] developed a message-passing scheme in probabilistic inference based on clustered tree structures, which exploits the combination invertible property of semirings ([0, 0 0 ) , + , x) and ([0,00), max, x). Shenoy and Shafer [55] indepen-dently developed another general message-passing scheme that does not rely on the combination reverse property. Such a scheme is general enough to be applied to CBI problems with any type of commutative semirings. Lately, Jensen et al. [29] revised the message-passing scheme and developed H U G I N architecture for probabilistic inference problems. Chapter 4. Exact Inference Algorithms 43 Algorithm 4.7 Generalized JT Algorithm for Invertible Semirings (GJT-Inv) for a CBI Inference Task Input: A junction tree T = (C, S) of a CBI problem (X, D, R, F) and a variable subset of interest Z Output: 5CB/(^) = 0 X \ Z ® / € F / l: Attach each cluster C,with a constraint (fid — 1 2: for each / £ F do 3: Find a cluster d such that Scope(f) C d 4: 0 C i := (fid ® f 5: end for 6: Choose an arbitrary cluster R £ C as the root of T 7: for each Cluster Cj {Upward Phase} do 8: if A l l messages from C,'s children clusters are ready then 9: <fid := 4>d ® (<g>Ck€Children(d) m ^ Ck)) 10: if d ^ R then 11: Cj :— Parent(d) 12: m{CUCj) := © C A S y ^ C i 13: end if 14: end if 15: end for 16: for each Non-leaf cluster C, {Downward Phase} do 17: if message from CVs parent clusters is ready then 18: if d ± R then 19: (fid •= 4>d ® m{d, Parent{Ci)) 20: end if 21: for each Cj £ Children(Ci) do 22: m(d, Cj) := © C A S y 0 m(Cj , C*)) 23: end for 24: end if 25: end for 26: Find a cluster Ci such that Z C d 27: R e t u r n . o C B / ( ^ ) : = © c i\z0c i Chapter 4. Exact Inference Algorithms 44 Algorithm 4.8 Generalized JT Algorithm for LS Architecture (GJT-LS) Input: A junction tree T = (C, S) of a CBI problem (X, D, R, F) and a variable subset of interest Z Output: gAR{Z) = (&x\z <8>f€F f 1: Attach each cluster Cjwith a constraint 4>Ci = 1 2: for each / £ F do 3: Find a cluster Ci such that Scope(f) C Cj 4: 4>Ci ••= 4>d ® / 5: end for 6: Choose an arbitrary cluster R G C as the root of T 7: for each Cluster Ci ^ R {Upward Phase} do 8: if Ci has received messages from all its children clusters then 9: C j := Parent(d) 10: m{Ci,Cj) := 0 C A S y 0C< 11: 0c, := 4>Cj (Smid^j) 12: (fid <f>Ci0rn(d,Cj) 13: end if 14: end for 15: for each Non-leaf cluster Ci {Downward Phase} do 16: if Cj has received messages from its parent cluster then 17: for each Cj € Children(Ci) do 18: m(d,Cj) := © C A S i j <f>ct 19: 4>Cj •- (t>Cj ®m{Ci,Cj) 20: end for 21: end if 22: end for 23: Find a cluster Cj such that Z C Cj 24: Return gcBi(Z) := (&Ci\z ^Ci Chapter 4. Exact Inference Algorithms 45 Algor i thm 4.9 Generalized JT Algorithm for HUGIN Architecture (GJT-HUGIN) Input: A junction tree T = (C, <S) of a CBI problem (X, D, R, F) and a variable subset of interest Z Output: gAR{Z) = (&X\Z <S>feF f 1: Attach each cluster Cjwith a constraint 4>ct = 1 2: for each / e F d o 3: Find a cluster Ci such that Scope(f) C C* 4: 4>Ci := 4>Ci®f 5: end for 6: Choose an arbitrary cluster R e C as the root of T 7: for each Cluster Ci^ R {Upward Phase} do 8: if Ci has received messages from all its children clusters then 9: Cj := Parent(Ci) 10: <j>Sij = m{Ci, Cj) := © C A S i . 4>Ci 11: </>c := 4>Cj'®Tn{Ci,Cj) 12: end if 13: end for 14: for each Non-leaf cluster C, {Downward Phase} do 15: if Ci has received messages from its parent cluster then 16: for each Cj G Children(Ci) do 17: T n ( C i , C i ) : = © c A S „ ^ c 4 18: <£C j := ® (m(Ci, Cj) 0 < £ S i . ) 19: 0 S y :=m{CuCj) 20: end for 21: end if 22: end for 23: Find a cluster C, such that 'Z C Cj ~ 24: Return gCBi{Z) := ®Ci\z ^ Chapter 4. Exact Inference Algorithms 46 The application of junction tree algorithms in constraint satisfaction prob-lems can be traced back to [17]. A message-passing algorithm on the tree clus-tering structures of constraint networks were developed. The algorithm is essen-tially the same scheme as Shenoy-Shafer architecture in probabilistic inference, except that the join and project operations are used as relation combination and marginalization operations. As a special case of CSP, propositional SAT prob-lems can be solved by the same algorithm on the semiring ({false, true}, V , A ) . In [65], Zhang and Mackworth generalized the Arc Consistency (AC) algorithm for constraint networks and designed TAC, a AC algorithm for the rooted join tree, to solve CSP. In the TAC algorithm, the marginalization operation corre-sponds to relational projection and the combination operation corresponds to re-lational join. As a result, the marginal of combining two relations is represented as relational semi-join. The contribution of [65] is proposing and analyzing par-allel and distributed message-passing algorithms (PTAC and DTAC) that can be generalized to apply to junction tree algorithms from other disciplines. The junction tree algorithm was explicitly generalized in decoding applica-tions as Generalized Distributive Law (GDL) by Aji and McEliece [1]. Many concrete decoding algorithms, including the Baum-Welch algorithm, the Gallager-Tanner-Wiberg decoding algorithm, and the B C J R algorithm are all special cases of the G D L algorithm. In decision-making problems, Jensen et al. [30] proposed an algorithm that attaches two types of potentials to the junction tree converted from the influ-ence diagram, and solves the corresponding decision-making problem by passing messages in the junction tree. We could design other concrete JT algorithms for problems that can be abstractly represented in the semiring-based unified framework by instantiating different semirings. Specifically, when a CBI problem is defined on the semiring with the combination invertible property, the generalized LS architecture and HUGIN architecture are both suitable to solve the inference task. 4.2.6 Space and T i m e Complex i t y Given a CBI problem (X, D, R, F), the space and time complexities of the gen-eralized junction tree algorithm are discussed in terms of the following notions: • T = (C,<S): a junction tree of the CBI problem; • d = max(\Di\), Di G D: the maximum domain size of variables; • w: the width of the junction tree T ; • sep: the maximum separator size of the junction tree T; '• r = |F | : the number of constraints; • r%: the number of constraints allocated to cluster Cf, • dep,: degree of cluster C,; Chapter 4. Exact Inference Algorithms 47 • k: ratio of r and n , etc. r = k • n. Theorem 4 .3 (Space Complexity of JT in Shenoy-Shafer (SS) Architecture) The space complexity of the generalized junction tree algorithm in Shenoy-Shafer architecture is 0{\C\ • dw+1). Proof: To pass messages in SS architecture, each cluster needs one storage for the combination of constraints allocated to this cluster, and another storage for the marginalized constraint after absorbing all the messages passing to it. So the needed storage for all clusters is at most 2 • \C\ • dw+1. In each separator, both the upward and downward messages are stored, and the needed storage is then 2 • |<S| • dsep at most. The total storage needed for the JT algorithm in SS architecture is 2 • \C\ • dw+1 + 2 • \S\ • dsep. Since sep < w and \C\ = \S\ + 1 in trees, the space complexity is 0(\C\ • dw+1). • Theorem 4 .4 (Space Complexity of JT in SS Architecture with Idempotent Semiring (SS-Idemp)) The space complexity of the generalized junction tree algo-rithm in Shenoy-Shafer architecture with an idempotent semiring is 0(\C\-dw+1). Proof: To pass messages in SS-Idemp, each cluster needs only one storage for the combination of constraints initially allocated to this cluster. Lately, it is used to store the marginalized constraint after absorbing all the incoming messages. So the needed storage for all clusters is at most |C| • dw+1. , A l l messages do not need to be stored, so the space complexity is 0(\C\ • dw+1). • Theorem 4 .5 (Space Complexity of JT in SS Architecture with Invertible Semir-ing (SS-Inv) ) The space complexity of the generalized junction tree algorithm in Shenoy-Shafer architecture with the combination invertible property is 0(\C\ • dw+1). Proof: To pass messages in SS-Inv, each cluster needs only one storage for absorbing all the messages passing to it. So the needed storage for all clusters is at most \C\ •dw+1. In each separator, the upward and downward messages are stored in the same place, so the needed storage is then \S\-dsep at most. The total storage needed for the JT algorithm in SS-Inv architecture is |C|-a™ + 1 + | iS | -d s e p . Since sep < w and |C| = |<S| + 1 in trees, the space complexity is 0(\C\ • dw+l). a Theorem 4 . 6 (Space Complexity of JT in Lauritzen-Spiegelhalter (LS) Ar-chitecture) The space complexity of the generalized junction tree algorithm in Lauritzen-Spiegelhalter architecture is 0(\C\ • dw+1). Proof: To pass messages in LS, each cluster requires only one storage for the combination of constraints allocated to this cluster. After passing messages, the combined constraint is revised to be the marginal of the total constraints combination. So the needed storage for all clusters is at most |C| • dw+1. In each separator, no message is stored since all messages are absorbed by the cluster on the fly. Then the total storage needed for JT algorithm in the LS architecture is at most \C\ • dw+1, which means the space complexity is 0(\C\ • dw+1). • Chapter 4. Exact Inference Algorithms 48 Theorem 4.7 (Space Complexity of JT in HUGIN Architecture) The space complexity of the generalized junction tree algorithm in HUGIN architecture is 0{\C\)-dw+1). Proof : To pass messages in HUGIN, each cluster requires only one stor-age for the combination of constraints allocated to this cluster. After passing messages, the combined constraint is revised to be the marginal of the total constraints combination. So the needed storage for all the clusters is at most \C\ • dw+1. In each separator, one storage is required to save the intermediate marginal constraint, which requires |«S| • dsep storage at most. The total storage needed for the JT algorithm in HUGIN architecture is \C\ • dw+1 + \S\ • d s e p , which is half of the. storage needed for the JT algorithm in SS architecture. Since sep < w and \C\ = |<S| + 1 in trees, the space complexity is 0(\C\ • dw+l). • Then we discuss the time complexities of the generalized JT algorithm and its variants. Theorem 4.8 (Time Complexity of JT in Shenoy-Shafer (SS) Architecture) The time complexity of the generalized junction tree algorithm in Shenoy-Shafer architecture is 0{\C\2'- dw+1). Proof : To combine the initial constraints allocated to clusters, X)l=i (r* ~ -0 ' °" ® operations are needed at most. To compute a message from Ci to Cj, (degi — l)-dw+1 ® operations and (dw+1 /dsep — 1)• dsep © operations are required at most. For each cluster Ci, there are degi messages to compute, so in total £ i = i de9i • (de9i - !) • d W + 1 ® operations and £ ' = i degi • (dw+1 - dsep) © op-erations. Finally, to combine all the incoming messages, each cluster requires at most degi • dw+l ® operations. Combining all of them, the upper bound of ® operations T(®) of the JT algorithm in SS architecture is: T(®) = < < And the upper bound of © operations T(©) of the JT algorithm in SS architecture is: |C| £ (n - 1 + deg} - degi + degi) • d |C| (^degj + r-\C\)-dw+l-i=l |C| 2 ($2 de9i +r-\C\)-dw+' i=l {2 • \S\2 + r - \C\) • OT+1 (4 • \C\2 + r - |C|) • dw+l Chapter 4. Exact Inference Algorithms 49 | C | T(©) = Y degi • (dw+1 - dsep) i = l < 2 • \S\ • (dw+l - dsep) < 2-\C\-dw+1 If in the semiring R, © and <g> have the same processing time, the total time of the generalized JT in SS architecture is 0(\C\2 • dw+l). • Theorem 4 .9 (Time Complexity of JT in SS Architecture with Idempotent Semiring (SS-Idemp)) The time complexity of the generalized junction tree algo-rithm in Shenoy-Shafer architecture with an idempotent semiring is 0(\C\-dw+1). Proof : To combine the initial constraints allocated to clusters, {r, — 1) • dw+l <g> operations are needed at most. In the upward phase, SS-Idemp architecture needs at most __!=i (degt - 1) • dw+1 <g> operations and __[=i (dw+1 - dsep) ©. In the downward phase, __!=i ® operations and ((degt - 1) • (dw+1 - dsep)) © operations are required. Combining all of them, the upper bound of ® operations T(<gi) of the JT algorithm in SS-Idemp architecture is: | C | = J^{ri-l + degi-l + l)-dw+1 i = l = (r-\C\ + 2.-\S\)-dw+1 < (ICI+r)-^-*" 1 The upper bound of © operations T(ffi) of the JT algorithm in SS-Idemp architecture is: | C | T(©) = J2(de9i-l + l)-(dw+1 -dsep) i=l = 2 • |5| • {dw+1 - dsep) • < 2-\C\-dw+1. If in the semiring R, © and ® have the same processing time, the total time of the generalized JT in SS-Inv architecture is 0(\C\ • dw+l). • If semiring R has the combination invertible property, message-computing in SS architecture can be improved by caching the combination of all the incoming messages and dividing the specific incoming message to get the outgoing message for that separator. We call it SS-Inv architecture. Chapter 4. Exact Inference Algorithms 50 Theorem 4.10 (Time Complexity of JT in Shenoy-Shafer Architecture with Combination Invertible (SS-Inv)) The time complexity of the generalized junc-tion tree algorithm in Shenoy-Shafer architecture with the combination invertible property is 0{\C\ • dw+l). Proof : To combine the initial constraints allocated to clusters, __l=i (ri ~ 1) ' d _> operations are needed at most. In the upward phase, SS-Inv architecture needs at most J_|=i (degi - 1) • dw+l 0 operations and __!=i (dw+l - d°ep) ©• In the downward phase, £J_ 1'_: u' + 1 0 operations, ((degi - 1) • (dw+l - dsep)) © operations, and __i=i ((.degi — 1) • dw+l) 0 operations are required. Combining all of them, the upper bound of <g> operations T(®) of the JT algorithm in SS-Inv architecture is: | C | T(®) = ]T ^ ~ 1 + de9i - 1 + ! ) ' dw+l i=l = (r-\C\ + 2-\S\)-dw+1 < (|C| + r) • dw+1 The upper bound of © operations T(ffi) of the JT algorithm in SS-Inv ar-chitecture is: | C | (^©) = J_(deffi-l + l)-(<r+ 1-d s e p) t=l = 2 • |5| • (dw+1 - dsep) < 2-\C\-dw+1 And the upper bound of 0 operations T ( 0 ) of JT algorithm in SS-Inv ar-chitecture is: | C | T(0) = Y ^ d e 9 i ~ l ) - d w + l ) i=l = (2 • |5| - |C|) • dw+1 < |C| • c T + 1 If in the semiring R, © , 0 and 0 have the same processing time, the total time of the generalized JT in SS-Inv architecture is 0(|C| • dw+l). • Theorem 4.11 (Time Complexity of JT in Lauritzen-Spiegelhalter (LS) Ar-chitecture) The time complexity of the generalized junction tree algorithm in Lauritzen-Spiegelhalter architecture is 0(\C\ • dw+1). Chapter 4. Exact Inference Algorithms 51 Proo f : To combine the initial constraints allocated to clusters, £ - = 1 (r* — 1) • dv 0 operations are needed at most. In the upward phase, LS architecture needs at most Y^fli (de9i ~ l) ' dw+1 ® operations, Y)ili (dw+1 - dsep) © operations and dw+1 0 operations. In the downward phase, £ i= i d u , + 1 0 operations and X)-^! (deaj — 1) • (dw+1 — dsep) © operations are required. Combining all of them, the upper bound of 0 operations T(<g>) of the JT algorithm in LS architecture is: |C | T(®) = ] T ( r i - l + d e 5 i - l + l ) - d ' " + 1 i=i = (r-\C\ + 2-\S\)-dw+1 < (\C\ + r)-dw+l The upper bound of ffi operations T(ffi) of the JT algorithm in LS architec-ture is: | C | T(®) = ^ ( d e g i - l + l)-(dw+1 -dsep) i=i = 2 • |5| • (d™ + 1 - dsep) < 2 • \C\ • dw+1 And the upper bound of 0 operations T ( 0 ) of the JT algorithm in LS architecture is: |C | T ( 0 ) .= ^ d ™ 4 1 i=l = 2 • |5| • dw+l < 2 • \C\ • dw+1 If in the semiring R, ffi , 0 , and 0 have the same processing time, the total time of the generalized JT in LS architecture is 0(\C\ • dw+1). • Theorem 4.12 (Time Complexity of JT in HUGIN Architecture) The time complexity of the generalized junction tree algorithm in HUGIN architecture is 0(\C\ -dw+1). Proof : To combine the initial constraints allocated to clusters, Y^li (ri ~ -0 ' °" 0 operations are needed at most. In the upward phase, HUGIN architecture Chapter 4. Exact Inference Algorithms 52 needs at most £ [ = i {degi - 1) • dw+1 0 operations and Ylf=\ (dw+1 ~ daep © op-erations. In the downward phase, £ [ = 1 dw+1 ® operations, Jlf2i (degi - 1) • {dw+l - dsep) © operations and Yl^li dsep 0 operations are required. Combining all of them, the upper bound of 0 operations T{®) of the JT algorithm in HUGIN architecture is: |C | £ (n - 1 + de9i - 1 + 1) • i=l ( r - | C | + 2 - | « S | ) - ( i w + 1 (\C\+r)-dw+1 operations T(©) of the JT algorithm in HUGIN T(®) = < The upper bound of © architecture is: |C | r(8) = ^ ( d e o i - l + l ) - ^ 1 - ^ ) i=l = 2 • |5 | • (dw+1 - d s e p ) < 2 • |C| • ' And the upper bound of 0 operations T ( 0 ) of the JT algorithm in HUGIN architecture is: T ( 0 ) = £ d s e p i=l = 2-\S\-dsep < 2-\C\-dw+l If in the semiring R, ffi , 0 , and 0 have the same processing time, the total time of the generalized JT in HUGIN architecture is 0(\C\ • d w + 1 ) . • 4.2.7 Discussion of App ly ing the J T Algor i thm In general, junction tree algorithms are memorized dynamic programming, which cache solutions for subproblems to answer multiple queries. Both the time and space complexities of junction tree algorithms are polynomial in the size of the junction tree, with a constant factor exponential in the maximum subproblem size (width of the tree plus 1). So the key to applying JT algorithms is to di-vide the original CBI problem into subproblems with tractable sizes. However, Chapter 4. Exact Inference Algorithms 53 many practical CBI problems have large treewidth. As the lower bound of the width of any tree decomposition, a large treewidth makes exact inference in the junction tree infeasible. There are two ways to perform approximate inference In such a situation: (1) Splitting the oversized clusters, which corresponds to removing some edges from the primal graph representation, or retracting some constraints in the original CBI problem; (2) Using junction graphs to perform inference, instead of junction trees. In a junction graph, messages can pass in loops and may not terminate, which means that information may be counted repeatedly. Some criteria are used to terminate the message-passing after reach-ing preset thresholds. Both of these approaches will be discussed in Chapter 5. Another interesting approach for performing exact inference in junction trees is to consider the effect of variable values and their impacts on the structure of the tree. We will briefly discuss this in Section 8.2. Al l concrete junction tree algorithms discussed in this chapter use serial message-passing schemes. In practical applications, there are more efficient implementations. Both paralleled and hybrid message-passing schemes can be applied to achieve computational efficiency. The time and space complexities of the generalized junction tree algorithm, as well as the generalized variable elimination algorithm, are compared in Ta-ble 4.2. Here Q = (V,£) is the primal graph of a CBI problem (X,D,R,F). T = (C,S) is a junction tree transformed from Q. Both the V E and the JT algorithms are exponential in the induced width of a given elimination order-ing, or the width of the junction tree representation. For the V E and the JT algorithms with combination invertible properties or combination idempotency properties, the time complexity is linear in the size of the problem, whereas in the generalized JT, it is quadratic in the size of the problem. Table 4.2: V E and JT Complexity Comparisons Algor. Space Complexity Time Complexity G V E 0(|V| + 0(\F\ • <r+1) G J T 0 ( \ C \ - d w + l ) 0{\C\2-dw+1) GJT-Idemp 0(\C \-dw+l) 0(\C\ -dw+i) GJT-Inv 0 ( \ C \ - d w + L ) 0{\C\-dw+v) GJT-LS 0(\C\ • dw+v) o ( \ c \ • dw+v) GJT-HUGIN 0 ( \ C \ - d w + l ) 0{\C\-dw+v) The big O expressions in Table 4.2 do not convey sufficient information about the running times of these algorithms. The upper bounds of different operations in these algorithms are listed in Table 4.3. According to the running time of each concrete operation in Java (cf. Table 4.4), users can compare the running times of different junction tree algorithms and instantiate these algorithms for their own purposes. For a CBI problem with the combination invertible property, More specifically, for a CBI problem defined on a semiring with the combination invertible property, generally we conclude: (1) GJT-Inv uses the least time, followed by GJT-HUGIN and then Chapter 4. Exact Inference Algorithms 54 Table 4.3: Upper Bounds of V E and JT Running Times Algor. 0 © 0 G V E (r - |V|) • dw+L |V| • dw+l- 0 G J T (4 • |C|* - Id + r) • dw+l 2 . |C| • (dw+1 - dseP) 0 GJT-Idemp (\C\+r)-dw+i 2 • |C| • {dw+1 - dseP) 0 GJT-Inv (\C\+r)-dw+i 2 • |C| • (dw+1 - dseP) Id •dw + l GJT-LS (\C\ + r)-dw+L 2 • |C| • ( d U J + 1 - dseP) 2- |d -dw+1 GJT-HUGIN (\C\+r)-dw+l 2-\C\- (dw+L -dseP) 2 • |d • d s e p GJT-LS; (2) GJT-LS uses the least space while GJT-HUGIN and GJT-Inv use about the same space. This conclusion is supported by the experimental results in Chapter 7. Table 4.4: Running Times (per 106 operation) for Different Operations in Java on a PIII833MHz P C , Running Windows X P a n d JDK1.4 Operations Running Time (ms) Objects in Java X 1570 Double, Float, Integer + 1534 Double, Float, Integer -f- 1584 Double, Float, Integer max 1687 Double, Float, Integer min 1688 Double, Float, Integer V 128 Boolean A 128 Boolean 4.3 Summary In this chapter, two popular exact inference algorithms for CBI problems are generalized based on our semiring-based unified framework. Both variable elim-ination algorithms and junction tree algorithms are dynamic programming, where junction tree algorithms cache solutions of subproblems for answering multiple queries. In some semirings with the combination invertible property or the combination idempotency property, the JT algorithms can be improved to achieve more computational efficiencies. Generalized versions of these improved algorithms are discussed as well. In this chapter, we conclude that many concrete algorithms for CBI problems from different disciplines are actually special cases of the generalized variable elimination algorithms and the generalized junction tree algorithms. These algorithms are listed in Table 4.5 for comparison. The complexity analyses indicate that both the time and space complexities of V E and JT algorithms are polynomial in the size of the CBI problem, but have a constant factor which is exponential in the width of the tree decomposition. Chapter 4. Exact Inference Algorithms 55 Therefore, finding an optimal elimination ordering to minimize the induced width, which is equivalent to finding a tree decomposition with the minimum width, is the key of applying these two exact inference algorithms. For many practical problems with intractable treewidth (lower bound of the widths of all tree decompositions), exact V E and JT algorithms are infeasible. In such cases, we can either use approximate inference approaches if permitted, or exploit the value of variables, as well as the structures of the CBI problems to reduce the scale of the problems. Table 4.5: Concrete V E and JT Algorithms in Different Fields. Fields V E J T Constraint Satisfaction V E for CSP(Seidel [54]); Adaptive Consistency (AC)(Dechter and Pearl [16]) Bucket-Tree Elimination (Dechter and Pearl [17]); Arc Consistency for JT (TAC) (Zhang and Mackworth [65]) Satisfiability Problems Davis-Putnam (DP)(Dechter and Push [18]) Bucket-Tree Elimination (Dechter and Pearl [17]) Probability Inference in BN Variable Elimination Algorithm [64]; Bucked Elimination (Dechter [14]) SS (Shenoy and Shafer [55]); LS (Lauritzen and Spiegelhalter [39]); and HUGIN (Jensen et al. [29]) Decision Making Transform to V E of B N (Zhang [63]) J T for ID (Jensen et al. [30]) Dynamic Bayesian Net-work - Belief Propagation (BP) Possibility Inference - -Coordination Games V E for Coordination Graph (Guestrin et al. [25]) -Max-Likelihood Decod-ing Viterbi Decoding (Viterbi [57]) Generalized Distributive Law (GDL) (Aji and McEliece [1]) Chapter 5. Approximate Inference Algorithms 57 Chapter 5 A p p r o x i m a t e Inference A l g o r i t h m s 5.1 Motivation for the Design of Approximate Algorithms The analyses in previous chapters show that both the variable elimination al-gorithm and the junction tree algorithm can perform exact inference for a constraint-based inference problem in polynomial time and space. However, there always exist constant factors in the time and space complexities that are exponential in the maximum cluster size of the problem's tree decomposition. This means that the generalized exact inference algorithms would be infeasible when the treewidth of the corresponding problem, a lower bound of width for all possible tree decompositions, is intractable. This practical challenge encour-ages researchers to develop approximate inference algorithms for CBI problems from different disciplines, if approximate inference results with some quality guarantee are acceptable in their application domains. The key idea of these approximate inference algorithms is to restrict the size of the maximum subproblem, or equivalently, the maximum cluster size of a tree decomposition or the induced width given an elimination ordering, to an acceptable level. In general, there are at least two possible ways to design an algorithm to achieve this purpose. The first approach is to revise the original CBI problems by removing some less important constraints, which makes the structure of the problem's graphical representation much simpler; another ap-proach does not touch the original CBI problem but re-organizes it into more complex graphical representations, e.g., a junction graph with loops. Inference procedures are carefully re-designed to cope with these graphical representa-tions. 5.2 Algebraic Approximation for V E and J T The algebraic foundation of approximate algorithms is based on Eq. 5.1. Here F is a set of constraints and Fi is a subset of constraints for i = 1, • • •, b, where F i U • • • U Fb = F and F; n Fj = 0 for any i, j € {1, • • •, b], i ^ j. The basic idea here is breaking the original CBI problem into 6 (overlapped) subproblems, solving them individually, then joining them again to get the solution. Chapter 5. Approximate Inference Algorithms 58 ©<8>/«<8>(©<g>/) (s-i) Y f£F i=l Y f€Fi Algorithm 5.1 describes how to compute the approximate marginal of com-bination. Algor i thm 5.1 The Approximate Marginal of Combination Algorithm (Ap-proxMC) Input: F : set of constraints with variable Y in their scopes, b: number of approximate sets, wmax: threshold width Output: F' = {A, •••Jb} where 0 / < 6 F , / ' « © y ® / g F / l: 5 = 0 2: for each / G F do 3: 5 := SL)Scope(f) 4: end for 5: if |5| - 1 < wmax then 6: / ' : = © y < g > / e F / 7: F' := {/'} 8: else 9: F' := 0 10: for j = 1 to b do ' 11: F j := 0 12: end for 13: for each / G F do 14: Choose j G {1, • • •, b} 15: Fj := Fj- U {/} 16: end for 17: for j — 1 to 6 do 18: F'j := ApproxMC(Fj, Y, 6, wmaX) 19: F' := F' U F ' j 20: end for 21: end if 22: Return F ' In the following sections we will present the approximate V E and JT algo-rithms based on the subroutine ApproxMC. 5.2.1 Approximate Variable Elimination Algorithm According to the generalized exact variable elimination algorithm in Chapter 4, the bottleneck of computation occurs when we eliminate a variable Xi, too many constraints with Xi in their scopes have to be combined, which implies a large constraint after combining and marginalizing. For example, in Figure 5.1(a), Y\,---,YQ are fully connected after eliminating Xi, which means that the size of the marginal constraint is exponential in 6. Chapter 5. Approximate Inference Algorithms 59 One way to overcome this bottleneck is to clone X i with many identical copies • • •, x\b\ Constraints with X i in their scopes are revised by re-placing X i with X ^ \ j G {1, •••,&} according to specified rules. Then the V E algorithm can be applied. In Figure 5.1(b), applying 6 = 2 leads to two marginal constraints with cubic sizes. Of course, introducing identical copies x\X\ • • • , X ^ of X i will introduce conflicts of the X i values and errors in the marginalization. Experimental analyses appear in Chapter 8. (a) (b) Figure 5.1: Graphical Interpretation of Approximate V E . Min-Buckets [19] is the first algorithm proposed for applying this idea to solve probability inference problems approximately. In this section, we gener-alize the idea of Min-Buckets as the approximate V E algorithm in Algorithm 5.2. Algor i thm 5.2 Generalized Approximate V E Algorithm (GVE-Approx) Input: A CBI problem {X, D, R, F ) , a variable subset of interest Z, 6, w m a x Output: gCBi{Z) = ® / e F , / ~ @ X - z <S>/eF / l: Choose an elimination ordering < Y i , • • •, Yk >, where {Yi, • • •, Yk) = X - Z 2: for i = k to 1 do 3: for each / G F do 4: F b := 0 5: • if Yi G Scopetf) then 6:" F :-• F \ { f } 7: F b : = F b U { f } 8: end if 9: end for 10: F ' : = A p p r o x M C { F b , Y i , b , w m a x ) 11: F - - F U F ' ' -12: end for 13: Return gcBi{Z) = ( g ) / G F / Chapter 5. Approximate Inference Algorithms 60 5.2.2 Approximate Junction Tree Algorithm The generalized junction tree algorithm can be modified to perform approximate inference based on Eq. 5.1. The basic idea of the approximate JT algorithm is to pass a set of messages from one cluster to another, instead of passing a single message. The combination of these messages is an approximation of the message passed in the exact JT algorithm. For example, in Figure 5.2(a), rriij is marginalized from the combination of m i to me, together with the local constraint of CV, in Figure 5.2(b), rrty is approximated by the combination of m|j' and m | | \ each of which is generated from the marginalization of three messages and a part of local constraints. (a) ; (b) Figure 5.2: Graphical Interpretation of Approximate JT. The Min-Clustering Tree [43] is an approximate probability inference algo-rithm following this idea. In this section, we generalize the idea of The Min-Clustering Tree as the approximate junction tree algorithm in Algorithm 5.3. For semirings with combination invertible properties, the reverse of a mes-sage is similarly approximated by the combination of a set of the messages' inverses. The approximation can be formalized as Eq. 5.2. i ® ( © ® / ) « ® (i0(0(g)/)] (5-2) Y f£F t= l \ y fG.Fi J After applying Eq. 5.2, we can revise any one of the three exact junction tree algorithms for invertible semirings to cope with approximate inference tasks. 5.2.3 Discussion Algebraic approximation is the foundation of many approximate inference ap-proaches for CBI problems. On the CBI problems level, it is equivalent to retracting some given constraints. On the primal graph representation level, it is equivalent to removing some edges from the graph. On the junction tree representation level, it is equivalent to splitting some clusters in the junction Chapter 5. Approximate Inference Algorithms 61 Algor i thm 5.3 Generalized Approximate JT Algorithm (GJT-Approx) Input: A junction tree T = (C,S) of a CBI problem (X, D, R, F), a variable subset of interest Z, b and w m a x Output: { * C J s.t. 3d e C, gCBi(Z) = ® * e * c . {®Ct\z <t>) ~ © x - 2 •S'/eF / l: Attach each cluster Ciwith a constraint set$c 4 = 0 2: for i = 1 to r do 3: Find a cluster C j such that Scope(fi) C Cj 4: * C i :=*c<U{ / i } 5: end for 6: for each Edge which is from cluster Cj to C j do 7: M ( C j , C j ) : = 0 8: if Cj has received messages from all its neighbors other than Cj then 9: Q := $Ci 10: for each C; e (Neighbor(Ci) \ {Cj}) do 11: Q:=Q\JM{Ci,d) 12: end for 13: M{CU Cj) := J 4pproxMC (Q, C i \ Sij, 6, u i m Q X ) 14: end if 15: end for 16: for each C i £ C do 17: for each C j £ Neighbors(d) do 18: $ C i : = $ C , U M ( C j , C i ) 19: end for 20: •= ApprOxMCi^Ci^^^m^) 21: end for tree. The purpose of these approximations is to restrict the size of the maximum subproblem to a tractable level. Though the idea of algebraic approximation is straightforward, it is hard to analyze the error bounds of these approaches, especially when the combination and marginalization operations are abstract. So far there are few theoretical guidelines for choosing which constraints should be released (which edges should be moved, how to split a cluster); only empirical analyses currently apply in such cases. 5.3 Thin Junction Tree Algorithm Recently, Paskin [48] introduced the thin junction tree algorithm to solve Si-multaneous Localization and Mapping (SLAM) problems. A thin junction tree is a tractable approximation of the original junction tree representation that imposes an upper bound of the cluster size. Generally, it can be seen as a spe-cial case of the generalized approximate JT algorithm in Section 5.2.2. Given a cluster C with intractable size |C | , the thin junction tree algorithm splits it into two clusters. One cluster C i consists of only one variable v and another cluster Chapter 5. Approximate Inference Algorithms 62 C2 consists of \C\ — 1 variables, where v £ C2. If the cluster C has only one neighbor cluster C" with v € C, C\ can be safely absorbed by C". Al l of the incoming messages to C, other than the one from C", can be redirected to C2. Variable v is not contained in these messages. Then we can connect C2 to C and remove C from the junction tree. The size of C is reduced to \C2\ = \C\ — 1. The procedure can be performed repeatedly to restrict the cluster size under an imposed level. 5.3.1 Thinning through Variable Contraction According to the description in [48], the basic operation for making a thin junction tree is variable contraction. Definition 5.1 (Variable Contraction [48]) Let x be a variable that appears in more than one cluster of the junction tree T. Let C be a leaf of Tx (the subtree of T induced by clusters containing x). Let S be the separator joining C to Tx. A variable contraction of x from C removes x from C and S, then marginalizes x out of 4>C (and out of 4>s if using HUGIN architecture). If G becomes non-maximum, it is merged into its subsuming neighbor. Let T be a thin junction tree after performing a variable contraction of T. It is shown [48] that T is still a tree with the running intersection property. The procedure of variable contraction is formalized in Algorithm 5.4, according to [48]. Algor i thm 5.4 Variable Contraction (Contract(v,Ci)) Input: Contract variable v from cluster d, $ C i is a set of constraints allocated to d Output: after contraction 1: Let Cj be the parent cluster of Ci in Tv 2: 4>Ci '•= 0 3: < />„ := 1 4: for each / £ $ C j do 5: if Scope(f) C Cj then 6: <l'c, := U {/} 7: else if v e Scope(f) then 8: 4>v •= 4>v ® / 9: else 10: $ Q := $Ci U { /} 11: end if 12: end for 13: Ci := Ci \ {v} 14: Sij := Sij \{v} where Cj i 1 5 : / := ®v4>v 16: $ C i := $ C j U {/} Chapter 5. Approximate Inference Algorithms' 63 There are two special cases for performing variable contractions. The first case is that v appears only in one cluster C, of the tree. In this case, we can clone Ci to obtain an identical cluster Cj, add an edge between Cj and Cj, contract v from Ci to Cj, and finally contract some other variable u from Cj to Ci. The second case is that v appears elsewhere, but v is not a leaf of Tv. In this case, we can contract v from the other clusters until Cj is a leaf of Tv, then contract v from Cj. 5.3.2 Generalized T h i n Junction Tree Algor i thm The basic procedure of the thin junction tree algorithm is to (1) build a junction tree, (2) limit the maximum cluster size by a given upper bound by performing variable contraction, and (3) pass messages within the thin junction tree. The generalized thin junction tree algorithm within the semiring-based CBI frame-work is shown in Algorithm 5.5. Algorithm 5.5 Generalized Thin Junction Tree Algorithm (GThinJT) Input: A junction tree T = (C,S) of a CBI problem P = {X, D, R, F)\ Z: an variable set of interests; wmax: the upper bound of cluster size Output: gCBi{Z) « ®x\z <8>/SF/ ' 1: Attach each cluster Cjwith a constraint set$C; : = 0 2: for each / e F d o 3: Find a cluster d such that Scope(f) C Cj 4: $ C i := U {/} 5: end for 6: for each d € C s.t. |C i | > wmax do 7: v = ToBeContracted(Ci) 8: if v only appears in Cj then 9: Clone Cj = Cj 10: C := C U Cj and 5 : = 5 U 11: Contract(v,Ci) 12: u = ToBeContracted(Cj) 13: Contractu, Cj) 14: else if v is the leaf of Tv then 15: Contract(v,Ci) 16: else 17: Recursively contract v from leaf of Tv until Cj is the leaf of Tv 18: Contract(v, Ci) 19: end if 20: end for 21: Perform message-passing of the generalized JT algorithm in the thin JT f=(C,S) 22: Find a cluster C G C s.t. Z C C 23: Return __•_»/(£) « © c \ z ® ®_,€Jvei_/,fcor.(6) ™(CjiC) Chapter 5. Approximate Inference Algorithms 64 5.3.3 Discussion The key of the thinning junction tree algorithm is to implement procedure v = ToBeContracted(Ci), which chooses a variable to contract from the cluster. The chosen variable should minimize the approximation error. In probabilistic inference, the approximation error is often measured as Kullback Leibler(KL) divergence. Definition 5.2 (Kullback Leibler(KL) Divergence [34]) KL divergence D(p \\ q) of two discrete probability distributions p(x) and p(y) is defined as: Paskin [48] shows that the error introduced by contracting x from a cluster C can be computed using only the marginal over C. Thus the error can be estimated through local computation; also the errors of a series of contractions are additive, providing a way to estimate the approximation of the accumulated error of the thinning junction tree algorithm. For other application domains without explicit error measurements or error additive properties, heuristic searches can be applied. For example, a variable induced minimum local constraint (within a cluster) can be the one to contract. There are very few studies on this topic. It deserves more attention. 5.4 Loopy Message Propagation The algebraic approximation is equivalent to relaxing some constraints of the original CBI problems, removing some edges in the primal graph representa-tions, or splitting some clusters in the junction tree representations. These approximations mean that the original CBI problems are revised and simplified to make computation feasible. Loopy message propagation, by contrast, does not revise the original CBI problems. As already known, in junction tree algorithms both time and space com-plexities are bounded by the maximum cluster size. To maintain junction tree properties, the maximum cluster size is usually large in practical problems. If junction tree properties are relaxed, in other words, if the secondary structure is not necessarily a tree but a graph with loops, the maximum cluster size can be dramatically reduced. At the same time, the message-passing may not terminate due to the intro-duction of loops. Also messages will be repeatedly counted. Both of these bring errors of inference. However, empirical results in the probability inference [46] and Turbo Codes decodings [44] show that the same message-passing schemes in exact junction tree algorithms work well in junction graphs with loops. When the junction graph is singly connected, the exact inference result is produced after one iteration of message-passing. When there exists only a single loop in the junction graph, it is guaranteed to converge to the correct result under some conditions [60]. Chapter 5. Approximate Inference Algorithms 65 5.4.1 Building a Junction Graph Definition 5.3 (Junction Graph) Given the primal graph Q = (V, £) of a CBI problem P = (X, D,R, F), a junction graph is J = (C,S), where C = {Ci,- • • , C n} is a set of clusters, each cluster Ci corresponds to an aggre-gation of a subset vertices (correspondingly, variables) Vct C V); and S = {(Ci, Cj)\Ci, Cj £ C} is a set of separators between Ci and Cj. In addition, a junction graph satisfies that for any constraint f £ F, there exists a cluster d £C s.t. Scope(f) C Cj. A junction tree is a junction graph without loops. In a junction graph, both the constraint allocation property and the running intersection property still hold. Given a CBI problem P = (X, D, R, F), there are many ways to generate a junction graph. The most straightforward way is shown in Algorithm 5.6. Following this algorithm, the maximum cluster size of the generated junction graph is the size of a constraint with the maximum number of variables in its scope, which is relatively small to cope with. The junction graph generated by Algorithm 5.6 is sometimes called the dual graph. Algor i thm 5.6 Building a Junction Graph for a CBI Problem Input: A CBI problem P = (X, D, R, F) and a variable subset of interest Z Output: A junction graph J = (C, S) l : C : = { Z } 2: S:=0 3: for each / £ F do 4: if Scope(f) C d for some d € C then 5: Continue 6: else if d C Scope(f) for some d € C then 7: Replace Cj with Scope(f) 8: else 9: Cluster Ci := Scope(f) 10: C := C U C* 11: end if 12: end for 13: for each Cj, Cj £ C do 14: if Cj n Cj / 0 then 15: Sij := Ci O Cj 16: S-.-SUSij 17: end if 18: end for A dual graph can be revised to be a junction graph with less clusters and a larger maximum cluster size. The basic operation of the revision is the merging of neighboring clusters. Given two clusters Ci and C2 in neighbors, merging is defined as the following sequence of actions: (1) introducing a new cluster Chapter 5. Approximate Inference Algorithms 66 C = C i UC2, (2) connecting all neighbors of C\ and C2 to C , and (3) removing C i and C2 from the graph. The size of the merged cluster obviously increases. It is a tradeoff between the number of clusters and the maximum cluster size. 5.4.2 Generalized Loopy Message Propagation Algor i thm The generalized loopy message propagation for a junction graph J = (C,S) of a CBI problem P = (X,D,R,F) is shown in Algorithm 5.7. The basic idea of the generalized loopy message propagation is the same as message-passing in the generalized junction tree algorithm, except that messages at iteration t are updated by incoming messages at iteration t — 1. The initial messages are produced based on local constraints and do not depend on the incoming messages from neighboring clusters. A l g o r i t h m 5.7 Generalized Loopy Message Propagation Algorithm (GLoopy) Input: A junction graph J = (C,S) of a CBI problem P = (X,D,R,F)\ Z: an variable set of interests; tmax : maximumiterationtimes Output : gCBi{Z) « ( $ X \ z <8>/_F/ l: Attach each cluster C,with a constraint rpd = 1 2: for each / G F do 3: Find a cluster Cj such that Scope(f) C Cj 4: (Pd •= <Pd <8> / 5: end for 6: for each Ci G C do 7: for each Cj G Neighbors(Cj) do <• 8: m^(Ci,Cj) ~ © c . X 5 u ©C; 9: end for 10: end for 11: for t = 1 to tmax do 12: for each Ci G C do 13: for each C j € Neighbors(Ci) do 14: m ^ ^ C j ) := © C A S i . <PCi ® <8>c i e y V e i . / .6or . (c i ) J -^ f n ( t " 1 ) (^,C i ) 15: end for 16: end for 17: end for 18: Find a cluster C G C s.t. Z C C 19: Return __•_/(£) « 0 C X Z 0 C ® <g>_,_jv«ff/.i»r.(c() d) If a CBI problem is defined on a semiring with the combination invert-ible property, the algorithm can be revised slightly to save computational cost. Details of this revised version of the generalized loopy message propagation al-gorithm for semirings with the combination invertible property are shown in Algorithm 5.8. Chapter 5. Approximate Inference Algorithms 67 Algor i thm 5.8 Generalized Loopy Message Propagation Algorithm with In-verses (GLoopy-Inv) Input: A junction graph J = (C,S) of a CBI problem P = (X,D,R,F); Z: an variable set of interests; tmax : maximumiterationtim.es Output: 9CBI{Z) « 0 x \ z < 8 > / e f / l: Attach each cluster Cjwith a constraint 4>d = 1 2: for each / £ F do 3: Find a cluster Cj such that Scope(f) C Cj 4: 4>Ci •- 4>d ® / 5: end for 6: for each Cj £ C do 7: for each Cj G Neighbor s(d) do 8: m<°>(c7i,Cj):=eCASw*ci 9: end for 10: end for l i : for t — 1 to i m a x do 12: for each d G C do 13: ^ : = 0 C ® O c ^ / V e . g W o r ^ C O m * ' " 1 ^ , Ci) 14: for each Cj G Neighbors(Ci) do 15: m W ^ . C j O - ^ O m ^ ^ ^ i ) 16: end for 17: end for 18: end for 19: Find a cluster C G C s.t. Z Q C 20 : Return gCBl{Z) « © C \ Z 0 C ® (SceJVeisfcfcoraCC) " » ( 0 ( C j , C) 5.4.3 Loopy Message Propagation As an Exact Inference A special case of applying loopy message propagation is performing inference on CBI problems that are defined on semirings with the combination idempo-tency property. Since the idempotency of combination implies that repeatedly counted messages will not introduce errors, loopy message propagation returns exact answers after finite iterations. Theorem 5.4 (Loopy Message Propagation As an Exact Inference) Loopy mes-sage propagation algorithm (Algorithm 5.7) runs as an exact inference algorithm in CBI problems with combination idempotency property. Maximum iterations to be converged are the diameter d of the graph, which is defined as the num-ber of edges in the longest path (with backtrack, detour, or loop excluded from consideration) between any two vertices of the graph. Proof: Consider any cluster C\ in the junction graph J = (C,S) of a CBI problem. It will receive messages from another cluster C2 in d iterations at most. Since passing through other clusters will not filter the message from C2 and repeatedly counting a message will not introduce errors, C\ will receive messages Chapter 5. Approximate Inference Algorithms 68 from all of the other |C| — 1 clusters after d iterations at most. Combining these messages with local constraints in C\ will result in the marginal of the joint combination of total constraints. • Classic CSPs can be embedded into our semiring-based unified framework using the semiring R = ({false, true}, V, A). It is easy to show that the com-bination operation A, logical AND, is idempotent. According to the theorem, loopy message propagation should be an exact inference algorithm for classic CSPs. In practice, arc-consistency [42], the key technique for CSPs, can be seen as a special case of the generalized loopy message propagation for CBI problems with idempotent semirings. In arc-consistency, the messages are the possible values of the variable in a separator. Combining messages from other clusters will remove illegal values, in other words, revise the outgoing message. The algorithm terminates when all the messages remain the same. 5.4.4 Discussion In the probability inference field, the convergence of loopy message propagation in a junction graph with a single loop is proven. For more complex junction graphs of probability inference, little theoretic work has been done. For classic CSPs, loopy message propagation is proven as an exact inference algorithm. Chapter 7 will present the empirical results of applying loopy message propa-gation in classic CSPs. Loopy message propagation is natively a distributed and parallel algorithm. At the iteration t, each cluster can compute its outgoing messages by collecting all the incoming messages from other neighbor clusters at iteration t — 1 in parallel. The number of needed processors is the same as the number of clusters. We may resort to junction graph building approaches to reduce the number of clusters, while increasing the maximal cluster size at the same time. 5.5 Hybrid Junction Tree Algorithm The last approach for approximate inference of CBI problems is the hybrid junc-tion tree (HybridJT) algorithm. Generally, it passes messages exactly to/from clusters with tractable sizes. When a cluster with intractable size is encoun-tered, the hybrid junction tree algorithm builds a local junction graph based on local constraints and incoming messages. Loopy message propagation is used in this local junction graph. After several iterations of loopy message propagation, the approximate messages of such a cluster are generated and ready to pass to its neighbors. Formally, the hybrid junction tree algorithm is shown in Algorithm 5.9. The hybrid junction tree algorithm here is analogous to the generalized approximate junction tree algorithm, except in the method of dealing with large clusters. In ApproxJT, large clusters are split into small sub-clusters, whereas in HybridJT, large clusters are treated as subproblems, which are approximately solved using loopy message propagation. In addition to the local constraints of a subprob-Chapter 5. Approximate Inference Algorithms 69 lem, all the incoming messages are seen as constraints of the subproblem. A l l outgoing messages are initially seen as unitary constraints. After performing loopy message propagation over the junction graph of the subproblem, we can approximately produce outgoing messages. 5.6 Summary The exponential complexities of exact inference algorithms for CBI problems make approximate approaches not only desirable but mandatory in practical scenarios. Al l generalized approximate inference algorithms presented in this chapter aim to restrict the maximum cluster size or induced width and to reduce the exponentially large factor in the complexities expression. The cluster sizes actually represent sizes of subproblems. We can achieve this purpose through either relaxing some constraints or re-organizing subproblems in a more complex way. With a bounded subproblem size, all exact inference approaches can be modified to cope approximately with scale problems. So far, approximate inference approaches are mainly studied empirically. Little theoretical work has been done to analyze the bounds of approximation errors or give directions for choosing approximate parameters. These are defi-nitely interesting topics for future study. Chapter 5. Approximate Inference Algorithms 70 Algor i thm 5.9 Hybrid Junction Tree Algorithm (HybridJT)-, Input: A junction tree T = (C,S) of a CBI problem P = (X,D,R,F); Z: an variable set of interests; wmax- upper bound of cluster size; b: number of partial messages Output: (Approximate) Consistent junction tree that is ready to answer queries l : 1 Attach each cluster Ci with a constraint set$Ci = $ for each / _ F do Find a cluster d such that Scope(f) C Cj $C< := $Ci U {/} end for for each Edge Sij which is from clusters Cj to C, do M(Ci,Cj) := 0 if Cj has received messages from all its neighbors other than Cj then Q •= *c, for each C, € (Neighbor(Cj) \ {Cj}) do Q:=QuM(Q,Ci) • end for if |Q| < wmax then M ( C j , C j ) : = { © C A S i . ( g ) / 6 Q / } else if \Sij\ < wmax then Define : 5< j -» 1 Q : = Q U { 0 ( ° ) } else Split Sitj into 6 parts, \Jbk=l S\j = 5j,j for k = 0 to b — 1 do Define o<fc) : S $ -» 1 Q : = Q U { 9 W } end for end if Building junction graph J of constraints set Q and do loopy message propagation over J let Qk be the cluster in J to represent for k' = 0 to b — 1 do Af (Ci, C,) := Af (Ci, Cj) U ® Q l £ N e i g b o r s { Q k ) m(Qu Qk) end for end if end if end for Chapter 6. Generalized CBI Toolkit in Java - GCBIJ 71 Chapter 6 G e n e r a l i z e d C B I T o o l k i t i n J a v a - G C B I J 6.1 GCBIJ Motivation In a constraint-based inference problem, the two basic constraint handling op-erations, combination and marginalization, are expressive enough to generalize various inference approaches. The semiring-based unified framework for CBI problems introduced in Chapter 3 is based on this observation. The proposed framework defines the generalized representation of CBI problems in terms of discrete variables and constraints on local variables. It also uses the semir-ing, an importation concept in abstract algebra, to generalize operations of the constraint combination and marginalization. Any CBI problem, then, can be theoretically represented by our semiring-based framework. Any discrete in-ference approaches based on combination and marginalization operations can be theoretically interpreted by the framework as well. In previous chapters, we have embedded into this framework many concrete CBI problems, such as probabilistic inference, decision-making under uncertainty, constraint satisfac-tion problems, propositional satisfiability, decoding problems, and possibility inference. We also show that many concrete exact and approximate inference algorithms can be generalized within the framework. In other words, we show that the semiring-based unified framework is powerful enough to represent both CBI problems and various concrete inference approaches. To prove the representation power of our proposed semiring-based unified framework in practice, we implement a software toolkit, named Generalized Constraint-Based Inference Toolkit in Java (GCBIJ). It is a concrete system for implementing the ideas and concepts of our semiring-based unified frame-work. Besides demonstrating the representation power of the framework, there are also two reasons for implementing the framework: (1) GCBIJ provides a series of generalized inference approaches, either exact or approximate, to solve practical CBI problems. Users can tackle problems in their own domains sim-ply through calling procedures provided by this toolkit. They only need to specify or design their task-specific semirings. (2) Researchers can design, ver^ ify, compare, and analyze different inference approaches for CBI problems on GCBIJ 's platform. Discovering these general ideas for performing inference is more straightforward by dealing with abstract versions of inference algorithms. The process of abstracting inference approaches from other disciplines provides Chapter 6. Generalized CBI Toolkit in Java - GCBIJ 72 a good opportunity to improve the algorithm design in one discipline. 6.2 GCBIJ System Architecture The main body of GCBIJ consists of 7 packages. Figure 6.1 shows the relations among these packages. We will discuss the details of GCBIJ implementations in the following sections. 1. Semiring: Various semirings are defined and implemented in this pack-age. Public interfaces are shared among semirings, which provide access to combination and marginalization operations, as well as to the identity and absorbing elements of each operation; 2. CBITask: This package provides the objects (variable, domain, con-straint) to specify a CBI problem. Together with an instance of a specified semiring, a CBI problem can be fully characterized; 3. Graph: Two sub-packages are included in the package Graph: • Components: This package provides the primal graph, junction tree, junction graph, and related graphical components . The ba-sic graphical elements, such as vertices and edges, are implemented through wrapping OpenJGraph [31], an open source Java Graph and Graph Drawing library under G N U Lesser General Public License; • Algorithms: This package provides the basic algorithms of the graph theory, including triangulation, cliques identification, and span-ning tree finding; 4. Inference: This package provides various inference algorithms. These al-gorithms are categorized into exact inference and approximate inference. A l l of these algorithms solve specified CBI tasks by accessing the opera-tions of the abstract Semiring interface; 5. Parser: To solve concrete CBI problems, different parsers are imple-mented to translate various types of practical problem representations into the internal format of our framework; 6. Utilities: For research purposes, we need to collect information about inference approaches, either in terms of C P U time or in terms of the number of operations. Some statistic utilities are implemented in this package to assist with our algorithm analyses; 7. G C B I J J V I a i n : This example package includes routines on how to use GCBIJ integrally. Example CBI problems and inference algorithms are tested. Statistic utilities are used here to analyze various inference algo-rithms. There are also some test units included in this package. i ,-Pfcgr Graph.Components E d g e "7ST Vertex Separator 1 G r aph - v - pm Doma in •valueList i 1 11 Clus ter J G r a p h 1. r - i J T r e e ShortestPath - r "PKg: i : Graph.Algorithms I Sma l lWor ldParam MinFi l l ln MaxCard Tr iangu lat ionAlg ^Variable Constra int •varList •valueTable 1 * 1 _ _pkg; _, GCBIJ_Ma i in CBITask Semiring i+query(): Constrainti +findOneAssign() +findAIIAssign() 4 - -Semiring «inter face» l lnversiable j+product(): object +sum(): object i+getSumldentity(): object i+getProdldentity(): object i+isCommutativeO : bool r 1 +divide(in valueList) : object\ +normalize(): object «inter face» l ldempotent 1 1 1 HeuristicSearch +getOrdenng() Pkg: Parser G C B I J Maih •task : CBITask •graph : Graph •algor: InferAlg +parseCmdLine()i Parser +parse() TestUnit +compares(in valueList): boon S u m P r o d S e m i n n g M a x P r o d S e m i n n g O r A n d S e m i r m g • - BHParser^ XmlbifParser DimacsParseri - ~ Pkg: ~ ~ StatisticUtilities Inference InferAlg •params +query(): Constraint| +findOneAssign() +findAIIAssign() — E Gener ic ln ferA lg ExactlnferAlgl ApproxInferAlgl i-params V E E x a c t l nferAlg J T E x a c t l nferA Ig 1 LS_JTExact ln ferA lg i 2T VEApprox In ferA lg JTApprox In ferA l g LoopyApprox In ferA lg HUGIN JTExact ln ferA lg l |LS_JTApprox ln ferA lg HUG IN_JTApprox ln fe rA l g l dempotent_JTExact ln ferA lg l dempotent_JTApprox ln fe rA l g Th inJTApprox In fe rA l g HybndApprox In ferA lg | _ i _ _ _ _ _ _ J Figure 6.1: System Architecture of GCBIJ Chapter 6. Generalized CBI Toolkit in Java - GCBIJ 74 Table 6.1: Public Methods of the Class Semiring Return Method Name Comments Semiring(int) Constructor with specified semiring type ID. Object product(Object, Object) Binary combination operation. Object sum(Object, Object) Binary marginalization operation. Object getSumIdentity() Get the identity element of marginaliza-tion, if exists. Object getProdldentityQ Get the identity element of combina-tion, if exists. Object getSumAbsorb() Get the absorbtion element of marginal-ization, if exists. Object getProdAbsorb() Get the absorbtion element of combina-tion, if exists. boolean isCommutative() Determine if it is a commutative semir-ing. 6.3 GCBIJ Implementation and Interfaces GCBIJ is capable of representing a set of CBI problems with discrete vari-ables, which include probabilistic inference, decision making under uncertainty, constraint satisfaction, prepositional satisfiability, decoding problems, and pos-sibility inference. A l l the constraints are represented in the table form. Several generalized inference algorithms, either exact or approximate, are implemented. By manipulating various semirings, these generalized algorithms can solve con-crete CBI problems within the semiring-based unified framework. 6.3.1 Semiring Semiring is the key concept in the proposed framework. The implementation of semirings should capture their common computational properties. The abstract class Semiring is the base class for all concrete semirings, which provides basic operations (combination and marginalization), as well as provides access to other semiring properties. The public methods provided by class Semiring are listed in Table 6.1. Dif-ferent semirings are derived from it, which implement concrete methods. In addition, there are two public interfaces, IReversible and Ildempotent, which specify the behaviors of semirings with combination idempotency and invertible properties. The methods of these two interfaces are listed in Table 6.2 and Table 6.3, respectively. Concrete semirings with these two properties can implement these interfaces, besides inheriting the base class Semiring. Implemented concrete semirings in GCBIJ include: • SumProdSemiring; Chapter 6. Generalized CBI Toolkit in Java - GCBIJ 75 Table 6.2: Public Methods of the Interface Ildempotent Return Method Name Comments int compares(Object, Object) Return a partial ordering of the two given objects. boolean isSumldempotent () Determine if marginalization operation is idempotent. boolean isProdldempotentQ Determine if combination operation is idempotent. Table 6.3: Public Methods of the Interface IReversible Return Method Name Comments Object reverses(Object) Return the reverse of a given object. Collection normalize(Collection) Normalize a collection of objects. • MaxSumSemiring; • OrAndSemiring; • MaxMinSemiring; . • MinMaxSemiring; • MaxProdSemiring; • MinSumSemiring; • MinProdSemir-ing.. Provided with the base class and common interfaces, users can easily design and implement their own task-specific semirings. 6.3.2 Parser To tackle practical CBI problems from different disciplines, GCBIJ provides several parser classes to construct CBI problems from input files. Al l parsers are derived from the base abstract class Parser. The public methods provided by the base class are listed in Table 6.4. • Implemented concrete parsers in GCBIJ include: • XmlbifParser : This parser corresponds to parse X M L files in the Inter-change Format for Bayesian Networks [11]. • BifParser : This parser corresponds to parse BIF files in older versions of the Interchange Format for Bayesian Networks [10]. Together with XmlbifParser and a 3rd-party Bayesian Networks Formats Convertor [28], GCBIJ can parse and perform inference on most Bayesian networks in the Bayesian Network Repository [23]. Chapter 6. Generalized CBI Toolkit in Java - GCBIJ 76 Table 6.4: Public Methods of the Class Parser Return Method Name Comments Parser(Semiring) Constructor with specified semiring. CBITask parse (String) Given a filename, parse it into CBITask, the internal CBI problem representation of GCBIJ . Semiring setSemiring(Semiring) Specify the semiring used during pars-ing a file. • CspifParser : This parser corresponds to parse CSPs in CSPIF format that is used in the CSP representation of CISpace [41]. Currently, the parser can parse scheduling problems with custom constraints, n-queens problems, and crossword problems. Given more CSPIF specifications, we can extend this parser to incorporate GCBIJ into CIspace in the future. • DimacsParser : This parser corresponds to parse SAT problems in DI-M A C S C N F format. Al l benchmark problems in SATLIB [26] are in this format. • DimacsColorParser : This parser corresponds to parse coloring problems in C O L format. COL format is an older version of the DIMACS C N F format used to characterize clique and coloring problems in graphs. 6.3.3 CBITask This package consists of several elements used to characterize a CBI problem. Class Domain wraps a List in Java as domain values. Class Variable specifies the variable name. It contains an instance of class Domain as the domain of the variable. Class Constraint is the implementation of the constraint, it consists of a list of variables and a (logically) multi-dimensional table to specify the constraint value (in Object in Java) under every value assignment of variables. Internally the table is implemented as a List. A series of accessing methods are provided. The class CBITask is the container of variables and constraints, which corre-sponds to the basic elements of a concrete CBI problem. It also has an instance of Semiring as its member. The combination and marginalization operations for both exact and approximate approaches are implemented for the CBITask through wrapping corresponding methods of this semiring. 6.3.4 Inference The Inference package is a collection of generalized exact and approximate in-ference algorithms for CBI inference tasks, which queries a new constraint over a variable, a set of variables, or the whole problem (no particular variable is Chapter 6. Generalized CBI Toolkit in Java - GCBIJ 77 Table 6.5: Public Methods of the Class InferAlg Return Method Name Comments InferAlg () Default constructor, no CBI problem specified. InferAlg(CBITask) Constructor with specifying a CBI prob-lem to solve. CBITask setCBITask(CBITask) Specify a CBI problem to solve. void setOrderingType(int) Set the heuristic search type to deter-mine a variables elimination ordering. CBITask getCBiTask() Return a CBI problem to be solved. int getOrderingType() Return current heuristic search type. Constraint query () Query the property of the whole CBI problem. Constraint query(Variable) Query the constraint on specified vari-able. Constraint query(Collection) Query the constraint on a set of speci-fied variables. Map queryOneAssignmentQ Find one assignment of all variables. Collection query Al l AssignmentsQ Find all assignments (collection of Maps) of all variables. interested). Another task for CBI problems is the assignment task with combi-nation idempotent semirings. Inference algorithms need to find one valid value assignment or all valid value assignments to uninterested variables. The basic methods of inference algorithms are defined in the abstract base class InferAlg, as shown in Table 6.5. Al l concrete inference algorithms inherit it and follow the same interface. Generic Inference The class GenericInferAlg is the only concrete class, derived directly from the abstract class InferAlg. It directly implements the Eq. 3.1 of CBI problems. Allocation tasks are solved by looking up the value table of the total combined constraint. The generic inference can be classified as an exact inference. How-ever, we decided to implement it independently. Exact Inference The abstract class ExactlnferAlg is derived from the abstract class InferAlg, without more public methods specified. Based on it, several exact inference algorithms introduced in Chapter 4 are implemented. These algorithms include: • VEExactlnferAlg : The generalized variable elimination algorithm (Algo-rithm 4.1) with various elimination orderings is implemented in this class. Backtracking is used to answer assignment tasks. Chapter 6. Generalized CBI Toolkit in Java - GCBIJ 78 • JTExactlnferAlg : The generalized junction tree algorithm (Algorithm 4.4) for Shenoy-Shafer architectures is implemented in this class. Back-tracking in the final cluster and constraints-passing are used to answer assignment tasks. There is no special requirement for semirings except commutativity. Three classes derived from JTExactlnferAlg aim to cope with semirings with combination invertible and idempotent properties. - LS JTExactlnferAlg : The generalized JT algorithm for Lauritzen-Spiegelhalter (LS) architectures (Algorithm 4.8) is implemented in this class, which is designed to solve CBI problems with combination invertible semirings. - HUGIN JTExactlnferAlg : The generalized JT algorithm for HUGIN architectures (Algorithm 4.9) is implemented in this class, which is designed to solve CBI problems with combination invertible semir-ings. - IdempotentJTExactlnferAlg : The generalized JT algorithm for CBI problems with combination idempotent properties (Algorithm 4.6) is implemented in this class. Approximate Inference The abstract class ApproxInferAlg is derived from the abstract class InferAlg, with additional public methods to specify approximation parameters. These methods are listed in Table 6.6. Based on the abstract class ApproxInferAlg, several approximate inference algorithms introduced in Chapter 5 are imple-mented. These algorithms include: • VE ApproxInferAlg : The generalized approximate variable elimination al-gorithm (Algorithm 5.2) with various elimination orderings is implemented in this class. • Loopy ApproxInferAlg : The generalized loopy message propagation algo-rithm (Algorithm 5.7) is implemented in this class. • JT ApproxInferAlg : The generalized approximate junction tree algorithm (Algorithm 5.3) for Shenoy-Shafer architectures is implemented in this class. There is no special requirement for semirings except commutativity. Two classes derived from JT ApproxInferAlg aim to cope with semirings with the combination invertible property. Both of them are implemented based on the parent abstract class JT ApproxInferAlg. The implementation of the inverses approximation follows Eq. 5.2. - LSJTApproxInferAlg : The generalized approximate JT algorithm for Lauritzen-Spiegelhalter (LS) architectures. - HUGIN JT ApproxInferAlg : The generalized approximate JT algo-rithm for HUGIN architectures. Chapter 6. Generalized CBI Toolkit in Java - GCBIJ 79 Table 6.6: Public Methods of the Class ApproxInferAlg Return Method Name Comments void setMaxWidth(int) Specify the upper bound of the size of combined constraint (the maximum cluster size in a junction tree/graph). void setMaxBucket (int) Specify the number of buckets to divide constraints. void setLoopNum(int) Specify the maximum loops in loopy message propagation. int getMaxWidth() Get the upper bound of the size of com-bined constraint (the maximum cluster size in a junction tree/graph). int getMaxBucket() Get the number of buckets to divide constraints. int getLoopNumQ Get the maximum loops in loopy mes-sage propagation. The other two approximation inference algorithms derived from JTAp-proxInferAlg are: — ThinJIApproxInferAlg : The generalized thin junction tree algo-rithm (Algorithm 5.5). — HybridJT ApproxInferAlg : The generalized hybrid inference algo-rithm (Algorithm 5.9). 6.3.5 Graph This package includes two sub-packages: Component and A lgo r i t hm. Graph .Componen t The basic graph element classes Vertex and Edge are implemented through wrapping classes Vertexlmp and Edgelmpl in OpenJGraph [31]. The primal graph class Graph wraps OpenJGraph's class Graphlmpl. More specifically, the class Vertex has a Variable instance as its member. Class Cluster is derived from Vertex with a collection of vertices as its mem-ber. Additional public methods (cf. Table 6.7) for Cluster are implemented. Class Separator is derived from Edge with a cluster as its member. The separa-tor set and messages are stored in that cluster. Accessing methods for the class Separator are implemented as well. Class JGraph is derived from Graph, which consists of clusters and separa-tors. Class JTree is derived from JGraph with the tree property enforcement. Chapter 6. Generalized CBI Toolkit in Java - GCBIJ 80 Graph. Algorithm A collection of graph algorithms are implemented in this package. Class Triangulation A Ig is the implementation of Algorithm 2.1 by feeding in the adjacent matrix of a given primal graph. Also, it provides a method for getting a variable ordering if specifying a heuristic search algorithm. The abstract class Heuristics Search provides a common interface for various heuristic search algorithms in Appendix A. These concrete heuristic search algorithms include: • RandomS'earch; • MinWidthS earch; • MinFillinS earch; • MinlnducedFillinS earch; • MaxCardinalitySearc; • Lex MinimalS earch; • LexP erfectS earch; Another class SmallWorldParam specifies the methods for computing the graphical parameters of small world models. Class SmallWorldUtilities acts as the helper to parameter computing procedures. 6.3.6 Uti l i t ies Several utility classes are implemented in this package, which include Statistic-sUtility, SetUtility and CSPGenerator. The class Statistics Utility implements a collection of static methods for sta-tistical purposes, which includes a start/stop C P U timer, counting numbers of operations, and basic data post-processing methods. The class SetUtility implements set operations such as interaction, difference, and union. Basically, it wraps methods of Collection in Java. The class CSPGenerator is an abstract class that specifies the interface of generating binary CSPs in CSPIF format. The CSPIF format can be parsed either by GCBIJ or by CIspace. In GCBIJ we implement three concrete binary CSP generators, inherited from CSPGenerator: • The class RandomCSPGenerator generates random binary CSPs; • The class SmallWorldCSPGenerator generates binary CSPs with Small World topologies, which implements Algorithm 2.2; • The class ScaleFreeCSPGenerator generates binary CSPs with Scale-Free topologies, which implements Algorithm 2.3. Chapter 6. Generalized CBI Toolkit in Java - GCBIJ 81 To generate binary CSPs, we need to specify the number of variables (ver-tices), the number of constraints (edges), the size of the variable domain, the forbidden rate of constraints, and the number of generated instances. Further-more, we need to specify the re-written probability of SmallWorldCSP'Gener-ator. Generated binary CSP instances with various topologies are used in our experiments (cf. Chapter 7). 6.3.7 G C B I J _Main The package GCBIJJVlain contains a main class GCBIJ as the entry point and an example for using GCBIJ . Also, several test unit classes are included in this package. For test purposes, a command line mode is implemented. 6.4 Summary The proposed semiring-based unified framework for CBI problems in Chapter 3 is implemented as the Generalized Constraint-Based Inference Toolkit in Java (GCBIJ). The toolkit provides a way to represent various concrete CBI problems and a series of generalized inference algorithms, from exact to approximate in-ference. By specifying various semirings, these generalized inference algorithms can be instantiated as concrete algorithms that solve CBI problems from differ-ent disciplines. GCBIJ also provides a collection of parsers to translate concrete problems from various fields with domain-specified formats. The architecture of GCBIJ is flexible, making it easy to extend. Users can implement their own task-specific semirings to fulfill their purposes. A l l the implemented inference algorithms use the abstract class Semiring to access basic operations, without relating to the properties of concrete semirings. On the contrary, to design a new inference algorithm, users do not need knowledge of specific semirings, which ensures the generality of the algorithm. Given the common parser interface, users can also design a new parser to translate their domain problems into our internal CBI problem representation. The graph package is flexible as well. Users can later add more graphical features and algorithms. GCBIJ is a proof of the representation power of the proposed semiring-based unified framework. It is a concrete system for implementing the ideas and concepts of the framework. The flexibility of its open architecture makes it a suitable toolkit for both research and application purposes. Chapter 6. Generalized CBI Toolkit in Java - GCBIJ 82 Table 6.7: Public Methods of the Class Cluster Return Method Name Comments int getSize() Return the number of variables in this cluster. Collection getVertexSet() Return vertices in this cluster. void addVertex( Vertex) Add a vertex to this cluster. void addVertices(Collection) Add a collection of vertices to this cluster. void remove Vertex( Vertex) Remove a vertex from this clus-ter. void remove Vertices(Collection) Remove a collection of vertices (if any) from this cluster. boolean containVertex( Vertex) Determine if the vertex is in this cluster. boolean contain Vertices (Collection) Determine if a collection of ver-tices is a subset of this cluster. void addToLocalPotential(Collection) Add a collection of constraints to the local constraint set of this cluster. void clear LocalPotential(Collection) Clear the local constraint set of this cluster. void addToIncomingMsg(Collection) Add a collection of constraints to the incoming message set of this cluster. void clear IncomingMsg (Collection) Clear the incoming message set of this cluster. void addToOutgoingMsg(Collection) Add a collection of constraints to the outgoing message set of this cluster. void clearOutgoingMsg(Collection) Clear the outgoing message set of this cluster. Chapter 7. GCBIJ Applications: Case Studies 83 Chapter 7 G C B I J A p p l i c a t i o n s : C a s e S t u d i e s In this chapter, we use the Generalized Constraint-Based Inference toolkit in Java (GCBIJ), an implementation of the proposed semiring-based unified frame-work and generalized inference algorithms, to do a series of experiments for CBI problems from different disciplines. Based on these experiments, we show that (1) the proposed semiring-based unified framework and generalized inference algorithms are suitable for representing and solving various concrete CBI prob-lems; (2) GCBIJ is an good platform for studying CBI problems; (3) GCBIJ has the potential to tackle practical applications. 7.1 Test Suites In this chapter, we study a series of concrete CBI problems based on GCBIJ . These problems belong to probability inference, CSP, and SAT respectively. See Table 7.1 for basic information on these problems. 7.2 Studies of Heuristic Triangulation Algorithms Analyses in previous chapters show that both the time and space complexi-ties of exact inference algorithms are exponential in the induced width of a given elimination ordering, or equivalently, exponential in the width of a tree decomposition of the original CBI problem. Ideally, the complexity will reach minimum if we can find the treewidth of the primal graph of the given CBI problem. However, computing treewidth is known as A/"P-hard • [2]. To get a near-optimal tree decomposition, we sometimes resort to heuristic triangulation algorithms. Basically, heuristic triangulation algorithms produce an elimination ordering of variables, which leads to an upper bound of the induced width or the treewidth. In this chapter, we study several heuristic triangulation algorithms, which in-clude Minimum Induced Width (MIW), Minimum Induced Fill-in (MIF), Max-imum Cardinality (MC), Perfect Lexicographic BFS (LEXP) , and Minimum Lexicographic BFS (LEXM) . Details of these algorithms are listed in Appendix Chapter 7. GCBIJ Applications: Case Studies 84 Table 7,1: The Basic Information of Test Suites Fields Adopted From No. Problem Name |V| \£\ Bayesian Networks Bayesian Net-work Repository [23] 1 Diagnosis 37 65 2 Insurance 27 70 3 Pigs 441 806 4 Link 724 1738 CSP Col-oring DIMACS Dataset [20] 5 fpsol2.i 496 11654 6 mulsol.i 197 3925 7 zeroin.i 211 4100 . SAT SATLIB [26] 8 uf20 20 147 9 flatlOO 300 1017 10 BlocksWorld 116 777 CSP CSPGenerator of GCBIJ 11 Random Binary CSP 35 70 12 Scale-Free Binary CSP 100 400 13 Small World CSP 100 400 A. To compare the results of different algorithms, we studied the triangula-tion algorithm with a random ordering. Table 7.2 shows the upper bounds of treewidth for various CBI problems. Here wmi„ is the best reported result so far, adopted from [24]. The number in the parentheses is the maximum separa-tor size of the generated junction tree. For heuristic search algorithms (LEXP, L E X M , MC) with a random initial vertex, the upper bound of treewidth is averaged from 0.1 x |V| runs, each with a random initial vertex. For straightforward comparisons, we normalize the upper bounds of treewidth from various heuristics by the best reported results. The normalized upper bounds are shown in Figure 7.1. According to it, the Minimum Induced Fi l l -in heuristic results in the best approximation of treewidth among all of these heuristics. The second best is the Minimum Induced Width heuristic, which outperforms to the Maximum Cardinality heuristic in most problems. We also notice that heuristics are sensitive in different problems. Problems No. 3 (Pigs), No. 4 (Link), and No. 7 (zeroin.i) are hard to approximate by many heuristic techniques. Following this experiment, we use the Minimum Induced Fill-in as the default heuristic triangulation algorithm in GCBIJ , though users can specify other heuristics by changing the default parameters. Further information and discussion about heuristic triangulation algorithms and their empirical evaluations can be found in [33]. Table 7.2: Upper Bounds of Treewidth for Graphs, Different Heuristic Searches Problem R A N D MIW L E X P L E X M M C MIF Diagnosis 5 13(8) 6(4) 5(3) 5(3) 5(3) 5(3) Insurance 8 14(13) 8(6) 10(8) 10(8) 9(6) 8(7) Pigs 10 137(131) 17(14) 26(26) 31(30) 21(20) 11(9) Link 13 254(250) 36(29) 68(66) 71(70) 121(120) 16(14) fpsol2.i 66 247(202) 71(66) 72(67) 79(73) 72(68) 67(66) mulsol.i 50 130(121) 51(50) 71(64) 77(73) 62(45) 51(50) zeroin.i 50 110(106) 51(50) 97(96) 102(99) 63(48) 51(49) uf20 17 18(17) 17(16) 17(16) 18(16) 17(16) 17(14) flatlOO 100 175(173) 107(105) 120(119) 126(124) 114(111) 101(89) BlocksWorld 49 80(77) 74(72) 50(46) 53(50) 49(48) 52(44) Rnd. Binary CSP 9 9(8) 9(8) 9(8) 9(8) 9(8) 9(8) Chapter 7. GCBIJ Applications: Case Studies 86 I I I ! • 1 1 -m- MIF • « • MIW ' i i ; i ~ MC ' t ' i < t i i \ . . . [ . / , A j • . . . » . . . " , . . . ; ( \ \\ ! 4; >.'..; / ' .« »i / ' \ l » . , . . • : , . ••. \ . „ < 2 ! k 1 1 1 1 5 6' 7 No. of Problems Figure 7.1: Upper Bounds of Treewidth, Based on Various Heuristics 7.3 E m p i r i c a l S tud ies of E x a c t Inference A l g o r i t h m s 7.3.1 Exact Inference Algori thms wi th Invertible Semiring In this experiment, we solve the probabi l i ty inference prob lem in G C B I J , by specifying the regular sum-product semiring (Semiring No. 6 in Table 2.2). It is obvious that the sum-product semiring has the combination (product) invertible property. So, besides generalized V E and JT algorithms, both JT algorithms in LS and HUGIN architectures can be used to tackle inference tasks. The probability inference problem to be tested in this experiment is the Diagnosis network (cf. Figure 7.2, which is taken from CIspace [41]). The Diagnosis problem has 37 variables. The primal graph, the moralized graph of the original Bayesian network, has 65 edges. To test the performances of various generalized exact inference algorithms (GVE, GJT , GJT-LS and GJT-HUGIN) in GCBIJ , we query the marginal prob-ability for every variable (37 queries in total). Validated by CIspace, all gen-eralized exact inference algorithms return correct results for all queries. The running time and the number of binary operations used are shown in Table 7.3. Results here empirically support the theoretical analyses conveyed by Table 4.2 and Table 4.3 in Chapter 4. We can conclude that: (1) JT algorithms are Figure 7.2: Bayesian Network of the Diagnosis Problem Chapter 7. GCBIJ Applications: Case Studies 88 Table 7.3: Running Times of Exact Inference Algorithms with the Combina-tion Invertible Property for the Diagnosis Problem. (37 Independent Queries) G V E G V E ( perquery) G J T GJT-LS GJT-HUGIN Run. Time (ms) 60068 1623 7922 4307 4217 ^Operation x 77946 2107 6768 2881 2881 #Operation + 43910 1187 3694 3694 3694 ^Operation / 0 0 0 966 250 ideal for multiple queries and V E algorithms are ideal for single queries; (2) for CBI problems defined on invertible semirings, both LS and HUGIN architectures are superior to the generalized JT algorithm in the Shenoy-Shafter architecture; and (3) the HUGIN architecture is slightly better than the LS architecture in using fewer / operations. 7.3.2 Exact Inference Algori thms wi th Idempotent Semiring In this experiment, we solve the inference task (decision problem) of a ran-dom binary C S P based on GCBIJ , by specifying the logical or-and semiring (Semiring No. 1 in Table 2.2). It is obvious that the or-and semiring has the combination (A) idempotent property. So besides generalized V E and JT algorithms, the idempotent JT algorithm (Algorithm 4.6) can also solve this inference task. The test problem in this experiment is a random generated binary CSP with 35 variables and 70 constraints. The domain size of each variable is 3. The forbidden rate for each constraint is set at 0.5, in other words, each value assignment of a constraint has a probability 0.5 of not being allowable. In GCBIJ we compose a binary CSP generator to produce binary CSPs in CIspace's CSPIF format. To test the performances of various generalized exact inference algorithms (GVE, GJT , GJT-Idemp) in GCBIJ , we query the marginal constraint on every variable (35 queries in total). The marginal constraint of each variable indicates valid values for this variable. Validated by CIspace, all generalized exact infer-ence algorithms return correct results for all queries. The running time and the number of binary operations used are shown in Table 7.4. These results empirically support the theoretical analysis conveyed by Table 4.2 and Table 4.3 in Chapter 4. We can conclude that: (1) JT algorithms are ideal for multiple queries. For CSPs, the query for a variable returns valid values for that variable; (2) for a CBI problem defined on idempotent semirings, the idempotent JT algorithm is superior to the generalized JT algorithm. Chapter 7. GCBIJ Applications: Case Studies 89 Table 7.4: Running Times of Exact Inference Algorithms with the Combination Idempotent Property for a Random Binary CSP. (35 Independent Queries) G V E G V E (per query) G J T GJT-Idemp Running Time (ms) 182945 5227 117749 41360 #Operation A 2428510 69386 1204947 416556 ^Operation V 1407875 40225 243093 306022 7.3.3 Exact Inference Algorithms for General CBI Problems The last problem to test for exact inference algorithms in GCBIJ is the M a x C S P for previous random binary CSP (35 variables and 70 constraints). Ac-cording to the solution of the inference task, the test problem is satisfiable. So the inference result of the corresponding Max CSP should be 70. Both gen-eralized V E and JT algorithms return this correct result. See Table 7.5 for details. We also conclude the maximum number of satisfiable constraints for each variable, given different domain values (cf. Table 7.6 ). From Table 7.5 we can clearly see that the V E algorithm is ideal for sin-gle queries. Compared to corresponding columns in Table 7.4, we notice that different semirings cause different running times even when we use the same generalized exact inference algorithm. 7.4 E m p i r i c a l S tud ies of A p p r o x i m a t e Inference A l g o r i t h m s 7.4.1 Approximate V E and J T for CSPs In this experiment, we apply both the generalized approximate variable elimi-nation (GVE-Approx, Algorithm 5.2) and the approximate junction tree (GJT-Approx, Algorithm 5.3) to solve a random binary CSP with 35 variables and 70 constraints with GCBIJ . The exact inference results of the problem was given in previous sections. We query the valid values for each variable. Each valid value corresponds to its Table 7.5: Running Times of General Exact Inference Algorithms for a Max CSP (Single Query) G V E G J T Running Time (ms) 6670 274975 ^Operation + 70761 1204947 ^Operation max 37801 237737 Chapter 7. GCBIJ Applications: Case Studies 90 Table 7.6: The Maximum Number of Satisfiable Constraints Given Different Domain Values for a Random Binary CSP with 35 Variables and 70 Constraints Variable DI D2 D3 V I 70 64 63 V2 70 66 67 V3 70 66 66 V4 70 63 69 V5 70 64 70 V6 70 67 70 V7 70 67 70 V8 70 68 70 V9 70 64 70 V10 70 63 70 V l l 70 63 69 V12 70 67 69 V13 70 68 70 V14 70 67 70 V15 70 64 69 V16 70 67 69 V17 70 68 67 V18 70 70 70 V19 70 70 70 V20 70 69 69 V21 70 68 67 V22 70 68 68 V23 70 70 70 V24 70 68 69 V25 70 70 70 V26 70 68 68 V27 70 68 '68 V28 70 68 70 V29 70 69 69 V30 70 69 69 ' V31 70 68 69 V32 70 70 70 V33 70 68 69 V34 70 67 70 V35 70 69 70 Chapter 7. GCBIJ Applications: Case Studies 91 appearance in a valid assignment. If there is no valid value for a variable, the problem is not satisfiable. Both GVE-Approx and GJT-Approx return the approximate marginal of each variable. Compared to exact results, all the errors made by approximate algorithms are false-positive. In other words, only introduced errors falsely report some invalid values as valid. In general, both GVE-Approx and GJT-Approx trade time and accuracy to save the space of the corresponding exact inference algorithms. Two parameters control the performance of approximate inference algo-rithms. The maximum width w is the upper bound of the acceptable width during computing, which controls the accuracy of the approximation. A larger acceptable width, at the same time, will take more space. If w is the same as or larger than the treewidth or induced width of the given CBI problem, the approximate V E and JT algorithms return exact results. Another parameter b, the number of buckets to partition constraints, controls the speed of the ap-proximation. A larger b will reduce the running time. However, more errors will be introduced at the same time. We can change the settings of w and d to achieve balance between accuracy and speed. Figure 7.3 shows the accuracy and running time of the approximate V E algo-rithm for a random binary CSP under different parameter settings. Analogous results for the approximate JT algorithm are shown in Figure 7.4. A B Bounded Width: w Bounded Width: w Figure 7.3: Accuracy and Running Time of the Approx. V E for a CSP Chapter 7. GCBIJ Applications: Case Studies 92 A B Bounded Width: w Bounded Width: w Figure 7.4: Accuracy and Running Time of the Approx. JT for a CSP 7.4.2 Approximate V E and J T for Probabil i ty Inference The approximate V E and JT algorithms in GCBIJ can be applied to solve prob-ability inference problems without any modifications. The only thing needed is to specify a regular sum-product semiring to replace the logical or-and semiring for CSPs. In this experiment, we use approximate V E and JT algorithms to solve the Insurance problem from B N Repository [23]. The Insurance problem contains 27 variables and 27 conditional probability tables (CPTs). Figure 7.5 shows the Bayesian network of the Insurance problem. After moralization, we have 27 vertices and 70 edges in GCBIJ 's internal primal graph representation. In each run of the experiment, we randomly choose 4 variables as observed. Observed values are chose randomly as well. Then we compute the marginal of the rest variables. For both GVE-Approx and GJT-Approx, we fix the number of buckets at b = 2. The maximum width w changes from 4 to 7 (treewidth of the Insurance problem is 8). The correspondence between the exact and approximate marginal for the Insurance problem are shown in Figure 7.6 and Figure 7.7, corresponding to approximate V E and JT algorithms respectively. We find that the accuracy of the approximate V E and JT algorithms are increase with the maximum width w, the same as the results for solving CSPs. Chapter 7. GCBIJ Applications: Case Studies 93 Chapter 7. GCBIJ Applications: Case Studies 94 Figure 7.6: Marginal Correspondence of the Approx. V E for a Probability In-ference Problem. Chapter 7. GCBIJ Applications: Case Studies 95 w = 4;b = 2 w = 5;b = 2 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Exact Marginal Exact Marginal Figure 7.7: Marginal Correspondence of the Approx. JT for a Probability In-ference Problem. 7.5 Loopy Message Propagation As Exact Inference Algorithm As discussed in Chapter 5, loopy message propagation (Algorithm 5.7) is in general an approximate inference algorithm for CBI problems. However, when a CBI problem is defined on semirings with the combination idempotency prop-erty, loopy message propagation returns exact results within at most d iterations, where d is the diameter of the corresponding dual graph. In this section, we apply loopy message propagation algorithm to solve a random binary CSP. The random binary CSP is the same as the one used in Section 7.3.2, which has 35 variables with 3 domain values each and 70 constraints with the forbidden rate at 0.5. In the experiment, we examine the different upper bounds of the maximum cluster size in the join graph. We gradually increase the maximum allowable iterations. Within the allowable iterations, the program exits when all messages are unchanged (stable), returning the actual number of iterations. Chapter 7. GCBIJ Applications: Case Studies 96 Figure 7.8: Accuracy (A) and Time Complexity (B)of loopy Message Propaga-tion for a CSP Figure 7.8(A) and Figure 7.8(B) report the accuracy and time complexity (in terms of binary operations) respectively. Just like using the approximate V E and JT algorithms for CSPs, loopy message propagation returns only false-positives (in other words, reporting some invalid values as valid). Errors are reduced when the iterations of the message propagation increase. Eventually all messages are stable and loopy message propagation returns exact inference results. Roughly speaking, a junction graph with a larger cluster size consumes more binary operations (combination and marginalization) for random CSPs. 7.6 Impact of Small World Topology 7.6.1 Small Wor ld Topology of C B I problems The Small World topology [59] appears widely in social graphs (e.g., the col-laboration graph of actors in feature files), biological graphs (e.g., the neural network of the nematode worm C.elegans), and man-made graphs (e.g., the electrical power grid of the western United States). In a graph with the Small World topology, vertices are highly clustered, yet the path length between them is small. In this thesis, we are interested in whether or not such a topology exists in primal graph representations of real world CBI problems and how this characteristic impacts the performance of inference algorithms. Chapter 7. GCBIJ Applications: Case Studies 97 6 0.8 \ 5 10 15 20 25 30 35 Path Length; 1 40 45 50 • -0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Clustering Coefficient: c 1 1 r-0.8 0.9 Ill ' ' j g 0.5 • I D 0I JS— — , , , , 1 0 10 20 30 40 50 60 70 Degree: d Figure 7.9: Small World Parameter Distributions for the Pigs Network The Small World characteristic parameters for various CBI problems, includ-ing characteristic path length L, clustering coefficient C, and normalized scalar H (cf. Chapter 2), are computed and shown in Table 7.7, and compared to some practical networks [59]. For comparison, we also generate three scale-free networks based on Algorithm 2.3. Al l of them exhibit characteristic of small world topology. That observation shows that the scale-free network is probably a subset of small world topology, though no formal proof is available so far. In this table, we are particularly interested in the Pigs and the Link net-works, which are taken from a real-world application dataset. Figure 7.9 and Figure 7.10 show the characteristic parameter distributions of these two prob-lems respectively. In both of the Pigs and the Link networks, characteristic path lengths are relatively small, which implies that the message from one cluster will quickly reach another cluster in the junction graph. Although degree distributions show that most vertices in the graph have small degrees, some large degrees, in ad-dition to a large amount of medium clustering coefficients, imply that exact inference algorithms are infeasible. Combining these observations, loopy mes-sage propagation is probably an efficient approach to coping with these two CBI problems. Table 7.7: Parameters of the Small World Topology for Graphs Fields Problem |V| \£\ L Lrand C Grand M Practical Fi lm Actors 225,226 6,869393 3.65 2.99 0.79 0.00027 2396 Power Grid 4942 6819 18.7 12.4 0.08 0.005 10.61 C.elegans 282 ' 1974 2.65 2.25 0.28 0.05 4.755 Bayesian Networks Diagnosis 37 65 3.82 2.92 0.79 0.56 1.08 Insurance 27 70 2.23 2.12 0.70 0.48 1.37 Pigs 441 806 4.97 4.77 0.79 0.50 1.56 Link 724 1738 6.37 4.35 0.68 0.40 1.17 Coloring fpsol2.i 496 11654 1.69 1.92 0.45 0.13 3.83 mulsol.i 197 3925 1.81 1.80 0.59 0.24 2.42 zeroin.i 211 4100 1.66 1.82 0.48 0.23 2.33 SAT Unf20 20 147 1.22 1.22 0.80 0.80 1.00 FlatlOO 300 1017 3.72 3.18 0.33 0.31 0.92 BlocksWorld 116 777 2.56 2.07 0.52 0.25 1.72 Scale-Free sflOO 100 400 1.92 2.42 0.88 0.08 13.87 Link-Like 724 1738 2.56 4.37 0.388 0.049 13.66 C.elegans-Like 282 1974 1.95 2.42 0.926 0.049 23.58 Chapter 7. GCBIJ Applications: Case Studies 99 Figure 7.10: Small World Parameter Distributions for the Link Network 7.6.2 Inference in the Small Wor ld To analyze the impacts of the small world topology for CBI problems, we gener-ates a series of binary CSPs with different re-written probabilities, according to [59]. In general, a small re-written probability corresponds to ordered graphs, whereas a large re-written probability corresponds to random graphs. Visually small world graphs are those between ordered and random graphs. Figure 7.11 is adopted from [59]. It illustrates the relations between small world parameters and the re-written probabilities of graphs with 100 vertices and 400 edges. In this section, we use loopy message propagation to perform inference tasks for binary CSPs with the Small World topology. These CSPs are generated based on Algorithm 2.2. Each of them has 100 binary variables and 400 bi-nary constraints, with a 0.5 forbidden rate. The re-written probabilities range from 0 to 1. We collect the number of binary operations under each re-written probability (with 5 instances). The results are generalized in Figure 7.12. Compared to Figure 7.11, we find that loopy message propagation works well in CSPs with the small world topology. A relatively small characteristic path length implies quick convergence. A relatively large clustering suggests that we can resort to a large maximum cluster size in loopy message propagation if the computation power permits. The results also suggest that loopy message propagation is a suitable inference approach for CSPs with ordered graphical representations, but not suitable for random CSPs. Chapter 7. GCBIJ Applications: Case Studies 100 • C(p)/C(0) 0.8 0.6 h 0.4 , • 0 0.0001 0.001 0.01 0.1 p Figure 7.11: Characteristic Path Length L(p) and Clustering Coefficient C(p) for Different Re-written Probabilities (Figure is taken from [59])' O Bounded Widlh: w = 3 x Bounded Width:w = 4 • Bounded Width: w = 5 10"* 10" J 10" J 10"' 10° Rewriting Probability: p Figure 7.12: Time Complexity of loopy M P in the Small World Chapter 7. GCBIJ Applications: Case Studies 101 7.7 Summary Based on the Generalized Constraint-Based Inference toolkit in Java (GCBIJ), this chapter presents a series of experiments of CBI problems from different disciplines. The results of these experiments are not totally new to research communities. However, these results as a whole verify the feasibility of using semirings to generalize representations of various CBI problems. Generalized ex-act and approximate inference algorithms, based on the proposed semiring-based unified framework, are proven to be suitable to apply in concrete application domains. Case studies also show that GCBIJ is a flexible toolkit of constraint-based in-ference research. Various exact and approximate inference algorithms can solve different CBI problems when different semirings within GCBIJ are specified. New algorithms can be added to the toolkit by either generalizing existing con-crete approaches or designing new approaches on the abstract level. Although more optimization and implementation work is required, the extendibility and flexibility of GCBIJ make it a suitable toolkit for both CBI research and appli-cations. Chapter 8. Conclusions and Future Work 102 Chapter 8 C o n c l u s i o n s a n d F u t u r e W o r k 8.1 Conclusions As the first contribution of this paper, we propose a semiring-based unified framework, a single formal representation framework with a broader coverage of the constraint-based inference problem space based on the synthesis of ex-isting generalized frameworks. Our framework explicitly subsumes and unifies many concrete CBI problems, such as probabilistic inference, decision making under uncertainty, constraint satisfaction problems, propositional satisfiability, decoding problems, and possibility inference, in an abstract representation. The unified framework is also a single formal algorithmic framework that pro-vides a broader coverage of both exact and approximate inference algorithms. This is the second contribution of this paper. We unify various widely used in-ference algorithms, such as the exact and approximate variable elimination algo-rithms, the exact and approximate junction tree algorithms, and loopy message propagation, based on the framework. Many of these algorithms depend only on the basic properties of the commutative semirings. Based on other special properties of different commutative semirings, we also generalize the variants of these algorithms that arise in different application scenarios. Abstract representations of CBI problems, as well as abstract inference al-gorithms, provide several opportunities for researchers from various fields: (1) they can study the most important common characteristics of various CBI prob-lems without representation barriers; (2) they can analyze and compare differ-ent inference approaches; (3) they can borrow design ideas from other fields and improve the inference approaches' efficiency in their own domains; and (4) implementations at the abstract level significantly reduce the amount of work targetted previously at the individual problems. In other words, researchers from different fields may reinterpret many familiar approaches in their domains at a higher level. The algorithm discussions and the complexity analyses in this thesis are examples of applying the abstract knowledge to the concrete application domains. The unified representation for CBI problems and inference algorithms is not, of course, a totally novel idea. Much research has been conducted in various disciplines through using different tools or notions. Here we have significantly broadened the scope of the problems and the coverage of the algorithms. The Chapter 8. Conclusions and Future Work 103 final contribution of this paper is a software toolkit, the Generalized Constraint-Base Inference Toolkit in Java (GCBIJ). GCBIJ is the first concrete toolkit to implement ideas for unifying the representations of CBI problems using semir-ings and to implement various inference algorithms on the abstract level. The generalization and extensibility of GCBIJ make it a good platform for both CBI research and practical problem solving. 8.2 Possible Improvements to Inference Approaches Here we briefly discuss some possible improvements to generalized inference algorithms for CBI problems. These preliminary discussions are not complete but suggest some interesting research topics for the future. 8.2.1 F r o m Var i ab l e s to Va lues The inference approaches generalized in Chapter 4 and Chapter 5 do not depend on possible values or the domains of variables in CBI representations. If values are taken into account, we can improve these approaches by manipulating values. As a restriction of our proposed semiring-based unified framework, the do-main of a variable is discrete. Considering Values in Heuristic Search Heuristic searches are used in both V E and JT algorithms to find a near-optimal elimination ordering or tree decomposition before performing the inference. The most straightforward improvement to heuristic-search-based approaches is to consider the actual domain size of the variables instead of treating them uni-formly. For example, we use heuristic searches to find the upper bounds of the treewidth or the induced width. For either Minimum Induced Fill-in or Mini-mum Induced Width heuristics, we break ties by randomly choosing a vertex. Alternatively we can use the domain size as the weight of a vertex (variable) to break ties with largest weight first. In approximate inference algorithms, the key idea of the approximation is to restrict the size of the maximum sub-problem to a tractable level. In our framework, the sub-problem size is measured by the cluster size or the number of variables participating in the sub-problem. More specifically, the sub-problem size could be measured as the product of the domain sizes of all participating variables. Using such a measurement as a heuristic function or an upper bound of approximation procedures will bring computational benefits such as giving a more accurate approximation of the treewidth or introducing less approximation errors. At the same time, using values in heuristics will increase computational costs. Chapter 8. Conclusions and Future Work 104 Instantiating and Domain Splitting In general, when a vertex of a graph is removed, the graph's structure will be simplified. The treewidth of a simplified graph will possibly be reduced. As shown in previous chapters, the time and space complexities of exact inference algorithms in this thesis are proven to be exponential in the width of a tree decomposition of the corresponding primal graph representation. One way to remove a vertex is to assign a value to its corresponding variable. These naive observations suggest that instantiating variables will bring computational ben-efits if carefully designed. For the completeness of inference, we need to store instantiations of a variable for further use. Once the inference of a conditioning value is finished, we may need to backtrack to other instantiated values. Basically, it is a backtracking process. The D P L L algorithm [12] of SAT is a typical example of implementing such an idea. Given a normal inference approach, when to start instantiating, which vari-able to be instantiated, and what value to assign are the key problems of in-stantiation. A related improvement is to split the domain of a variable. Instead of as-signing one value to a variable, we could assign a subset of the original domain values to the variable. For context-specific constraint representations, a vari-able with splitting domains will simplify the graph since some edges are not presented in the current context. Domain Shrinking The last possible improvement to inference approaches we would like to mention is domain shrinking. This idea comes from the Arc Consistency of CSPs. Given a constraint, the value of a participating variable can be discarded once all the assignments induced by this value are false. After checking out all-false values for a variable, we shrink its domain size. The reduced domain size leads to computational benefits for regular inference approaches. The same idea can be generalized to abstract CBI problems. In CSPs, false can be used to detect shrinking values since false is the absorbing element of the combination operation (logical AND) in the OR-AND semiring. The absorbing element of the combination operation means that it is a trap for further combinations. At the same time, false is also the identity element of the marginalization operation (logical OR), which means that no more new information is contained. Based on the previous observations, we can apply domain shrinking to prob-ability inference problems. Since zero is both the absorbing element of the combination (product) and the identity element of the marginalization (plus), we can remove a value from a variable domain if all the assignments in which this value participates are mapped to zero. If approximation is permitted, near zero values with a user-specified threshold can be used to detect the values to remove. Chapter 8. Conclusions and Future Work 105 Table 8.1: Detecting Elements of Some Commutative Semirings No. R © ® Detecting Element 1 {true, false} V A false 2 [0,1] or [0, oo) max min 0 3 [0,1] max X 0 4 (-oo,0] max + N / A 5 [O.oo) max + N / A 6 [0,oo) + X 0 7 [0,oo) max X 0 8 (—oo, oo) min + oo 9 ( -oo , 0] min X 0 In general, given a semiring R = (S, ©, ®), a e R can be used to detect values to remove from the. domain if the following two statements hold at the same time: (1) a is the absorbing element of ®; (2) a is the identity element of ©. We call a the detecting element of the semiring R, Table 8.1 concludes the detecting elements of the some semirings. 8.2.2 From Systematic to Stochastic Basically, all the inference algorithms generalized in our framework are system-atic approaches to CBI problems. In concrete application domains, stochastic approaches, such as sampling techniques in probability inference and stochastic local searches in SAT, are very successful. Given the fact that underlying rep-resentations of these problems are highly analogous, an abstract representation of stochastic or hybrid (stochastic and systematic) inference approaches should be a reasonable goal for future research. 8.3 Future Work In Section 8.2, we briefly discussed some possible improvements to existing infer-ence approaches. Some of these improvements have already achieved success in many fields, which suggests generalizing the characteristics of these approaches, adding them into our semiring-based unified framework, and improving the de-sign of other concrete approaches. One limitation of our framework is that the domain of variables must be discrete. However, both the semiring itself and the combination and marginal-ization operations do not impose any restriction on variable types. It would definitely exciting to extend our framework to tackle continuous variables in the future. In this thesis, we propose some naive inference algorithms for CBI problems, which can be abstracted with a fc-semiring. For k = 2, we show that the decision making under uncertainty is one example. For k > 2, the application of k-Chapter 8. Conclusions and Future Work 106 semiring is far from clear. If we have some input from experts in other fields to support the usefulness of fc-semirings, we would like to investigate more efficient inference algorithms for fc-semirings. As an open and extendable toolkit, the implementation and optimization of GCBIJ is far from complete. To maintain generality, we sacrificed some running efficiency in our current implementation. To attract people to using this toolkit, more comprehensive documents and clear interfaces are needed. In general, much coding and documenting work for GCBIJ remains. Bibliography 107 B i b l i o g r a p h y S. M. Aji and R. J . McEliece. The generalized distributive law. IEEE Transactions on Information Theory, 46:325-343, 2000. Stefan Arnborg, D. G. Corneil, and A. Proskurowski. Complexity of finding embeddings in a fc-tree. SIAM J. Algebraic and Discrete Methods, 8:277-284, 1987. Albert-Laszlo Barabasi and Reka Albert. Emergence of scaling in random networks. Science, 286:509-512, October 1999. Umberto Bertele and Francesco Brioschi. Nonserial Dynamic Program-ming. Academic Press, Inc., 1972. ISBN 0120934507. Stefano Bistarelli. Semirings for Soft Constraint Solving and Programming. Springer-Verlag, 2004. Stefano Bistarelli, Ugo Montanari, and Francesca Rossi. Semiring-based constraint satisfaction and optimization. J. ACM, 44(2):201-236, 1997. ISSN 0004-5411. Hans L. Bodlaender. A tourist guide through treewidth. Acta Cybernetica, 11:1-21, 1993. U R L h t t p : / / c i t e s e e r . i s t . p s u . e d u / bodlaender93tourist .html. Hans L. Bodlaender. A partial k-arboretum of graphs with bounded treewidth. Theor. Comput. Sci., 209(1-2): 1-45, 1998. ISSN 0304-3975. Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduc-tion to Algorithms. MIT Press/McGraw-Hill, 1990. Fabio Cozman. The interchange format for bayesian networks: Bif, 1996. U R L http://www-2.cs.cmu.edu/~fgcozman/Research/ InterchangeFormat/Old/xmlbif02.html. Fabio Cozman. The interchange format for bayesian networks: Xmlbif, 1998. U R L http://www-2.cs.cmu.edu/afs/cs/user/fgcozman/www/ Research/InterchangeFormat/. Martin Davis, George Logemann, and Donald Loveland. A machine pro-gram for theorem-proving. Commun. ACM, 5(7):394-397, 1962. ISSN 0001-0782. Bibliography 108 [13] Thomas Dean and Keiji Kanazawa. A model for reasoning about persis-tence and causation. Comput. Inteli, 5(3): 142-150, 1990. ISSN 0824-7935. [14] Rina Dechter. Bucket elimination: A unifying framework for probabilistic inference. In 12th Conf. on Uncertainty in Artificial Intelligence, pages 211— 219, 1996. U R L h t tp : / / c i teseer . i s t .psu .edu /dech te r96bucke t .h tml . [15] Rina Dechter. Constraint Processing. Morgan Kaufmann, 2003. [16] Rina Dechter and Judea Pearl. Network-based heuristics for constraint satisfaction problems. Artificial Intelligence, 34:1-38, 1987. [17] Rina Dechter and Judea Pearl. Tree clustering for constraint networks (research note). Artif. Inteli, 38(3):353-366, 1989. ISSN 0004-3702. [18] Rina Dechter and Irina Rish. Directional resolution: The Davis-Putnam procedure, revisited. In Principles of Knowledge Representation and Reasoning, pages 134-145. 1994. U R L h t t p : / / c i t e s e e r . i s t . p s u . e d u / a r t i c le /dech te r94d i rec t iona l .h tm l . [19] Rina Dechter and Irina Rish. A scheme for approximating probabilistic inference. In Uncertainty in Artificial Intelligence (UAI97), pages 22-44, 1997. [20] Rutgers University DIMACS Center. Dimacs. U R L h t tp : / /d imacs . ru tgers .edu/ . [21] Amir Eyal. Efficient approximation for triangulation of minimum treewidth. In Proceedings of the 11th Annual Conference on Uncertainty in Artificial Intelligence (UAI-01), pages 7-15, San Francisco, C A , 2001. Morgan Kaufmann Publishers. [22] Eugene C. Freuder. Complexity of k-tree structured constraint satisfaction problems. In AAAI, pages 4-9, 1990. [23] N. Friedman, M. Goldszmidt, D. Heckerman, and S. Russell. Bayesian network repository. U R L h t tp : / /www.cs .hu j i . ac . i l / l abs /compb io / Reposi tory/ . [24] Vibhav Gogate and Rina Dechter. A complete anytime algorithm for treewidth. UAI04, 2004. U R L http://www. i c s .uc i . e d u / ~ c s p / r l l 2 .pdf. [25] C. Guestrin, D. Roller, and R. Parr. Multiagent planning with factored mdps. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Ad-vances in Neural Information Processing Systems 14, pages 1523-1530, Cambridge, M A , 2002. MIT Press. [26] Holger H. Hoos and Thomas Sttzle. Satlib: An online resource for research on sat. In I. P. Gent, H. v. Maaren, and T. Walsh, editors, SAT2000, pages 283 - 292. IOS Press, 2000. U R L ht tp : / /www.sat l ib .org . Bibliography 109 [27] R. Howard and J . Matheson. Influence diagrams. In R. Howard and J . Matheson, editors, Readings on the Principles and Applications of Deci-sion Analysis, volume II, pages 721-762. Strategic Decisions Group, Menlo Park, C A , 1981. [28] Will iam H. Hsu. Bayesian networks formats convertor. U R L http: //www.kddresearch.org/KDD/Groups/Probabilistic-Reasoning/ convertor.html. [29] F. V. Jensen, S. L. Lauritzen, and K. G. Olesen. Bayesian updating in causal probabilistic networks by local computations. Computational Statis-tics Quarterly, 4:269-282, 1990. [30] Frank Jensen, Finn V. Jensen, and S0ren L. Dittmer. From influence dia-grams to junction trees. In Proceedings of the Tenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-94), pages 367-373, San Fran-cisco, C A , 1994. Morgan Kaufmann Publishers. [31] Jesus M Salvo Jr. Openjgraph - Java graph and graph drawing project, 2002. U R L http://openjgraph.sourceforge.net/. [32] Kalev Kask, Rina Dechter, and Javier Larrosa. Unifying cluster-tree de-compositions for automated reasoning. Submitted to the AIJ, June 2003. [33] Uffe Kjasrulff. Aspects of Efficiency Improvement in Bayesian Networks. PhD thesis, Aalborg University, 1993. [34] Uffe Kjaerulff. Reduction of computational complexity in bayesian net-works through removal of weak dependences. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence,, pages 374 - 382, San Francisco, C A , 1994. Morgan Kaufmann Publishers. [35] J . Kohlas and P.P. Shenoy. Computation in valuation algebras. In Handbook of Defeasible Reasoning and Uncertainty Management Systems, Volume 5: Algorithms for Uncertainty and Defeasible Reasoning, pages 5-40. Kluwer, Dordrecht, 2000. [36] Jelle R. Kok and Nikos Vlassis! Distributed decision making of robotic agents. In S. Vassiliadis, L .M.J. Florack, J .W.J. Heijnsdijk, and A. van der Steen, editors, Proc. 8th Ann. Conf. of the Advanced School for Computing and Imaging, Heijen, The Netherlands, pages 318-325. Advanced School for Computing and Imaging (ASCI), June 2003. [37] A. M. C. A. Koster, H. L. Bodlaender, and C. P. M. van Hoesel. Treewidth: Computational experiments. Technical Report 01-38, ZIB, Berlin, Ger-many, 2001. U R L http://www.zib.de/PaperWeb/abstracts/ZR-01-38/. [38] Kschischang, Frey, and Loeliger. Factor graphs and the sum-product algo-rithm. IEEE Transactions on Information Theory, 47:498-519, 2001. U R L http://citeseer.ist.psu.edu / a r t i c le/frey98factor.html. Bibliography 110 [39] S. L. Lauritzen and D. J . Spiegelhalter. Local computations with prob-abilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B, 50:157-224, 1988. [40] A .K . Mackworth. Constraint satisfaction. In S.C. Shapiro, editor, Encyclo-pedia of Artificial Intelligence, volume 1, pages 285-293. Wiley Interscience, 1992. [41] Alan Mackworth and David Poole. CIspace: Tools for learning computa-tional intelligence. Laboratory for Computational Intelligence, Computer Science, University of British Columbia. U R L ht tp : / /www .cs .ubc.ca/ l abs / l c i /C Ispace / i ndex .h tm l . [42] Alan K. Mackworth. Consistency in networks of relations. Artificial Intel-ligence, 8:99-118, 1977. [43] Robert Mateescu, Rina Dechter, and Kalev Kask. Tree approximation for belief updating. In Eighteenth national conference on Artificial intelligence, pages 553-559. AAA I , 2002. ISBN 0-262-51129-0. [44] R. J . McEliece, D. J . C. MacKay, and J.-F. Cheng. Turbo decoding as an instance of Pearl's belief propagation algorithm. Journal on Selected Areas in Communications, 16(2):150-161, Feb 1998. [45] Kevin P. Murphy. Dynamic Bayesian Networks: Representation, Inference and Learning. PhD thesis, University of California, Berkeley, 2002. [46] Kevin P. Murphy, Yair Weiss, and Michael I. Jordan. Loopy belief propaga-tion for approximate inference: An empirical study. In Conf. on UAI, pages 467-475, 1999. U R L ht tp : / /c i teseer . is t .psu.edu/murphy991oopy. html. [47] Richard E. Neapolitan. Probabilistic Reasoning in Expert Systems: Theory and Algorithms. John Wiley and Sons, New York, NY , 1990. [48] Mark A. Paskin. Thin junction tree filters for simultaneous localiza-tion and mapping. Computer Science Division Technical Report CSD-02-1198, University of California, Berkeley, September 2002. U R L h t tp : / / c i t e s e e r . i s t . p s u . e d u / a r t i c l e / p a s k i n 0 3 t h i n . h t m l . [49] Judea Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann Publishers Inc., 1988. ISBN 0-934613-73-7. [50] N. Robertson and P. Seymour. Graph minors. II. Algorithmic aspects of tree-width. Journal of Algorithms, 7(3):309-322, 1986. [51] Donald J . Rose, Robert Endre Tarjan, and George S. Lueker. Algorithmic aspects of vertex elimination on graphs. SIAM J. Comput., 5(2):266-283, 1976. Bibliography 111 [52] Z. Ruttkay. Fuzzy constraint satisfaction. In Proceedings 3rd IEEE Inter-national Conference on Fuzzy Systems, pages 1263-1268, 1994. [53] Thomas Schiex, Helene Fargier, and Gerard Verfaillie. Valued constraint satisfaction problems: Hard and easy problems. In IJCAI95, pages 631-637, Montreal, 1995. U R L h t t p : / / c i t esee r . i s t . psu .edu /sch iex95va lued . html. [54] P. Seidel. A new method for solving constraint satisfaction problems. In Seventh national conference on Artificial intelligence, pages 338-342. AAA I , 1981. [55] P. P. Shenoy and G. Shafer. Axioms for probability and belief-function propagation. In Uncertainty in Artificial Intelligence 4, pages 169-198. North-Holland, 1990: [56] Robert Endre Tarjan and Mihalis Yannakakis. Simple linear-time algo-rithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM J. Comput., 13(3):566-579, 1984. [57] A . J . Viterbi. Error bounds for convolution codes and an asymptotically optimal decoding algorithm. IEEE Transactions in Information Theory, 13:260-269, 1967. [58] Toby Walsh. Search in a small world. In IJCAI, pages 1172-1177, 1999. U R L h t tp : / / c i teseer . i s t .psu .edu /wa lsh99search .h tml . [59] Duncan J . Watts and Steven H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440-442, June 4 1998. [60] Yair Weiss and William T. Freeman. Correctness of belief propaga-tion in gaussian graphical models of arbitrary topology. Neural Com-putation, 13(10):2173-2200, 2001. U R L h t t p : / / c i t e s e e r . i s t .psu. edu/ weiss99correctness.html. [61] L. A. Zadeh. Fuzzy sets as a basis for a theory of possibility. Fuzzy sets and systems, l( l):3-28, 1978. [62] Nevin Lianwen Zhang. A Computational Theory of Decision Networks. PhD thesis, University of British Columbia, 1994. [63] Nevin Lianwen Zhang. Probabilistic inference in influence diagrams. Com-putational Intelligence, 14:475-497, 1998. U R L h t t p : / / c i t e s e e r . i s t . psu.edu/zhang98probabi l is t ic .html . [64] Nevin Lianwen Zhang and David Poole. A simple approach to Bayesian network computations. In Proc. of the Tenth Canadian Conference on Artificial Intelligence, pages 171-178, 1994. Bibliography 112 [65] Ying Zhang and Alan K. Mackworth. Parallel and distributed finite con-straint satisfaction: Complexity, algorithms and experiments. In Parallel Processing for Artificial Intelligence, chapter 1, pages 305-334. Elsevier Science Publishers, 1994. Appendix A. Heuristics Search Algorithms for Triangulation 113 Appendix A Heuristics Search Algorithms for Triangulation A . l Minimal Width Heuristic Triangulation Algorithm A . l Minimal Width Heuristic Triangulation (MW) [47] Input: A connected graph Q = (V, £) Output: Triangulated graph Qt = (V,£ U £ t ) with w*{Q) < w*(Qt) and an elimination ordering a = (v\, • • •, vn) 1: St := 0 2: for v £ V do 3: factor(v) := [Neighbor(v)\ 4: M(v) := Neighbor(v) 5: end for 6: S:=V 7: for i = |V| to 1 do 8: u := arg min^gs factor(v) 9: a(i) := u 10-. for x,y £ M(u) and (xty) £ £ U £t) do 11: £t:=£tU{x,y) 1 2 : - M(x) := M(x) Uy and M(y) := M(y) Ua; 13: end for 14:. for u; £ M ( u ) do 15: M ( tu ) := M(w) \ u 16: end for 17: 5 : = 5 \ { u } 18: end for Appendix A. Heuristics Search Algorithms for Triangulation 114 A.2 Minimal Induced Width Heuristic Triangulation Algorithm A.2 Minimal Induced Width Heuristic Triangulation (MIW) [37] Input: A connected graph Q = (V,£) Output: Triangulated graph Qt = ( V , £ u £ t ) with w*{Q) < w*{Qt) and an elimination ordering a — (vi, • • •, vn) 1: St := 0 2: for v € V do 3: M(v) := Neighbor(v) 4: end for 5: S := V 6: for % = |V| to 1 do 7: u := argmin^gs \M(v)\ 8: a(i) := u 9: for x,y e M(u) and (x, y) <£ SU St) do 10: St := StU [x,y) 11: M(z ) := M(x) U y and M(y) := M(j/) Ux 12: end for 13: for w G M ( u ) do 14: M(w) := M(w) \ u 15: end for 16: S :=S\{u} 17: end for Appendix A. Heuristics Search Algorithms for Triangulation 115 A.3 Minimal Fill-in Heuristic Triangulation Algorithm A.3 Minimal Fill-in Heuristic Triangulation (MF) [47] Input: A connected graph Q = (V, £) Output: Triangulated graph Qt = {V,£u£t) with w*{Q) < w*(Gt) and an elimination ordering a = • • •, vn) 1: £t := 0 2. for v G V do 3: fillin(v) := 0 4: M(v) := Neighbor(v) 5: for each x, y G M(u) and (x, y) ^ f do 6: fillin(v) := fillin(v) + 1 7: end for 8: end for 9: 5 := V 10: for i = |V| to 1 do 11: it := argmin„ e s fillin(v) 12: <r(i) := u • 13: for x,y € M(u) and (a;, y) <£ £ U £t) do 14: f t := 5 £ U 15: M ( i ) := M(s) U y and M(i/) := JW(i/) U i 16: end for 17: for w G M(u) do 18: M(w) := M{w) \ u 19: end for 20: S:=S\{u} 21: end for Appendix A. Heuristics Search Algorithms for Triangulation 116 A.4 Minimal Induced Fill-in Heuristic Triangulation Algorithm A.4 Minimal Induced Fill-in Heuristic Triangulation (MIF) [37] Input: A connected graph Q = (V, £) Output: Triangulated graph Qt = (V,£u£t) with w"{Q) < w*{gt) and an elimination ordering a = (vi, • • •, vn) 1: St := 0 2: for v £ V do 3: fillin(v) := 0 4: M(v) := Neighbor(v) 5: for each x,y £ M(u) and ^ if do 6: fillin(v) := fillin(v) + 1 7: end for 8: end for 9: 5 : = V 10: for i = |V| to 1 do 11: u :— arg min„ e s fillin(v) 12: a(i) := u 13: for I , J 6 M(u) and (x, y) £ £ U £t) do 14: S t := £t U (a, j/) 15: M(x) := M(x) U 2/ and M(y) :- M(y) U x 16: end for 17: for W £ M(u) do 18: M(w) := M(w) \ u 19: fillin(v) = 0 20: for each x,y £ M(io) and [x,y) £ £ LI £t) do 21: fillin(v) := fillin(v) + 1 22: end for 23: end for 24: S:=S\{u} 25: end for Appendix A. Heuristics Search Algorithms for Triangulation 117 A . 5 Lexicographic BFS Heuristic Triangulation, variant Perfect Algor i thm A . 5 Lexicographic BFS Heuristic Triangulation, variant Perfect (LEXP) [51] Input: A connected graph Q — (V, f ) , arbitrary vertex v* € V Output: Triangulated graph Gt = (V, £ U ft) with w*(G) < w*{Gt) and an elimination ordering u = (v\, • • •, vn) 1: ft := 0 2: for v £ V do 3: label(v) := 0 4: end for 5: 5 ~ - V 6: for i = |V| to 1 do 7: if i == j Vj then 8: u := V* 9: else 10: u := arg m a x u e s label(v} 11: end if 12: o-(i) := u 13: f b r i , f c e { t - r l , - " , |V | } , j V f c d o 14: if (a(i), a(j)), (a(i), a(k)) e f and (a(j), a(fc)) £ f U ft then 15: ft :=ftU(a(j) ,a(/e)) 16: end if 17: end for 18: S:=S\{u} 19: for u> £ Neighbor(u) fl 5 do 20: label(iv) := label(w) U {i} 21: end for 22: end for Appendix A. Heuristics Search Algorithms for Triangulation 118 A.6 Lexicographic BFS Heuristic Triangulation, variant Minimal Algor i thm A.6 Lexicographic BFS Heuristic Triangulation, variant Minimal (LEXM) [51] Input: A connected graph Q = (V, £), arbitrary vertex v* € V Output: Triangulated graph Qt = (V, £ U £ t) with w*(Q) < w*(Qt) and an elimination ordering a = (i>i, • • •, vn) 1: £t := 0 2: for » £ V d o 3: label(v) := 0 4: end for 5: 5 := V 6: for i = |V| to 1 do 7: if i == |V| then 8: u := V* ' 9: else 10: u :— argmax-ugs label(v) 11: end if 12: a(i) := u 13: for j,k G {i + 1,- • • , | V | } , J V k do 14: if (a(i),Q(j)),(a(i),a(fc)) £ £ and (a(j), a(fc)) ^ £u£t then 15: f t :=f tU(a(j').a(fc)) 16: end if 17: end for 18: S:=S\{u} 19: for w e S s.t. 3 path {u = vi, • • • ,Vk+i = w} in Q with G 5 and label(vj) < label(w) for j = 2,3, • • •, /c do 20: label(w) := label(w) U {i} 21: end for 22: end for Appendix A. Heuristics Search Algorithms for Triangulation 119 A.7 Maximum Cardinality Heuristic Triangulation Algor i thm A.7 Maximum Cardinality Heuristic Triangulation (MC) [56] Input: A connected graph Q = (V, £) Output: Triangulated graph Qt = ( V , £ u £ t ) with w*{Q) < w*{Qt) and an elimination ordering a = (i>i, • • • ,vn) 1: Et := 0 2: for v & V do 3: counter(v) := 0 4: end for 5: S - V 6: for i = |V| to 1 do 7: w := arg max^gs counter(v) 8: a{i) := u 9: forj,fce{i + l ) - - - , | V | } , i ^ f c d o 10: if (a{i), a(j)), (a(i), a(k)) G £ and (a(j), a{k)) £ £ U £t then 11: £ t : = £ t U ( a ( j ) , a ( k ) ) 12: end if 13: end for 14: 5:=S\{u} 15: for w G Neighbor(u) fl S do 16: counter(w) := counter(w) + 1 17: end for 18: end for
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Generalized constraint-based inference
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Generalized constraint-based inference Chang, Le 2005
pdf
Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
Page Metadata
Item Metadata
Title | Generalized constraint-based inference |
Creator |
Chang, Le |
Publisher | University of British Columbia |
Date Issued | 2005 |
Description | Constraint-Based Inference (CBI) is a unified framework that subsumes many practical problems in different research communities. These problems include probabilistic inference, decision-making under uncertainty, constraint satisfaction, propositional satisfiability, decoding problems, and possibility inference. Solving them efficiently is important for both research and practical applications. Along with the development of inference approaches for concrete CBI problems, researchers are increasingly aware that these problems share common features in representation and essentially identical inference approaches. As the first contribution of this thesis, we explicitly use the semiring concept to generalize various CBI problems into a single formal representation framework with a broader coverage of the problem space based on the synthesis of existing generalized frameworks. Second, the proposed semiring-based unified framework is also a single formal algorithmic framework that provides a broader coverage of both exact and approximate inference algorithms, including variable elimination, junction tree, and loopy message propagation methods. Third, we discuss inference algorithm design and complexity issues based on the abstract representations of CBI problems and inference algorithms. These discussions are examples of applying the abstract knowledge to the concrete application domains. Researchers from various fields can (1) study the most important common characteristics of various CBI problems without representation barriers; (2) analyze and compare different inference approaches; (3) borrow design ideas from other fields and improve the inference approaches' efficiency in their own domains; and (4) significantly reduce the amount of implementation work target ted previously at the individual problems. Finally, we present a software toolkit named the Generalized Constraint- Based Inference Toolkit in Java (GCBIJ) as the last contribution of this thesis. GCBIJ is the first concrete software toolkit that implements the abstract semiring approach to unify the CBI problem representations and the inference algorithms. Users can design their own task-specific semirings or simply use ones provided by the toolkit to solve their own concrete CBI problems through instantiating various already provided abstract inference algorithms. Users can also design their own inference algorithms on the abstract level and apply them to solve different problems. Furthermore, researchers can test, verify, compare, and analyze inference approaches based on this toolkit. The the experimental results based on GCBIJ show that the generalized CBI framework is a useful tool for both research and practical problem-solving. |
Extent | 6004839 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2009-12-03 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0051112 |
URI | http://hdl.handle.net/2429/16241 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2005-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-ubc_2005-0178.pdf [ 5.73MB ]
- Metadata
- JSON: 831-1.0051112.json
- JSON-LD: 831-1.0051112-ld.json
- RDF/XML (Pretty): 831-1.0051112-rdf.xml
- RDF/JSON: 831-1.0051112-rdf.json
- Turtle: 831-1.0051112-turtle.txt
- N-Triples: 831-1.0051112-rdf-ntriples.txt
- Original Record: 831-1.0051112-source.json
- Full Text
- 831-1.0051112-fulltext.txt
- Citation
- 831-1.0051112.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
data-media="{[{embed.selectedMedia}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0051112/manifest