A BINARY RELATION INFERENCE NETWORK FOR CONSTRAINED OPTIMIZATION By Crystal Jinghua Su B. A. Sc. and M. A. Sc., Tsinghua University, China, 1985 and 1988 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES DEPARTMENT OF ELECTRICAL ENGINEERING We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA Sept. 1992 © Crystal Jinghua Su, 1992 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. (Signature) Department of ^E I^r^e=7-n5,1-7-eCreqj - The University of British Columbia Vancouver, Canada Date^ Tec!_^ - DE-6 (2/88) tz. Abstract Constrained optimization is an essential problem in artificial intelligence, operations research, robotics and very large scale integration hardware design, etc. Many constrained optimization problems are computation-intensive, and yet require fast solutions. This thesis presents a novel parallel computing network designed for some optimization problems which can be represented by binary relations and the calculation upon them. Traditional graph search techniques provide sequential solutions to constraint satisfaction, but their speed is limited by the computational ability of a single processor. Parallel processing offers faster solutions by dividing its work load among many processors. A parallel algorithm can be developed from its sequential counterpart by decomposing the unfinished task dynamically according to the availability of processors. These kind of algorithms usually have difficulty in distributing work load evenly among processors [69]. Many distributed parallel computing models have been proposed for various types of problems. A distributed computation model usually embodies the data structure for a specific type of problem [16]. Many of them work in the discrete time domain [71, etc]. Neural network approaches have been proposed for optimization problems [28], however, optimal solutions are often not guaranteed when the gradient descent scheme is used. The determination of the link weights in a Hopfield-type neural network can be elaborate and convergence for large problems remains a problem [2, 90]. This thesis proposes a new parallel computing network — a binary relation inference network. The network topology is set up to match conventional optimization procedures. The network transforms some optimization problems (shortest path, transitive closure, assignment, etc.) into network convergence problems and divides the work load ii evenly among all processors. Link weights are known. It can operate in synchronous discrete-time, asynchronous discrete-time, and continuous-time domains. Inference network algorithms are derived directly from problem formulation, and are effective for a large range of problem sizes. The inference network consists of interconnected components; each component is composed of a unit, a number of sites attached to the unit, and links from the unit to sites on other units. Each unit is a simple computational engine. The topology of the inference network matches naturally to many optimization problems, since its deduction sites produce candidates to compete for the optimum and its decision-making units select the optimum from the site values. Either directly or through a transformation mapping to a systolic structure, discrete inference network algorithms can be implemented on a commercially available parallel processing facility, such as the Connection Machine. Analog inference network can be built using integrated circuits which solve problems efficiently in the continuous-time domain. In this thesis, mathematical analysis and numerical simulation were conducted for the synchronous, asynchronous, and analog inference network. Various applications has been discussed. The inference network has shown to solve the all-pair shortest path problem efficiently in various updating schemes. The inference network algorithm for the transitive closure problem was derived straightforwardly from the formulation of the problem and the topology of the inference network. Efficient continuous-time solution was obtained from a non-synchronized logical circuit. The convergence of the network for the shortest path or transitive closure problems was shown to be independent of the problem size. It was demonstrated that the inference network can solve the assignment problem in a way similar to the Hopfield net's solution to the traveling salesman problem. However, the former was shown to converge for all test cases with problem size up to 128. iii Table of Contents Abstract^ ii List of Tables^ viii List of Figures^ x Acknowledgements^ 1 Constrained' Optimization ^ 1 1.2 Traditional Approaches for Constrained Optimization ^ 3 1.3 Parallel Approaches ^ 5 Scope of this Thesis ^ 9 1.4 2 1 Introduction 1.1 xiii Topology of the Inference Network 11 2.1 Inference on Binary Relations ^ 11 2.2 A Novel Inference Network ^ 12 2.3 Inference Network Topology ^ 15 2.4 Updating Schemes ^ 17 2.5 Transfer Function ^ 17 2.6 Is the Inference Network a Neural Network? ^ 21 2.7 The Inference Network Cannot Be Treated as a Hypercube ^ 2.8 Examples and Potential Applications ^ iv 24 26 3 Inference Network Implementation^ 32 3.1 Analog Inference Network ^ 32 3.2 Discrete-Time Inference Network ^ 36 3.3 Mapping to the CM Using Local Communication ^ 37 4 Inference Network for the All-Pair Shortest Path Problem^41 4.1 The Shortest Path Problem ^ 41 4.2 Inference Network Algorithm for the Shortest Path Problem ^ 43 4.3 Algorithm Convergence ^ 46 4.3.1 Synchronous Algorithm Converges in at most flog 2 (N —1)1 Iterations 46 4.3.2 Convergence of the Asynchronous Algorithm ^ 47 4.3.3 Convergence of the Analog Algorithm ^ 4.4 Relationship to other Algorithms ^ 48 50 4.4.1 A* Algorithm ^ 50 4.4.2 MATRIX Algorithm ^ 51 4.4.3 FLOYD Algorithm ^ 52 4.4.4 DANTZIG and MDANTZIG Algorithms ^ 53 4.5 Unifying Some Established Algorithms Using an Inference Network . ^ 54 4.5.1 Floyd's Algorithm ^ 54 4.5.2 Revised Matrix Algorithm ^ 55 4.5.3 Dantzig's and modified Dantzig's Algorithms ^ 56 4.6 Comparison with Established Algorithms ^ 57 4.6.1 Parallel Processing with Optimal Hardware ^ 57 4.6.2 Performance on Connection Machine ^ 4.7 Discussion ^ 58 60 5 Inference Network for the Transitive Closure Problem 62 5.1 The Problem ^ 62 5.2 Unit and Site Functions ^ 63 5.3 Convergence Consideration ^ 64 5.4 Implementation Using Logical Circuits ^ 68 5.5 An Example ^ 72 5.6 Discussion ^ 73 6 Inference Network for the Assignment Problems 6.1 77 The Assignment Problem ^ 77 6.1.1^The Problem ^ 77 6.1.2^Optimization Formulation ^ 78 6.2 Unit and Site Functions for Optimization ^ 80 6.3 The Decreasing Objective Function ^ 81 6.4 TSP, the Hopfield Model and Comparison ^ 83 6.4.1^The Hopfield Model for the TSP 83 ^ 6.4.2^Comparison of the Assignment Problem with the TSP ^ 84 Dynamics of the Network ^ 86 6.5.1^Connection Matrix ^ 86 6.5.2^Eigenvalues Move the Network Towards the Valid Sub-space 87 6.5.3^The Role of the Non-Linear Function ^ 89 6.5.4^The Complete Q Term ^ 89 6.5.5^Initialization, Updating Rate and Termination ^ 90 6.6 Determining Constants ^ 91 6.7 Simulation Results ^ 92 6.7.1^Small Size Examples ^ 93 6.5 vi 6.7.2 Solve Large Problems on the CM ^ 6.8 Discussion ^ 7 Conclusions^ 97 99 104 7.1 Summary ^ 104 7.2 Contribution ^ 106 7.3 Open Issues and Future Work ^ 108 Bibliography^ 112 A Data Consistency Checking Using the Inference Network ^119 B Inference Network Algorithm for Matrix Inversion^123 C A Weightless Neural Network for Consistent Labeling Problems ^126 D Algorithms Related with the Shortest Path Problem^130 D.1 Mapping the Shortest Path Algorithm onto the Connection Machine . . . 130 D.2 Unifying Some Established Algorithms Using an Inference Network^. . 131 D.2.1 Floyd's Algorithm ^ 131 D.2.2 Matrix Algorithms ^ 132 D.2.3 Dantzig's Algorithm ^ 135 E Determinants and Eigenvalues^ 137 E.1 The Determinants of Some Matrices ^ 137 E.2 Eigenvalues of the Inference Network Connection Matrix ^ 138 vii List of Tables 3.1 Connection Machine busy time (in seconds) to solve the all-pair shortest path problem. Direct program: the CM is simply programmed as the inference network; Mapping: the inference network is mapped to a processor array using local communication. 39 3.2 CM busy time (in seconds) for the inference network and a systolic algorithm to solve (I — A) -1 on a 16K CM. ^ 40 4.3 Execution time for shortest path algorithms executed on their optimal hardware configuration. a in RMATRIX is 2 or 3. ^ 59 4.4 The number of time units required to obtain the shortest paths when implemented on the Connection Machine. ^ 59 4.5 CM busy time for INFERENCE-CM (random), the worst case of INFERENCE- CM (worst), and Robert and Trystram's algorithm (Robert) to solve the shortest path problem given random distance graphs . 60 ^ 5.6 States of unit, site and path indicator. ^ 71 6.7 CM busy times (in seconds) and assignment results for 20 random problems using inference network (A = B= 11 N = 64 ,C = 7,4,Do = 4,p = 0.996, Kd = 0.008) and corresponding result from the Hungarian method. 98 6.8 CM busy times (in seconds) and assignment results for 20 random problems using inference network (A = B=A A ,C =N , N = 128 Do = 4, p = 0.998, Kd = 0.007) and corresponding result from the Hungarian method. 98 viii 6.9 Solution quality vs. problem size. 20 distinct random examples were used for each N. ^ ix 99 List of Figures 1.1 Relationship between some optimization problems. ^2 2.2 (a) An AND/OR graph indicating various ways to find the distance from a to b. (b) Substitute the 3-input AND relation in (a) with two 2-input AND relations. ^ 12 2.3 (a) A complete AND/OR graph for three relations; (b) the corresponding inference network of (a). ^ 29 2.4 Unit (i, j) in the binary relation inference network. ^ 30 2.5 (a) A classical sigmoid artificial neuron; (b) a basic component in the inference network. ^ 31 2.6 (a) Topology of a 3-dimensional hypercube with 8 processors and 3 bidirectional links on each processor; (b) a full N = 3 inference network with 9 units and 4 bidirectional links per unit. ^ 31 3.7 Examples of inference network components, (a) for the shortest path problem, (b) for the transitive closure problem. ^ 33 3.8 (a) Torus structure of the inference network; (b) a circular disk. ^ 34 3.9 (a) A square structure with multiple sets of buses; (b) connection of a unit to bus lines. ^ 3.10 An analog inference network system. ^ 35 35 3.11 The systolic mapping of the inference network to the Connection Machine. 39 5.12 (a) Site and (b) unit functions for the transitive closure problem. . . . . 65 5.13 Essential components for a unit ^ 69 5.14 Site and unit time delay. ^ 70 5.15 A path indicator for the transitive closure problem. ^ 71 5.16 An example of a 4-node graph (a), the intermediate result (b) and the transitive closure (c).^ 72 5.17 Circuit for an N = 4 transitive closure problem. ^ 75 5.18 Hspice simulation of the inference network for a 4-node transitive closure problem. ^ 76 6.19 Unit and sites in the inference network act like a neuron. ^ 81 6.20 (a) A = B = N1 , C = Do = 4A, p = 0.994, the larger Kd is, the faster the network converges; (b) A = B = A, Do = 4 A, p = 0.994, Kd = 0.01, the network converges slowly when C is small, and converging speed is almost the same within a certain range of C; (c) A = B = N 1 C = V, p = 0.994, Kd = 0.01, the larger Do is, the faster it , converges. 102 6.21 Number of iterations vs. problem size. All benefits rii are randomly distributed in [0,1]. A = B = C =^^-^ Do = 4A Kd = 0•01 d — N-1 7^N ^tz! 1) E [PO, V] , 7 103 . 6.22 Maximum C for the algorithm to converge vs. problem size. All benefits rii are randomly distributed in [0, 1]. A = B = N , Do = 4A, Kd = 0.01, i u !(i) E [12, , ^103 B.23 Data flow in calculating A -1 . Each parallelogram contains data for one phase. * is an element in the original matrix; * is an element in L or U; 0 is an element in L -1 or U -1 xi 125 C.24 Interconnected clusters form a weightless neural network for consistency labeling problems. ^ 128 C.25 A diagonal cluster G(i, i) in the weightless neural network. ^ 128 C.26 A non-diagonal cluster G(i, j) (i > j) in the weightless neural network. ^ 129 D.27 The order of updating length matrix in a forward (left) and backward (right) process ^ xii 134 Acknowledgements I am grateful to my supervisor Dr. K.P. Lam for giving me inspiration, providing effective direction, allowing me to explore the areas I am interested in, and offering advice even when he is working outside of UBC. I would like to thank Dr. H. Alnuweiri sincerely for taking over the duties as my research supervisor at a time of need. My thanks also go to my supervisory committee members Dr. P.D. Lawrence and Dr. R. Ward for spending time with me despite of their busy schedules. These people help greatly in this thesis presentation. I appreciate the institute of Advanced Computer Studies at University of Maryland and the Thinking Machine Corporation Network Service for granting the access to their Connection Machines. Chapter 1 Introduction 1.1 Constrained Optimization A Constrained Optimization Problem [56] has four major components: V, D, C and f, where V is a set of variables {v i , v2 , • • • , v n }, and IVI = n (the cardinality of V is n); D = D1 x D2 x D3 • • • x D n is the vector of domains for all the variables in V, and Di is the domain of vi - a set of values from which v i can take one; C is the set of constraints for subsets (of arbitrary sizes) of variables in V, and a constraint on a subset of variables is a restriction on the values that they can take simultaneously; f is an objective function to be optimized. Constraints can also be an integral part of the objective function, where the function measures the compatibility between the constraints and the solution. A valid solution for constrained optimization is an assignment of one value to each of the variables in V such that all the constraints are satisfied. An optimal solution is a valid solution which optimizes the objective function. The goal of constrained optimization is, therefore, to find the optimal solution. However, when there is no efficient way to get the optimal solution (exact solution), a good valid solution (approximate solution) is also acceptable. Many practical problems can be interpreted as constrained optimization problems. The assignment problem, to be discussed in Chapter 6, is a typical example. Given N persons, N jobs, and the benefit of each person getting each job, an one-to-one person-job assignment is to be found to maximize the total benefit. Here we may take persons as 1 ^ 2 Chapter 1. Introduction^ variables and jobs as their domains. The constraint is each person getting a distinct job, and the objective function calculates the total benefit. The all-pair shortest path problem (Chapter 4) can also be formulated into a constrained optimization problem, although not as obvious as the assignment problem. Given a map of N cities and road distances between some of them, how to find the shortest path between two cities? The variables in this problem are city pairs, and their domains are all the combinations of possible intermediate cities. The objective function is the total distance between these city pairs. An implicit constraint is that if the shortest path from city one to city three passes city two, then this path must be the concatenation of the shortest paths from city one to two and from city two to three [11]. The transitive closure problem is a path-finding problem related with the shortest path problem, and therefore can also be considered as an optimization problem (Chapter 5). ED = min 0 = +^e =^V 0=A E Transitive Closure I^i I^ I^I I^.^I^I LFind ^i^LFind consistent relations shortest paths Data Consistency Figure 1.1: Relationship between some optimization problems. Figure 1.1 outlines some interesting relationships between the problems of shortest path (page 41), transitive closure (page 62), data consistency checking ,(page 119) and Chapter 1. Introduction^ 3 matrix inversion. First, the Algebraic Path Problem (abbreviated to APP), is a generalization of several conventional problems [1, 57, 58, 71, 95]. Substituting the general operators ED and ® with specific ones, an algorithm for the APP [71, 70] can give solutions to the shortest path problem, the transitive closure problem, and computing (I — A) -1 . Second, transitive closure, shortest path, and data consistency problems are related. The transitive closure problem determines whether a binary relation is defined for two nodes in a graph. The data consistency problem examines the consistency of a given set of time/location referencing data, and subsequently derives a consistent set of data based on the given data. The shortest path and data consistency problems, in addition to the determination of the existence of a binary relation, find a specific — the shortest or consistent — relation from a number of given or derived relations concerning the two given nodes. 1.2 Traditional Approaches for Constrained Optimization The procedure of finding a valid solution to a constrained optimization problem is usually called constraint satisfaction [56]. Symbolic Artificial Intelligence (AI) procedures for constraint satisfaction can be used in constrained optimization: generate all valid solutions using constraint satisfaction techniques, evaluate the objective function for each solution, and find the optimal. Generate and test [56] is the most straightforward approach for constraint satisfaction. It simply generates all possible combinations of the variable assignments and tests their compatibility with the constraints. Generate and Test is very inefficient and valid only when the variable domains are discrete and finite. A predicate is a function that maps an argument into a truth value (TRUE or FALSE). Backtracking [56] is a basic graph search algorithm which systematically explores vari- able domains by sequentially evaluating predicates in a specific order. As soon as any Chapter 1. Introduction^ 4 predicate has all its variables evaluated, its truth value is determined. Whenever a predicate is evaluated as false, the backtracking procedure moves back to the last variable with unassigned values remaining in its domain (if any) and evaluate its next value. When the size of a problem is large, backtracking is inefficient due to the combinatorial explosion of computations [67]. Searching with the help of heuristic information, for example A* search [64], may significantly reduce the computational cost. However, good heuristic information is often unavailable. Mackworth [55] and Waltz [87] proposed to systematically reduce the search space by eliminating local inconsistency. Although these algorithms can often significantly reduce the sizes of the variable domains, they usually have to be combined with backtracking to get final solutions (refer [55] and [87]). Simulated annealing [35] has shown significant potential for solving large-scale combi- natorial optimization problems. It is an iterative improvement procedure using controlled uphill step to achieve a global minimum. This procedure was used in the traveling salesman problem [35, 68]. Linear programming [68] is a well-studied area for optimization problems and has many established algorithms to perform relaxation [9]. Probability relaxation is a labeling procedure used in artificial intelligence and especially in computer vision [4, 59]. When a variable domain is finite and discrete, each variable is assigned a label which is associated with a weight or probability. Relaxation procedure moves the label vector around in order to satisfy all the constraints, as well as optimize the objective function [4]. Dynamic programming [11] provides a general approach for optimization problems that can be broken down into smaller but similar problems. Dreyfus [11] explores the mathematical formulations for problems like equipment replacement, resource allocation, shortest path, and traveling salesman, etc. All these approaches have limitations. The major difficulties with symbolic AI constraint satisfaction are its sequential nature and the combinatorial explosion. Symbolic Chapter 1. Introduction^ 5 AI algorithms usually can not be parallelized straightforwardly without serious communication overhead (refer [69]). When the size of a problem is large, it is impossible to explore and evaluate all the valid solutions. Simulated annealing is a sequential procedure and does not guarantee the solution to be optimal [68]. Linear programming and dynamic programming are not general methods for all optimization problems; their efficiency depends critically on the recurrent formulation of the problem. Some of the algorithms can be parallelized; others are computational-intensive and have a sequential nature (refer [11]). 1.3 Parallel Approaches The design of efficient techniques for parallel computers is strongly influenced by the assumption of the parallel computation model used. There are different classifications of parallel machines; one of them is to divide parallel machines into shared-memory models [75] and distributed-memory models [75]. The main difference between the two models is the way in which communication among processors is handled. In a shared memory system, processors write and read data to and from a single shared memory. The memory serves both as a communication medium as well as a storage medium. The PRAM (Parallel Random Access Machine) model assumes that all processors can perform a single access to the shared memory in 0(1) time, regardless of the data access pattern [18]. Within a single memory access period, all processors can perform either a read operation or a write operation but not both. A parallel algorithm can be developed from its sequential counterpart by decomposing its computation dynamically to multiple processors. Quinn [69] proposed synchronous and asynchronous models to perform backtracking (page 3). In the synchronous model, all unexamined subproblems are kept in a single priority queue stored in the memory Chapter 1. Introduction^ 6 of a master processor; whereas in the asynchronous model, each processor keeps its own priority queue. Quinn's results show that the synchronous model predicts an eventual decrease in speedup as the number of processors increases, due to the increased communication overhead. Quinn's asynchronous model has better performance. However, when the number of processors increases, theoretical speedup saturates and experimental speedup decreases [69]. In a distributed-memory model, each processor has a local memory. Data exchange is accomplished through inter-processor connections. Communication among processors is restricted by the network topology, bandwidth, data movement pattern, and other related factors [50]. To obtain good performance, small diameter (the maximum number of times a message has to be forwarded from one processor to its destination), uniform communication, extensibility, short wires, and redundant paths are recommended [26]. On the other hand, in order to reduce cost, minimum number of wires, efficient layout, fixed degree, and fitting to available technology are necessary [26]. In the attempts to solve problems requiring a large amount of communication, various interconnection topologies have been proposed [6]. Some examples are crossbar, ring, tree, grid, torus and hypercube. Reconfigurable networks are used to achieve higher network throughput. An example is given in [3]. Connectionist networks, neural networks and systolic arrays are some intensivelystudied distributed-memory networks. Shastri [72] showed that structured connectionist knowledge representation can handle problems that have proven difficult for logic-based approaches. The connectionist approach uses distributed representation which reflects the way a human brain works [14]. Constrained optimization problems can be solved by using such a connectionist network, in which constraints are represented by links and their weights [16]. Unit outputs v must be given proper interpretations so that a possible stable state of the network corresponds to a meaningful state of the problem (refer [28]). Chapter 1. Introduction^ 7 The presence and weight of the links, denoted by T, are pre-defined according to the known convergence requirements such that the network will arrive at a desired state eventually. Computational energy of the network E(T, v) can be defined indicating how much the state of the network disagrees with the given constraints (refer [28]). Optimization procedures move the outputs v in the space of possible states representing solutions, progressing in the direction that tends to decrease the energy. When used for polynomial-complexity problems, a connectionist network can be a vehicle to parallelize an otherwise sequential algorithm. The Hopfield network was used in solving the traditional traveling salesman problem [31, 2]. The objective function was composed of four terms; three of them represent the constraints and one represents the length of the tour to be minimized. The result is reasonably good fbr small size problems and its speed is acceptable. However, the validity and quality of the solution depend critically on the weight of each term in the objective function [90]. It is very hard to determine these weights for this problem with a large number of cities [90]. Combinatorial optimization can be done in an analog neural network with modified Hopfield's energy function and parameters [36]. The idea of transforming a constrained optimization problem into a network convergence problem was also used to solve the map coloring and Necker cube problems [15, 16]. Winner-Take-All (WTA) is a decision-making mechanism used in distributed computing situations. A WTA network is comprised of neurons connected by mutually inhibitory links [14]. Neurons in a WTA network compute their values based on such a rule: the neuron with the largest value keeps its value and the values of the rest of the neurons are set to zero. A maximum neural network is composed of a number of interconnected clusters where each cluster is a winner-take-all network. Maximum neural networks have been used for the bipartite subgraph problem [49] and the max cut problem [48]. It has been proven [48, 49] that the maximum neural network always generates valid (optimal Chapter 1. Introduction^ 8 or near-optimal) solutions for these two problems and the link weights of the network are easy to determine. When the maximum neural network is used for the max cut problem, the solution quality improves as time elapses [48]. Unlike the energy function used in the Hopfield model, which is comprised of four weighted terms, the energy functions for these two problems only contain one term which covers the constraints as well as the quantity to be optimized. Since there is only one coefficient involved in the energy function, link weights can be easily determined by letting the energy decrease monotonically. Therefore, the two problems solved by Lee [48, 49] provide examples of a neural network with easy-to-determine link weights. The proposed inference network will be compared with some neural network models in Section 2.6 when the inference network topology is defined. A systolic array is another approach to deal with some optimization problems. Several systolic architectures and algorithms were proposed for algebraic path problems (including shortest path, matrix inversion, and transitive closure) [71, 70, 12, 39, 23]. The definitions for the shortest path and transitive closure problems can be found on page 41 and page 62 respectively. A systolic architecture is a number of systematically connected processors; each processor calculates and transfers data rhythmically in a pre-defined way. A systolic array is a discrete-time system; processing elements in the array must be properly synchronized to ensure a correct solution. Unlike many connectionist and neural network approaches, systolic algorithms do not involve relaxation. In distributed-memory models, a network topology can embody specific computational structures needed to solve a certain class of problems [16]. A systolic structure for a given problem is usually derived by analyzing the data dependencies of the corresponding sequential algorithm for the problem (refer [52]). Grid and torus connection are most commonly used in parallel computers because of their simple hardware implementation. However, parallel algorithms written for these architectures often have to be carefully Chapter 1. Introduction^ 9 designed in terms of data delivery and synchronization. The popular hypercube topology has a higher space-compression factor and a lower communication factor than what is naturally needed in optimization problems (refer [43] or Section 2.7). For optimization problems, this thesis defines a parallel computing model in which both the evaluation of the candidates competing for the optimum and the selection of the optimum are explicitly represented by the values of network components. It is a distributed-memory model when it is used in synchronous discrete time domain. 1.4 Scope of this Thesis This thesis presents a novel parallel computing network — the binary relation inference network which is suitable for some constraint optimization problems, such as the allpair shortest path problem, the transitive closure problem, and the assignment problem. Constrained optimization is an essential problem in artificial intelligence, operations research, robotics, and very large scale integration hardware design, etc. Many constrained optimization problems are computation-intensive, and yet require fast solutions. The approach proposed here is based on defining and using an inference network whose topology is derived from simple intuitive approach to optimization. It will be shown that the inference network algorithms are obtained from the formulation of the problems, and they can be executed in synchronous discrete-time, asynchronous discretetime, or continuous-time domains. It will be shown that the link weights of the inference network are easy to determine. The network will be shown to produce optimal solutions to the shortest path (Chapter 4) and the transitive closure (Chapter 5) problems. It does not guarantee an optimal solution to the assignment problem (Chapter 6) because gradient descent will be used to decrease the network energy. The trade-off of eliminating synchronization is the large amount of communication between the elements in Chapter 1. Introduction^ 10 the network. With these communication channels, each decision-maker (a unit in the network) always has access to all candidates competing for the optimum. Each element of the network is a very simple computational engine. The thesis has the following contents. Chapter 1 is an introduction. Chapter 2 discusses the topology and operation of the inference network. It also shows several structural versions and updating mechanisms for different instances of optimization problems. Chapter 3 is about implementation issues. It discusses the components of the network and their interconnection. Both analog and discrete implementations are studied. A transformational mapping of the discrete-time inference network onto the Connection Machine is also introduced. The inference network algorithms for some constrained optimization problems are discussed in Chapter 4 to Chapter 6, and results are compared with other established methods. Chapter 4 gives the synchronous, asynchronous, and continuous-time algorithms for the shortest path problem and discusses their convergence. The network can also unify several other algorithms for the problem in terms of different inference network operations. The efficiencies of various algorithms are also discussed. Chapter 5 gives the inference network algorithm for the transitive closure problem and the continuous-time implementation of the algorithm using logic gates. Chapter 6 derives algorithms for the assignment problem and discusses network dynamics in pursuing approximate solutions. A decreasing weight is used in the energy function for the quantity to be optimized, which results in good network convergence. Finally Chapter 7 summarizes the contributions of this thesis and discusses future work. Chapter 2 Topology of the Inference Network 2.1 Inference on Binary Relations A node is an item. It can be a node in a graph, a city in a map, or a time instance in a schedule, etc. A relation associates two or more nodes. A binary relation R(x, y) is either a qualitative or quantitative relation between two nodes x and y. A binary relation is more basic than a general m-ary relation R(x i , x 2 , ..., x,n ) for m distinct nodes, which can often be defined by a set of binary relations about the m nodes, together with a set of inference rules on the binary relations. Examples of inference rules are Unary and binary deductions which draw a conclusion from one or two premises respectively. The relation 'x > y' is a binary relation. The relation `u is the largest among u, v, w' is a 3-ary relation which can be derived from binary relations `u > v' and `u > w'. Unary deduction obtains y > x from x < y, whereas binary deduction obtains x > z from x > y and y > z. A relation between x and z may also be derived using another intermediate node u, for example x > u and u > z where u y. The final inference result of the relation between x and z is determined based on all possible intermediate nodes. Consider a problem with N nodes denoted as 1,2, ... , N where the relations between the nodes can be fully represented using binary relations R(i, j) (1 < i j < N are , nodes). Operations on the relations follow the format: R(i, j) and R(j, k) R(i, k) (deducing relation R(i, k) from relations R(i, j) and R(j, k)). There are other ways to deduce R(i, k): from R(i, m) and R(m, k), R(i, n) and R(n, k), etc. Inference takes place 11 12 Chapter 2. Topology of the Inference Network^ to determine relation R(i, k) based on various deduction results. 2.2 A Novel Inference Network The inference network is a novel topology for solving constrained optimization problems. To the best of the author's knowledge, no research has been reported on networks with the same architecture and operations. The inference network architecture aims at representing optimization procedures explicitly. Its components are interconnected in such a way that a decision maker always has access to the candidates competing for the optimum. (a) ^ (b) Figure 2.2: (a) An AND/OR graph indicating various ways to find the distance from a to b. (b) Substitute the 3-input AND relation in (a) with two 2-input AND relations. The basic inspiration of the inference network topology came from a conventional AND/OR graph [64]. An AND/OR graph is a notation used in artificial intelligence to represent logical relations. As shown in Figure 2.2, arrows are tied up by arcs into groups. The logic between the relations tied up by an arc is AND — all the tied relations Chapter 2. Topology of the Inference Network^ 13 are needed to derive a result. The logic between any two groups of relations is OR — each group represents an independent way to derive the result. Figure 2.2(a) shows various ways to calculate the distance from a to b. The distance may be obtained from the distance from b to a alone, OR, from the distances a c AND c b, OR, from the distances a d AND d e AND e 4 - b. Since m-ary deduction can often be decomposed into a number of binary deductions, an AND/OR graph with only 2-input AND relations is often sufficient to represent the logic relation among a set of nodes. Fig. 2.2(b) is a variation of Fig. 2.2(a) and only uses 2-input AND relations. The AND/OR graph is a scheme to describe logical relations. It is not a parallel computing network; however, it shows how to derive a conclusion from given data, and how the given data are to be associated. Based on an AND/OR graph, the inference network was proposed which is composed of units, sites and links. The correspondence and descriptions of the entries in the inference network and an AND/OR graph are given in Figure 2.3. Figure 2.3 contains all the binary deductions which can be done to the three binary relations. In this figure, any distance can be calculated from the other two distances. On the other hand, any distance is used in evaluating another distance. Given a set of N nodes, there will be up to N(N — 1) binary relations between the nodes. Generally, a result R(i, j) can be deduced from any two relations R(i, l) and R(1,j). If all the possible binary deductions for an N-node problem are shown in a graph, the graph will become a complicated, regularly connected, closed network with a large number of units. How can one ensure that such a closed network will not oscillate? The convergence of a parallel computing network can be analyzed by defining and examining the computational energy of the network. The Hopfield model has shown an example of using an energy function in convergence analysis. The topological distinction between the inference network and neural networks or Chapter 2. Topology of the Inference Network ^ 14 other parallel computing models is that the inputs to an inference network unit are grouped into pairs (sites), and the evaluation of each pair (corresponding to site calculation) is independent of the evaluation of other pairs. Figure 2.5 shows a unit with its paired inputs, as well as a typical neuron with all inputs connecting to it directly. A typical neuron collects all of its inputs and produces an output accordingly. An inference network unit is able to update its value based on the change of any of its input pair. The reason to group the inputs is that the result obtained from one relation pair (for example a c b) can be used to update the relation (a -.4 b) without waiting for results from others (a --od—).e--■borb-i a). Another difference between the inference network and some other parallel computing models (for example a systolic array) is that the former can be updated synchronously, asynchronously, or in continuous-time. This is a feature related with the structure of the inference network. To be more specific, the grouping of unit inputs into pairs makes it possible to compute candidates for optimum independently. An AND/OR graph shows various paths to evaluate a relation using other relations, and the evaluations through distinct paths are independent. Again let us look at Figure 2.2, the calculation of distance a --0 c b is independent of the calculation of a d b. This indicates that the evaluation of sites can be parallelized. For some applications, the evaluation of all relations can be active at the same time. For example, distance d b and a —' b can be calculated simultaneously. When the distance a d e --) b is not known, distance a b is determined by a —0 c b and b a. After a —■ d —0 e b is obtained, distance a b can be updated (if necessary) based on the new information. This analysis shows that for some applications, both the units and the sites in the inference network can be updated synchronously or asynchronously (Section 2.4). Chapter 2. Topology of the Inference Network^ 15 2.3 Inference Network Topology The binary relation inference network consists of simple computational elements and operates by communicating between the elements. Each computational element is composed of a unit, a number of sites attached to the unit, and links from the unit to a site at another unit, refer Figure 2.4. Unit and site functions are usually obtained from application considerations. Just as described in the previous section, a site output derives a result from two relations, and sites attached to a certain unit provide different ways to derive a relation. A unit determines a binary relation based on its site values. In general, site functions represent deduction rules or calculate factors competing for the optimum; and unit functions resolve conflicts or perform optimization. The dynamic state of the network is determined by its link weights, unit and site functions, initial state, and up. dating mechanism. The network is named inference network because conclusions are obtained by binary deduction and multi-input conflict resolution. We will use u i; for the value of unit (i, j) and s ki/ for the value of site 1 at unit Figure 2.4 shows unit (i, j) in the inference network. Relation u1 1 and arrive at site 1 of unit (i, j) and produce a result sii i — relation between i and j according to site 1. At the same time, other results about i and j — so, • • • , ski's/ — can also be derived. Unit (i, j) is responsible to derive a relation between i and j based on the site values so, sij2, • • • SijN. The key features of this topology which make it suitable for optimization problems are its deduction ability at each site and the comparing ability at each unit. A 2-input site performs binary deduction or calculates a candidate competing for the optimum, for example, it calculates the distance from i to j via a specific intermediate node. A multi-site unit provides the platform for multiple-input decision-making or optimization, for example, finds the shortest path from i to j. For an N-node problem, the general inference network contains N 2 units. For all Chapter 2. Topology of the Inference Network ^ 16 1 < i,j < N, unit (i, j) has N sites, namely site 1 (1 < 1 < N). Unit (i, j) has 2N outgoing links leading to site j at units (i, 1) and site i at units (/, j), where 1 < 1 < N. Site 1 at unit (i,j) has two incoming links from unit (i,l) and unit (/,j). A unit is a simple computational engine which is able to calculate its value based on its N site values using a simple unit function. The ability required for a site is to evaluate a simple site function of two variables — its two inputs. Like a neuron in an artificial neural network, a unit does not require any sophisticated processor, nor does it require a large memory. For many applications, diagonal units (i, i) can be omitted, hence the total number of units is reduced to N(N — 1), the number of sites at each unit reduced to (N — 2), and the number of outgoing/incoming links from/to each unit reduced to 2(N — 2). The site label 1 at unit (i,j) satisfies 1 < 1 < N and 1 i j. For applications with undirected graphs, d,, = d1 , (symmetric distance matrix), the network size can be cut down by half: only C(N, 2) units in the network'. Each unit in this case still has (N —2) sites and each site has two incoming links; unit (i,j) and unit (j,i) will be used to refer to the same unit in the network. For clarity, Figure 2.4 only shows the sites and links for unit (i, j). However, other units also have N sites and each site also has two incoming links. If all the sites and links at each unit are put into Figure 2.4, the complete inference network will become a closed network with feedback. If the N 2 units of the network are placed into a 2-dimensional array on a plane, and unit (i, j) stays at row i and column j, then only the units in the same row or column communicate among themselves. There is no direct connection between unit (i, j) and unit (v, w) if i, j, v, w are four distinct nodes in the problem. 1 C(N, 2) = N * (N — 1)/2 is the number of different 2-object combinations from N objects without considering the order. Chapter 2. Topology of the Inference Network^ 17 2.4 Updating Schemes The inference network works in both discrete and continuous-time domains. The discrete inference network is one which operates rhythmically with a clock. It can work in synchronous, asynchronous, or partially synchronous modes. In synchronous updating, all unit values are updated at the same time based on the unit values at the previous step. Asynchronous updating changes one unit at a time. The new value of the unit is then used in the updating of the next unit. Partially synchronous mode combines these two schemes — updating some of the units synchronously at a time. The number of iterations required to complete a task shows the speed of a discrete algorithm. An iteration is used to stand for the process that can be performed on the inference network, given the results of the previous iterations. Each iteration comprises the simultaneous binary deduction (evaluation of sites) at all units and their subsequent conflict resolution (evaluation of units). In the analog inference network, all units constantly change their values until a stable state is reached. Updated unit values are used by other units right away. The time required for the network to arrive at the stable state indicates the speed of an algorithm. A unit value can be calculated after all its sites are evaluated in parallel. It is also possible to update the unit value as soon as one or more sites are evaluated. The reason behind this is that optimization can take place whenever a candidate is generated or updated; the unit value will always be the optimum-so-far. For analog implementation, it is not difficult to update a unit as soon as its site changes (refer Figure. 3.7). 2.5 Transfer Function A site function SO is a function of two unit values: = S(uii,Uti)^ (2.1) Chapter 2. Topology of the Inference Network ^ 18 A discrete unit function U() is a function of its previous unit value and all its site values: (k+1)^(k) (k+1) (k+1)^(k+1) U t'i^U' j SO^Si j2^• • • SijN (2.2) ) A continuous-time unit function U() is a function of its own value and all its site values: uii(t + 6) = U (uii(t),^+ 6), sii 2 (t + 6), . , siiN (t + 8))^(2.3) For an optimization problem, the candidates competing for the optimum are usually independent of each other, which means the unit function is separable: u i; ^. . . U'(^u' ( 12, 13) ) • • .) ^ (2.4) where 1 0 , 11, 12,^, 1N is an arbitrary permutation of so, so, • • 8 4N, and u q . U' () is a 2-variable function related with U(). For example, if U() is a minimization of N 1 variables, then U'() is a minimization of two variables. For many unit function UO, the N 1 variables in U() can be associated randomly and minimized in an arbitrary order. The state of the inference network can be described by all its unit values, which can be put into an N 2 -dimensional vector = (u119U129 • • • 9U1N9 U21, U22, • • • , UN N Updating of the network can be characterized by a transfer matrix P if: • the network updates in synchronous discrete mode or continuous-time domain; • unit function U() and site function S() are linear and are the same for all the unit and sites; • unit function is separable. For synchronous discrete updating, the state of the network is determined by equation u(k+1)^Pu(k) ^ (2.5) Chapter 2. Topology of the Inference Network ^ 19 For the analog network, network state can be determined by equation du p dt^u (2.6) where the transfer matrix P is an N 2 x N 2 sparse matrix whose element at row (iN+j) and column (kN + 1) is PiN-i-j,kN-1-1. PiN+j,kN+1 shows whether and how a binary relation concerning nodes i and j is related with another binary relation concerning nodes k and 1. piN+JAN+1 = aS(i — k)6(j — 1) + b6(i — k) cS(j — 1)^(2.7) where a, b, c are constants determined from applications and S(x) = The first term of piNi_jAN+1 shows 1x=0 0 x00 that a unit has feedback from itself; the second and third terms of PiNi-j,kN+1 indicate that a unit interacts with the units in the same row (i = k) or the same column (j = 1). When i k and j 1, pust+j,kN+1 = 0, which means a unit does not communicate directly with units which are not in the same row nor in the same column with it. For the assignment problem, to be discussed in Chapter 6, a transfer matrix P is defined whose elements satisfy Eq.(2.7). Both Eq.(2.5) and Eq.(2.6) hold for the application. The transfer matrix is used to examine network convergence for the problem (Chapter 6). For the shortest path and transitive closure problems, to be discussed in Chapter 4 and Chapter 5 respectively, a transfer matrix can also be defined indicating link weights. For these two applications, a, b and c in Eq.(2.7) are all one. However, since these two applications use non-linear site and/or unit functions, the updating of network state cannot be described by linear equations Eq.(2.5) or Eq.(2.6). Chapter 2. Topology of the Inference Network ^ 20 When the inference network is used to solve an optimization problem and is stabilized at a solution, the unit value for each unit (i, j) is denoted as u71P Vi, j. Although u7 are unknown before the problem is solved, they can sometimes be expressed in a mathematical expression. For example, in the shortest path problem, t47.7 = min{ min{ (u it uu)}}. When the expression for ti7P is known as a function of unit values, the energy function of the network can have the following form: E=E E(u - un2 (2.8) where uji is the current value of unit (i, j), and t477P is an expression of the expected value of unit (i, j) when the network is stabilized. The energy function indicates the distance of the current network state from its stable state. When the network arrives at the expected solution, uii = Vi, j and E = 0. When changes in continuous time domain, and lien ui; = u7P, then energy E will be sufficiently close to zero when t—oo time t is sufficiently long. Plugging the expression for u ker into the energy function, the derivative of the energy can be used to analyze the convergence of the network. In some applications, it is impossible to know 47 before solving the problem. However, the constraint of the problem can be put into a mathematical term E, which equals to zero when the constraint is satisfied and is greater than zero otherwise. Meanwhile, the optimization problem is to minimize an objective function Eo under the constraint. Under this circumstance, the energy function can be defined as E = Ao E, A.E0 where dtl, and A o are constants. The objective is to decrease the energy until arriving at a state at which Ec = 0 and E, is minimized. The Hopfield model is an example. Giving each term proper weight is critical here if gradient descent is used to decrease the energy. For some graph partition problems, Lee shows [48, 49] that both the problem constraint and the objective function can be expressed in a single term Ec , therefore the energy function E = AEco contains only one coefficient, and there is no need to adjust the weights Chapter 2. Topology of the Inference Network^ 21 between the constraint and the objective function. When linear unit and site functions are used in synchronous or continuous-time updating, the energy function has the form of E = u t Pu^+R^ (2.9) where u is a vector of all units, P is the above transfer matrix, Q and It are constant vectors. Refer Chapter 6 for the derivation of Eq.(2.9). Although units in the network are connected with feedback, oscillation will not occur in the electronically implemented network, as long as the energy function decreases monotonically, that is, in the mathematical term, < 0. 2.6 Is the Inference Network a Neural Network? An artificial neural network is an abstract computational model based on the structure of animal brains. Individual neurons do not transmit large amounts of symbolic information. Instead, they compute by being appropriately connected to a large number of similar neurons [14]. A number of neurons form a state and represent a concept (distributed representation, refer the Necker Cube example in [16]). The operation of each neuron is very simple. A neural network often has a uniform structure. The change of the state of a neuron aims at reducing network energy, and is often in the direction of gradient descent (for example, the Hopfield model for the traveling salesman problem). Neural network approaches focus on parallelism, robustness, adaptation, and learning. Neural networks usually solve an optimization problem right from its definition (define an energy function which is composed of constraints and the quantity to optimize). Algorithms for some types of neural networks involve a lot of learning and training to determine link weights [27]. The objective of network updating is to decrease its energy, and link weights Chapter 2. Topology of the Inference Network^ 22 are set up to achieve this goal. Artificial neural networks have both digital and analog versions. The inference network is a parallel computation model. When it is used synchronously, it is a distributed memory model. When it is used in continuous-time (without a clock), it is a circuit without the kind of memory in a von Neumann machine. Some electronic components in the circuit (capacitance, etc) act like memories to keep voltage values etc. Its topology is a reflection of an optimization process — evaluating candidates and determining the optimum. Each processor has a specific mission — an explicit reason why it performs the operation. Unit and site functions are defined to be essential procedures in optimization. Proper definition of the unit and site functions results in a decreasing energy function. The network has a uniform structure and units are identical. The network does not involve training and learning. All link weights are known. Algorithms are developed right from problem formulation. The network can operate in both discrete and continuous-time domains. Based on the above information, the inference network can be regarded as a neural network with known link weights. However, in the inference network, the sites at a unit are independent of each other. Changes at any site (or sites) can be used to update a unit value. In other words, a unit does not have to collect all its site values to calculate its own value. When a new candidate arrives, the optimum-so-far will be compared with the new candidate to find a new optimum. In a neural network, a neuron usually collects all the inputs and then produces an output (for example, a weighted sum). The inputs are not explicitly divided into groups. Figure 2.5 compares an inference network component with a classical sigmoid artificial neuron. It is interesting to compare the inference network with a specific type of neural network, namely, the maximum neural network [49]. In a maximum neural network, the clusters are responsible for decision-making, just like the units in the inference network. Chapter 2. Topology of the Inference Network^ 23 However, all neurons in a cluster, except the one with the highest value, are set to zero once the decision is made; whereas a unit in the inference network chooses the optimum from site values, but it does not reset the site values. Each unit in the inference network has one output; whereas in a general maximum neural network, all neurons in a cluster send out their values. Neither the maximum neural network nor the inference network require a rigorous synchronization procedure. The inference network guarantees optimal solutions to the shortest path (Chapter 4) and transitive closure (Chapter 5) problems. Its algorithm for the assignment problem (Chapter 6) provides valid near-optimal solutions. The maximum neural network provides valid near-optimal solutions to the max cut and bipartite subgraph problems [48, 49]. An inference network with N 2 units can be considered as a sub-net of a maximum neural network with-N 2 clusters. The former has less links. If the extra links are taken out from an N 2 -cluster maximum neural network, and the winner-take-all rule for each cluster is defined as the unit function, the N 2 cluster maximum neural network will become an inference network. A fully connected N 2 -cluster maximum neural network is more powerful than an N 2 -unit inference network because the former uses more links. Despite the similarities between the maximum neural network and the inference network, the inference network topology does not naturally match problems like max cut and bipartite subgraph, and the author is not aware of any solution on a maximum neural network to solve the shortest path, transitive closure or the assignment problems. In Lee's solution to the bipartite subgraph problem [49], the value change of each neuron is based on the gradient descent of the function to be optimized, although winnertake-all scheme is used in each cluster. Even if the maximum neural network is used to solve the shortest path or the transitive closure problem, as long as the states of the neurons change in the direction of gradient descent, whether the solution is optimal is still questionable. Chapter 2. Topology of the Inference Network^ 24 2.7 The Inference Network Cannot Be Treated as a Hypercube A hypercube [26, 75] is a topology commonly adopted by many parallel machines. The inference network is a topology derived from simple intuitive procedures of optimization. Node connectivity is defined as the number of distinct links connected to a hypercube node or an inference network unit, which is a constant for both the hypercube and the inference network. The ratios between any two of node connectivity, total number of nodes (units), and total number of links represent some features of a topology. The inference network cannot be treated as a hypercube because its ratios are quite different from the corresponding ratios of a hypercube. Figure 2.6 shows the topology of a simple hypercube and a simple inference network. If we define the space compression factor as the ratio of total number of nodes (units) to node connectivity, then the hypercube topology has a higher space-compression factor than the inference network. For a problem with size N, the inference network has N 2 units. If a hypercube with the same number of nodes is to be used (2 M = N2 ), the dimension of the hypercube is M = log 2 (N 2 ) (2.10) The node connectivities for the inference network and the hypercube are 2(N — 1) and M respectively. For any N > 2, the M which satisfies Eq.(2.10) is smaller than 2(N — 1). This indicates that the amount of communication required naturally by optimization problems is higher than the hypercube with the same number of processors can provide. The following table shows the connectivities of the inference network and a hypercube. Refer Figure 2.6 for illustration. Inference network topology for directed problems Chapter 2. Topology of the Inference Network^ 25 problem size ( N) 3 4 5 6 connectivity ( 2(N — 1) ) 4 6 8 10 total units( N 2 ) 9 16 25 36 total links ( (N — 1)N 2 ) 18 48 100 180 hypercube dimension ( M) 3 4 5 6 connectivity ( M) 3 4 5 6 total nodes (2 M ) total links ( M * 2 (M-1) ) 8 16 32 64 12 32 80 192 Hypercube topology Nevertheless, it is possible to implement the inference network on a piece of hardware which has a hypercube topology. In this case, a link in the inference network has to be implemented as an indirect path through several processors. The transformation mapping of the inference network to a systolic architecture, to be discussed in Section 3.3, is an example. The communication requirement of the inference network is a major challenge to computer technology. Each unit in the N 2 -unit network has links to 2N other units. This amount of communication cannot be accommodated by many existing digital parallel processing facilities. However, since modern technology is able to build a fully connected analog neural network chip (for example with 50 neurons), it is technically possible to build a VLSI chip for the inference network. Chapter 2. Topology of the Inference Network ^ 26 2.8 Examples and Potential Applications Let us look at some application examples of the inference network. The objective of the shortest path problem is to find the shortest path between two cities in a given distance graph. In this application, a unit value represents a distance between two cities. Unit (i,j) looks for the shortest path from city i to city j. Site 1 on this unit calculates the distance from city i to city j via city 1 by adding two unit values. Therefore, each site on the unit provides the distance of one path and the unit obtains the shortest path by minimizing the site values. Another example is to use the network for data consistency checking. Given N events and the time intervals between any two of them, the objective is to check whether these time intervals are consistent with each other. The inference network for this problem is composed of )1(14-7.4) units, and each unit (i,j) represents the time interval between event i and event j. At site k, the time interval between event i and event j is obtained from intervals from event i to event k and from event k to event j. The given data are consistent if the intervals between i and j from all sites are equal. In this application, site function is an addition and unit function checks consistency of site values. The inference network topology can be extended to a weightless neural network for the consistency labeling problem. The weightless neural network consists of clusters of neurons. Each cluster (i, j) is more complicated than a unit (i, j), however, it also works for one binary relation — the consistency between the labels for object i and object j. The interconnection between the clusters is the same as the interconnection between the units: cluster (i, j) communicate with cluster (i, k) and cluster (k, j) for all k's. Cluster (i, j) eliminates inconsistent labels for object i and j, and whenever the eligible labels for object i (or object j) change, the information is passed to cluster (i, k) (or cluster (k, j)) for all k's. Details about this weightless neural network can be found in Appendix C. Chapter 2. Topology of the Inference Network^ 27 The inference network can be applied to an N-node problem if the problem has the following features: 1. Since a unit only keeps a binary relation, the problem must be fully represented by a set of binary relations u u , where i and j are nodes of the problem. The nature of the relations uu are the same for all i and j; 2. With the links provided by the inference network, the state of the problem must progress through the interaction of the binary relations, and relation u u is only directly used in determining relation uji and u 12 ; 3. Due to the structure of each site, candidates for the optimal flu are obtained from a binary function on /A i/ and tiu; 4. Deduction rules are defined to derive sib( from ua and uu; 5. Decision-making rules are defined to determine relation u12 from s ill , so, • • • ,sio; 6. An energy function can be defined for the problem, and through the site and unit updating, it will eventually arrive at a global or local minimum when the network is stabilized. Among the above features, the first two are the most important, and they can be used to determine if it is possible to solve a problem on the inference network platform. Although the longest path problem — finding the longest path in a general graph — sounds similar to the shortest path problem, it can not be solved on the inference network because the second and third requirements are violated. In the shortest path problem, the shortest path length from node i to j always equals to the total of the shortest path lengths from i to k and from k to j for all ks. However, the longest path from i to j often is not the concatenation of the longest path from i to k and from k to j for any Chapter 2. Topology of the Inference Network ^ 28 k. Despite that the shortest path problem is solved nicely on the inference network, the longest path problem cannot be solved easily on the same network. If a relation between i and j cannot be determined from the relation between i and k and the relation between k and j, the kind of deduction is not supported by the inference network. The inference network is not suitable for any problem that cannot be fully represented by a set of simple binary relations. For example, if there are more than one salesmen in the traveling salesman problem, and the salesmen's destinations are associated with some priorities, it is not easy to define binary relations u i; Vi, j such that they can fully represent a state of the problem. On the other hand, even if it is possible to fully represent the complicated traveling salesman problem using a set of binary relations, the deduction upon the relations is probably too complicated for the inference network platform to support (the second and third requirements above). For what kind of optimization problems can the inference network produce global optimal solutions? In the inference network, each unit performs local optimization based on its inputs and the unit function. The shortest path (Chapter 4), transitive closure (Chapter 5), data consistency checking (Appendix A), and consistency labeling (Appendix C) are a few examples with a common feature: local optimization of a binary relation is always advantageous to the optimization of other binary relations. For applications with this feature, the inference network produces globally optimal solutions. For example, in the shortest path problem, any unit with a non-optimal path continues to optimize and activates other units to change accordingly. Once all units have settled down, a solution to the all-pair shortest path problem is found. Chapter 2. Topology of the Inference Network^ 29 Inference Network AND/OR graph ^Description unit^oval node^a binary relation to be optimized site^arc for AND^associating two relations for a result link^arrow^showing dependence of relations sites on a unit^untied OR logic^independent ways to derive a relation Figure 2.3: (a) A complete AND/OR graph for three relations; (b) the corresponding inference network of (a). Chapter 2. Topology of the Inference Network ^ General configuration: 1 < i,j,/ < N N 2 units, N sites per unit 2N incoming links per unit No-diagonal-units configuration: 1 < i,j,I<N 10i0j N(N — 1) units, N — 2 sites per unit 2(N — 2) incoming links per unit Undirected network configuration: 1 < i,j,/ <N 10i0 j unit (i,j) coincides unit (j, i) N(N — 1)/2 units, N — 2 sites per unit 2(N — 2) incoming links per unit Figure 2.4: Unit (i,j) in the binary relation inference network. 30 Chapter 2. Topology of the Inference Network^ 31 ••■•■•••■■•011. u ii v• where D and can be either linear or non-linear functions (b) Figure 2.5: (a) A classical sigmoid artificial neuron; (b) a basic component in the inference network. ( a ) (b) Figure 2.6: (a) Topology of a 3-dimensional hypercube with 8 processors and 3 bidirectional links on each processor; (b) a full N = 3 inference network with 9 units and 4 bidirectional links per unit. Chapter 3 Inference Network Implementation 3.1 Analog Inference Network Each unit in the inference network is a simple computational engine. The following chapters will show that for a variety of applications, operations involved in the site and unit functions involve only a few basic arithmetical operations. Therefore, a unit can be built by wiring up some operational amplifiers or logic gates operating in continuous time. An output of a site is sent to a single unit as input; an output of a unit drives a number of sites. Figure 3.7 shows two units and the sites attached to them. Figure 3.7(a) is a network component for the shortest path problem. An analog unit for a 5 city shortest path problem was built and tested [91]. The value of a unit is the voltage measurement at the output of a unit. Each site is an operational amplifier to add two voltages. Each unit is composed of some voltage comparator to choose the minimum voltage among the site outputs. The inference network algorithm for this application will be given in Chapter 4. Figure 3.7(b) is a unit and its attached sites for the transitive closure problem. A 'high' unit value indicates that there is a path between the two nodes. Each site is an AND gate determining the existence of a specific path between two nodes. The unit is an OR gate determining the existence of any path between the two nodes. Detailed discussion can be found in Chapter 5. What spatial configuration will the units form if the inference network is to be built 32 Chapter 3. Inference Network Implementation^ 33 physically? Let us answer the question by looking at the connectivity of the units. First, if there is a link from the first unit to a site at the second unit, then there must be a link from the second unit to a site at the first unit, refer Figure 2.3. Therefore, bidirectional links can be physically used. Second, all units are not directly connected. If the N 2 units of the inference network are placed on a 2-dimensional array and unit (i, j) sits at row i and column j, only units within the same row or column have links in between. C 0 U m p u i,2 U 2j a r a • •• t U i,N U. 0 r (a) (b) Figure 3.7: Examples of inference network components, (a) for the shortest path problem, (b) for the transitive closure problem. In light of these features, a torus structure, as shown in Figure 3.8, can be used. It is formed by layered circular disks and a cylindrical surface wrapped on these disks. Each unit locates at the center of a layer, and communicates with all other relevant disks through bidirectional links in the cylindrical surface. For a directed problem with size N, there are N 3 (N — 1) bidirectional links on the cylindrical surface. Since only the units in the same row or column are connected, it is possible to use multiple buses, and each bus only contains unit signals in the same row or column. Figure 3.9 shows this structure. Each unit is connected with a vertical bus and a horizontal 34 Chapter 3. Inference Network Implementation ^ • Figure 3.8: (a) Torus structure of the inference network; (b) a circular disk. bus; each bus contains signals from units in a particular row or column. Units get inputs from two bus lines, calculate outputs, and send results back to the two bus lines to drive other units. The width of each bus is N and there are 2N buses in total. Despite the feedback in the network, for each of the applications considered in this thesis, it is shown that the inference network has stable dynamic behavior and eventually it will converge. An energy function can be defined for each application and can be proved to be decreasing. The feedback links in the network will not cause oscillation. A complete analog inference network system consists of a conventional computer, the inference network formed by circuits or VLSI chips, and D/A and A/D converters, as shown in Figure 3.10. The computer is a monitor as well as a storage medium. The analog inference network is composed of a physical configuration (torus, square, etc.), and units plugged in to it. The units can be multi-functional, which have switches to determine their functionality for the given application, or, they can be simple units for a specific application. Sites at a unit are evaluated in parallel. The advantage of using analog units instead of digital ones is due to the simplicity of analog units. An analog system uses additional D/A and A/D, but it does not need a set Chapter 3. Inference Network Implementation^ 35 Input signal from bus for the column Output signal drives bus for the column Figure 3.9: (a) A square structure with multiple sets of buses; (b) connection of a unit to bus lines. of circuits for each bit of the data, whereas a digital system does. Besides, the units in an analog system do not need a clock to synchronize their operations. A potential problem with the above implementation is the limited driving power of the output signals. When the network size is large, additional driving circuits may be necessary. Monitor/Data/Result Physical Connection (Analog Units) Figure 3.10: An analog inference network system. Chapter 3. Inference Network Implementation^ 36 3.2 Discrete-Time Inference Network The computational requirement for the components of the inference network is simply the ability to perform basic arithmetical operations. A site produces an output based on two inputs; a unit collects the site values and generates a unit output. Neither a sophisticated CPU nor a big memory is needed. However, each unit does require a fairly big communication capacity: each unit has 2N outgoing links and 2N incoming links. A discrete inference network can be built using digital components similar to those shown in Figure 3.7. Physical connection of units are also similar. The major difference is, in order to update the network synchronously, a global clock is required to synchronize the activities of each component. At the beginning of each time step, sites attached to a unit obtain inputs through its incoming links, calculate site outputs simultaneously, and then the unit performs its unit function and produces the output for the next time step. The Connection Machine (CM) [26, 85] can be programmed as the inference network. The Connection Machine is a single instruction multiple data stream computer which consists of a large number of simple processors and is efficient for massively parallel computing. It has up to 16K, 32K or 64K processors; each processor has up to 1MBit of memory. Its processors are connected logically into an n-cube topology and support highly efficient n-dimensional grid communications. An algorithm can be executed on the Connection Machine efficiently if it mainly requires regular and local communication. Non-local communication can be achieved with the trade-off in speed. If an application requires more processors than the CM physically has, virtual processors can be created. The CM uses a conventional computer, such as SUN or VAX, as a front-end machine. Several high-level languages for parallel programming are supported, such as *Lisp, C*, CM Fortran, which are parallel extensions of the corresponding sequential languages. CM provides the utility to measure computational time: CM busy time is the time the CM Chapter 3. Inference Network Implementation ^ 37 is involved in calculation; CM elapsed time includes the time the CM is communicating with its front end, in addition to CM busy time. When the Connection Machine is programmed as the inference network, a processor array can be declared in which each processor corresponds to a unit in the inference network. The network can operate in both synchronous and asynchronous modes. Each processor performs site operations as well as unit operation. To obtain site values of a unit, each processor is required to communicate with 2(N — 1) other processors. Unit values can then be calculated within each processor. Although each CM processor has only four hardware links with its neighbors, arbitrary communication can be achieved from software level. For example, C* code [i][i]work [k][l]work sends the element [1c][1] of vector variable work to position [i][j]. The programmer does not need to do any hardware reconfiguration. If the two elements are not hypercube neighbors, the message has to travel through other processors to get to its destination. However, since the cost of non-local communication increases rapidly with problem size N, this way of using CM is very inefficient. 3.3 Mapping to the CM Using Local Communication Under the situation that existing hardware does not support the operations of the inference network efficiently, we propose a systolic transformation mapping of the network to the Connection Machine. In the previous section, all the sites of a unit constantly reside on the unit. However, since there is only one processor for these sites, site values are evaluated in serial, and site evaluation normally involves non-local communications. The high communication cost caused by directly programming the CM can be significantly reduced by proper arrangement of site calculations at each unit: each unit value passes Chapter 3. Inference Network Implementation^ 38 through a set of units (processors) and stops at each unit to evaluate one site of the unit; each unit calculates its site values in its own order. Here are the details. The N x N inference network is mapped onto an N x N processor array on the CM, each processor for one unit. See Figure 3.11. Each processing element (PE) has a local memory keeping the unit value. Two data flows, one horizontal and one vertical, pass though each PE constantly. Each data flow consists of N unit values from a certain row or column. Since each unit only communicates with all the units in the same row or column, a horizontal (or vertical) data flow can be shared by all the units in the same row (or column). The data flows in the two directions are ordered in such a way that the two inputs of a certain site arrive at a PE at the same time. Eq.(3.11-3.12) is the initialization of the data flows: h[i, (—i — j) (mod N)] := u[i, j]^(3.11) v[(—i — j) (mod N), j] := u[i, j]^(3.12) where h[k, 1] and v[k,1] are the data at u[k, 1] after the initialization. It takes N time units for the data flows to traverse the array. In this mapping, the N sites are not physically attached to a unit all the time, but are attached to the unit one after another rhythmically. For most optimization problems, the unit function is separable, as shown in Eq.(2.4), hence a PE does not need extra memory to keep any site values. One synchronous iteration is composed of N 1 steps on the CM: • form the vertical and horizontal data flows (prepare updated unit values for the site inputs) according to Eq.(3.11-3.12); • repeat N times: — update local memory (calculate site values and revise the unit value); Chapter 3. Inference Network Implementation ^ 1<j<N ••• h[i,j] 1 • • • ...■11. • •••^•••^••• •• _L _L • 39 u [i ,j] •- rTT:T.TIT:^ Figure 3.11: The systolic mapping of the inference network to the Connection Machine. — propagate horizontal and vertical data. The result of this mapping is that each unit only has to communicate with its four neighbors. Table 3.1 shows that the mapping is far more efficient than directly program the CM into the inference network, which uses non-local communication. N Direct program Mapping 4 0.062 0.087 8 0.313 0.113 16 0.818 0.176 32 1.828 0.296 64 5.957 0.701 128 15.72 1.88 256 133.04 8.21 512 1029.0 37.1 Table 3.1: Connection Machine busy time (in seconds) to solve the all-pair shortest path problem. Direct program: the CM is simply programmed as the inference network; Mapping: the inference network is mapped to a processor array using local communication. This mapping of the inference network to a square processor array on the Connection Machine makes good use of the available processors. Because of the restrictions of the Chapter 3. Inference Network Implementation ^ 40 CM, a rectangular geometry must be declared on the CM if a systolic algorithm uses a parallelogrammic processor array. Therefore, the rectangular geometry, which completely covers the parallelogrammic array, results in a higher virtual processor ratio (defined as the number of virtual processors over the number of physical processors). Table 3.2 gives the Connection Machine busy time for the inference network (refer Appendix B) and a systolic algorithm [70] to complete a matrix inversion. The inference network uses a square processor array whereas the systolic algorithm uses a parallelogrammic one. The two algorithms have similar operations. When the problem size is large, the inference network is significantly more efficient. More details can be found in [80, 78] N 64 4 8 16 32 Inference Network 0.22 0.39 0.88 1.21 1.91 systolic algorithm 0.09 0.17 0.38 0.89 2.31 128 256 512 3.86 17.73 115.35 8.27 37.29 210.38 Table 3.2: CM busy time (in seconds) for the inference network and a systolic algorithm to solve (I — A) -1 on a 16K CM. Although the mapping of the inference network to the CM is more efficient than directly program the CM into the inference network, it must be made clear that this mapping is just a vehicle to utilize a piece of existing hardware. It is not the end of the inference network itself. As a matter of fact, the mapping introduces additional restrictions on the inference network: it requires the sites to be evaluated in specific orders and unit calculation to be strictly synchronized. Chapter 4 Inference Network for the All-Pair Shortest Path Problem This chapter discusses the inference network algorithm for the all-pair shortest path problem. For discrete synchronous updating, the algorithm was shown to converge to the global minimum state within 1log 2 (N — 1)1 iterations'. Unlike most existing discrete approaches and some neural network approaches, the algorithm was also shown to converge to the optimal solution for asynchronous or continuous-time updating. The inference network was proved to converge, independent of problem size, to the optimal solution for the problem. It was demonstrated that the inference network not only can provide an efficient platform to solve the shortest path problem efficiently, but also can unify many established algorithms for the problem by defining different unit and site functions. 4.1 The Shortest Path Problem The shortest path problem has the following description. Consider a densely connected graph with N nodes denoted as 0, 1, ... , N — 1 and a set of directed arcs A. Every arc from node i to node j in A is associated with a distance (or length, cost, time etc.) dip For those node pairs without an arc in between, define their distance as infinity. Assume further that dii = 0 if it is not otherwise specified. The length of a path is the total length of the arcs on the path. The problem is to find the shortest paths between all pairs of nodes. Rote [71] gives more formal definition of the problem using semiring theory. 1 Ir X1 (Z > 0) is the smallest integer greater or equal to x. 41 Chapter 4. Inference Network for the All-Pair Shortest Path Problem ^42 Generally, d ii can be negative as well as positive. However, if a directed loop with negative length exists in the distance graph, the shortest paths will have negative infinite length — because the more a path repeats the loop, the shorter the path can be. In order to have a well defined problem, we restrict our discussion to graphs containing no negative-length loops. (Negative-length arcs may exist, as long as any closed loop in the graph has a positive or zero length.) A subset of these graphs are those containing undirected arcs with non-negative lengths, that is, in mathematical terms, Vi, j di; = clii > 0 where 0 < j < N — 1. Finding the shortest paths in graphs or trees is an intensively studied problem in artificial intelligence and operations research. Established discrete algorithms for the shortest path problem fall into three categories. Dijkstra's algorithm [10] (to be referred to as DIJKSTRA) is the most efficient one among the single-pair algorithms, which find the shortest path between a specific pair of nodes at a time. There are several algorithms which can find the shortest paths from a specific source node to all other nodes at the same time. Yen's [94], Spira's [73], and Moffat's[63] algorithms (to be referred to as YEN, SPIRA, MOFFAT, respectively) are examples of this kind of single-source algorithms; among them SPIRA is the most efficient. The third class of algorithms, all-pair algorithms, find the shortest paths between all the node pairs at the same time. This class includes the matrix and revised matrix algorithms [13, 32, 93], Floyd's [17], Dantzig's [9] and modified Dantzig's [81] algorithms, to be referred to as MATRIX, RMATRIX, FLOYD, DA NTZIG, MDANTZIG, respectively. Systolic algorithms like Rote's [71] and Robert and Trystram's[70] (to be referred to as ROTE and ROBERT respectively) are also all-pair algorithms. All-pair algorithms are easier to be parallelized. The shortest path problem is also solved by neural network approaches. Fritsch [19] discussed eigenvalue analysis method and proposed to use self-organizing feature map Chapter 4. Inference Network for the All-Pair Shortest Path Problem ^ 43 and a modified Hopfield net to solve the problem. Thomopoulos [83] introduced a neural network algorithm run on a network structure similar to the Hopfield net. Like neural network approaches to other problems, these algorithms can not guarantee an optimal solution. The problem is also solved by the connectionist approach in digital environment[25]. It obtains the solution through relaxation with self-adapting step size. The shortest path problem has applications in communication, transportation networks, integrated circuit layout, and control, etc. Applications of the single-pair shortest path problem are common in daily life. Single-source and all-pair solutions are useful in VLSI layout etc. For example, compaction is a process in VLSI layout aiming at minimize the layout area while meeting all the design rules. It is an optimization problem because masks can move around as long as spacing, minimum-size, and shape rules are met. One-dimensional compaction changes one coordinate of the masks while preserving the design rules and not altering the function of circuits. In two-dimensional compaction, both coordinates can change simultaneously to minimize the area. Solving a single-source shortest path problem is one of the ways to perform a one-dimensional compaction. The difficulty of two-dimensional compaction lies in determining how the two dimensions of the layout must interact to minimize the area. One way out of this difficulty is to do two one-dimensional compactions under some additional constraints. The test of the constraints involves solving two all-pair shortest path problems. Lengauer [51] gives detailed algorithms for them. 4.2 Inference Network Algorithm for the Shortest Path Problem By giving proper unit and site functions, the inference network can be used for the allpair shortest path problem. Each unit (i, j) (0 < i, j < N — 1) in the network works for the shortest path from node i to node j. Its value u ji is the shortest path length between Chapter 4. Inference Network for the All-Pair Shortest Path Problem^44 i and j found so far. Site 1 at unit (i, j) records a path from node i to node 1 and then to node j. Its value so is the sum of the path lengths from i to 1 and from 1 to j. All links have weight 1. For undirected shortest path problem with no negative-distance path, initial value ut? is the given distance di,. For directed graph, ut? are obtained from ) ) distances dij using Eq.(4.13). If a path from node i to i exist with positive length, the shortest path has, by common sense, zero length. If the path from i to i has a negative length, by repeating this loop, the length of the shortest path will approach negative infinity. d13^i j 0^i = j, dij > 0 (4.13) —oo i = j, di; < 0 Each site on a unit performs siii := ull + uu^ (4.14) and all the units in the network perform u i; := min{u ih min{s ig}}^ (4.15) Sites and units can be updated asynchronously, synchronously with a clock, or in continuoustime. The algorithm for synchronous discrete time, asynchronous discrete time, and continuoustime domains are given in Algorithm 1, and Algorithm 2, Algorithm 3 respectively. Algorithm 1 : Shortest Path — Synchronous (to be referred to as INFERENCE) set initial unit values according to Eg.(4.19) k := 0 while (1) BEGIN Chapter 4. Inference Network for the All-Pair Shortest Path Problem ^45 BEGIN PARALLEL for all i,j BEGIN PARALLEL for all 1 8121) := u (1 -1) 411 -1) END PARALLEL u(iV^n^{ 0 < < N-11 4111)}} END PARALLEL if Vi, j u! ) = u•3 -1) break; k:=k+1 END Algorithm 2 : Shortest Path — Asynchronous set initial unit values according to Eg.(4.13) REPEAT randomly pick a unit (i,j) BEGIN PARALLEL for all 1 Sip :=^ut; END PARALLEL uji :=^min^{so}} o <1<N-1 UNTIL no unit value changed when any of the units is picked. Algorithm 3 : Shortest Path — continuous time • Set initial unit values according to Eq. (4.13); • All sites at each unit perform function sip := uit u4; • All units perform^:= min{ui;^min^{siii}}; '0<l<N-1 • Sites and units will eventually stabilize and each unit value represents the distance of a shortest path. Chapter 4. Inference Network for the All-Pair Shortest Path Problem ^46 4.3 Algorithm Convergence There are two important considerations in using the inference network for the shortest path problem. First, will the network always converge to the desired solution? Second, what are the parameters or conditions that affect the convergence rate of the network? The answer to the first question is 'yes', because at the k-th iteration, all paths containing at most 2" arcs take part in the minimisation; and the local minimisation performed at each distinct unit drives the network to a global minimum state — which is the desired solution. The convergence rate of the network is O(log2 N). 4.3.1 Synchronous Algorithm Converges in at most flog 2 (N — 1)1 Iterations Since all the shortest paths containing at most 2" arcs can be found in p iterations and a shortest path in an N-node graph contains at most (N — 1) arcs, the inference network converges to the solution in at most flog2 (N —1)1 synchronous iterations. In Algorithm 1, the inference network algorithm uses distances which is the length of one arc, in the first iteration to calculate the shortest paths containing at most 2 arcs. In the second iteration, it uses the shortest paths containing 1 or 2 arcs to calculate shortest paths containing at most 4 arcs. Repeating this procedure p times, all the shortest paths containing at most 2" arcs can be found. In other words, a shortest path containing q arcs can be found in (log e iterations. Since it is assumed that there is no negative loop in the graph and, by common sense, a shortest path does not contain a closed loop with non-negative length, shortest paths are all open paths. If an open path contains at least one infinite-distance arc, then the length of the path, which is infinity, can be determined right away. If there are a number of open paths between node i and j which are composed of finite-distance arcs only, the k-th of them contains /k(i,j) such arcs, then the number of iterations required, to find the shortest path between node 5_ Chapter 4. Inference Network for the All-Pair Shortest Path Problem^47 i and j falls into range: I log 2 ( qn /k(i, j) } ) 1 < ni; I log 2 ( mr {^} ) 1 The algorithm completes when the shortest paths between any two nodes are found, therefore, the number of iterations required, n, to find all shortest paths is given by log 2 ( max { min { /k(i,j) } } )1^n < I log 2 ( max { max { lk(i,i) } } ) 1 j^k^ j^k For an N-node problem, an open path consists of at most N —1 arcs, that is max { max { /k (i,j) } } < N — 1 s,^k hence all shortest paths can be found in at most ilog 2 (N — 1)1 iterations. Usually, the network requires fewer iterations to converge than this upper bound. The number of iterations required depends solely on the maximum number of arcs in an open path in the graph; it is not directly related with graph density. 4.3.2 Convergence of the Asynchronous Algorithm All the algorithms we found in literature for the shortest path problem are synchronous algorithms. However, it can also be proved that using the same updating rules asynchronously, the inference network Algorithm 2 converges to the global minimum state as well. For asynchronous updating, the unit which has the least chance to update is most likely the one which determines the completion time. In synchronous evolution, utk. ) is the shortest path length after k-th iteration, and it gives the shortest path among the paths from i to j which contain at most 2k arcs each. For asynchronous updating, we need a record nii to indicate that all the paths from i to j which contain no more than^arcs have been considered in minimizing uii. Some, but not all, paths contain more than ^arcs may also contribute in the minimisation. Chapter 4. Inference Network for the All-Pair Shortest Path Problem ^48 For any n4, u4 in the asynchronous case is less or equal to u; °g 2 n '31) in the synchronous case. Site and unit functions for the asynchronous inference network are the same as in synchronous evolution for each individual unit and site. The initial value of n q is always 1, and it is updated according to = min{nit nu} In each iteration, a random unit (i, j) is picked to be minimize. In addition to calculating and updating sip and u4, n12 can be recorded. Paths are all minimized if Vi, j nu > N — 1. If units are updated one by one systematically, and one round of network updating starts only when all the units are updated in the previous round, then it takes N(N — 1) flog 2 (N — 1)1 iterations to obtain the shortest paths for all node pairs. If updating is in a random order, minimizing a unit (i, j) with its n 4 greater than that of some other units just makes the updating process longer. Hence for random updating, the number of iterations required is at least N(N — 1) [log 2 (N — 1)1, and depends on the unit (i, j) which increases its n4 value slowest. The asynchronous algorithm will eventually converge, that is, arrive at the state of n 11 > N —1 bpi, j, if each unit has a chance to update sometime. It is worth point out that n 4 is used here to prove the convergence of the algorithm. However, it is not necessary to calculate them if only the shortest paths are to be found. 4.3.3 Convergence of the Analog Algorithm In the analog shortest path algorithm, each unit performs tz 4 := min{u 4 , min {s kit }} o <I<N-1 (4.16) ^ Chapter 4. Inference Network for the All-Pair Shortest Path Problem ^49 Taking into account the transient period and assuming that each unit is a first-order system, the behavior of a unit can be described by d =^ —AiJui; + Ai; minfui i ,^min^{s o d t3 0 <^— N il^(4.17) where constant^> 0 is the open-loop pole of unit (i,j) which partially determines the speed at which ui; changes. The network energy for this problem can be defined as E= E E (u, - min{uij,^min {si./}}) 2 0 <1<N-1 The global minimum state of the network is signified by E = 0, that is V i, j,^= min^{so}. At this state, unit values will not change any more and the network 0 < / < N —1 is stabilized. The following is a proof that the network will arrive at a state sufficiently close to this global optimal state as the elapse of time. • From Eq.(4.16), ui; >^min^{skid; 0 <I<N— 1 min^{so}, d;L • If ui; =^ = 0, the unit keeps its value; 0 <I<N-1 • If ?Ai; >^ min^{so}, d u --1.1= —Ai; (ui;^ —^min^Isiii}} ) < 0, which d t^ 0 <1 < N-1 drives uii towards^min^{sill}; 0 <I < N 1 0<l<N-1^ — • The farther ui; is from^min^{s ip}, the faster ui; decreases. Therefore, 0</<N—1 >0 • dd E t < ; 0 when the behavior of a unit is governed by Eq.(4.17): d Ed E d ui; dt = ,^j^s., du•• dt d ui; {sill}} i j^0<1 <N-1^j d t 2 ‘..., 1 ( d ui;1 <0 = —2 F. ^‘--"^A i ^4• d t = 9..N-- ^\"--‘ Ls ttij — MillIttij,^mill k Chapter 4. Inference Network for the All-Pair Shortest Path Problem^50 • E. E duwhere A > 0. when ui; >^min^{s o }, --!1< dt 0 < 1 < N-1 d2 E 0, and 7-, > 0, hence^> 0; E^ i; dt-^JA; dt dt^ Ce ti*^ • at = 0 V i•,j,^ , that i s, the network has settled down and = 0 holds only when dui^ dt d E^ dt^ — u i; =^min^{s o } Vi, j holds (all the paths are optimized); 0 <1<N-1 dE • if^ = 0, then u i • =^min^{s o } Vi, j, which means E = 0; dt 0 < < N-1 dE • E > n^ < 0, and -dTi : f- > 0 show that the network energy is a decreasing convex dt — function which has to approach its stable value zero with the elapse of time. The energy decreases quickly at the beginning, and the decreasing rate slows down when the value of E is close to zero; • whent^t lirn •51- = 0, we have Jim u ij = min{u4,^min^{s o il^j, that is, 00^0 < I < N-1 the network will eventually settle down at the global optimal solution. 4.4 Relationship to other Algorithms This section addresses the relationship between the inference network algorithm and other existing algorithms, such as A*, MATRIX, FLOYD, and DANTZIG. The essential difference between the inference network algorithm and other algorithms is that the former is derived directly from the topology of the inference network. It is a parallel algorithm with the information of how to be executed on a parallel network. Those existing algorithms are sequential algorithms which, by themselves, do not bear information about how to be parallelized. 4.4.1 A* Algorithm A* [64] is a heuristic search algorithm. It maintains two lists during the search; the open and closed lists contain nodes which are to be expanded and have been expanded Chapter 4. Inference Network for the All-Pair Shortest Path Problem ^51 respectively. To determine the candidate to expand, it uses a heuristic guess of the length from the current node to the goal node. In searching for the shortest path from s to t, the algorithm chooses a node with the least estimated length f(s, t, n) to expand, where f(s,t,n) = g(s,n) it(n,t)^ (4.18) and g(s,n) is the length of the partial solution from s to n, h(n, 0 is an estimation of h(n, t) — the shortest path length from n to t. There is a remarkable degree of computational correspondence between INFERENCE and A*. If n is a node in the open list, unit values u(s, n) and u(n, t) in the inference network correspond to g(s, n) and h(n, t) in A*, respectively. Of the four values, g(s, n) is shortest path length between s and n; u(s, n) and u(n, t) are path lengths which may be further minimized; h(n,t) is heuristic information. One unit value in the inference network serves as the heuristic information of another unit. Both the heuristic information in A*, it(n, t), and the heuristic information used in INFERENCE, u(s, n) and u(n, t), have to be greater than the shortest path length to ensure a correct solution. The difference is that the heuristic information it in A* has to be provided prior to the search; whereas the inference network updates the shortest paths between two nodes in each iteration, and the results in one iteration are used in the next iteration as heuristic information until a global minimum is reached. 4.4.2 MATRIX Algorithm MATRIX is an all-pair algorithm which can be expressed in matrix operations. It defines a matrix binary operation called minaddition: W A 0 B ['D i; = qn(a i k bki)] where A [a ij ], B = [kJ ] ^ (4.19) If D 1 = D is the distance matrix [4], the elements of D N-1 gives the shortest path Chapter 4. Inference Network for the All-Pair Shortest Path Problem ^52 lengths, where D N-1 is the minaddition of N — 1 D's: D N-1 =D0D0D0.-•0D0D^(4.20) The divide and conquer version of MATRIX [8, 47] uses the fact that minaddition is associative: D 4 =DODODOD = 0 D 2^(4.21) With this feature, only rlog 2 (N — 1)1 0-operations are required. INFERENCE is also intimately related with MATRIX, although the former is derived directly from the inference network platform. It is not difficult to observe that, if we define another matrix binary operation 1 as W = A_LB = [wi ; = min{ a;2 , bii}] where A = [a il ], B = [kJ]^(4.22) then Eq.(4.14-4.15) can be effectively represented as D k = D k-1 1(D k-1 D k-1 )^where k = 1, 2, ... , flog 2 (N —1)1^(4.23) Under the assumption of no negative-length loops, dii = 0, therefore Eq.(4.23) is equivalent to the divide and conquer version of MATRIX. However, the matrix algorithm itself does not carry information of how to implement it on parallel processing arrays, whereas the inference network provides a platform to solve the problem in parallel whose updating equations are mathematically equivalent to the divide and conquer scheme. 4.4.3 FLOYD Algorithm Floyd's algorithm (FLOYD) constructs optimal paths by inserting nodes, when appropriate, into more direct paths. After k iterations, function fk (i,j) is evaluated to the optimal path length between i and j, where the intermediate nodes belong to the set of Chapter 4. Inference Network for the All-Pair Shortest Path Problem ^53 nodes {0,1, ... , k}. Its recurrent relation is fk(i,j) =^fk-i(i, k) + fk-i(k,i)} ^ (4.24) with boundary condition fo(i,i) = dij. The difference between FLOYD and INFERENCE is that the former considers one distinct intermediate node at a time and iterates N times; while the latter consider all intermediate nodes in each iteration and iterates log 2 (N — 1) times. 4.4.4 DANTZIG and MDANTZIG Algorithms Dantzig's algorithm (DANTZIG) constructs shortest paths using arcs whose nodes numbered between 0 and k at the k-th iteration. gk (i, j) (i, j < k), the shortest path length from i to j in the k-th iteration, is obtained from gk_1(i,j), dik and dki. Dreyfus [11] gives the algorithm in recurrent form: min^Ig k _ 1 (i,l) + g_ 1 (1, k)}^(i = 0,1,... ,k-1), 0<1<k-1 = mi {g_ i (k, 1) -I- gk_ 1 (1, j)}^(j = 0,1, ... , k — 1), 0<1< k-1 = min{0,^min^{gk (k, 1) + gk(l, k)}} 0<1<k-1 gk( 1 7 j) = min{gk_ i (i, j), gk(i, k) + g k (k, j)}^(i, j = 0,1,...,k —1) = (4.25) (4.26) (4.27) (4.28) where g_ i (i,j) = dij for all i and j. The total number of basic operations required is 2N 2 (N — 1). Modified Dantzig's algorithm (M DANTZIG) avoids unnecessary operations by further decomposing Eq.(4.25-4.26) into k recurrent steps. Define g2(i, k) = g2 (k , j) = co, A(i, k) = 4 -1 (i, k) and gk(k, j) = (k, j) can be obtained from: k)^ VO < i < k — 1 dik > gkl -1 (1, k) k) =^ min{g1-1 (i, k), gk_ 1 (i, 1) + dlk} VO i k — 1 due < glk-1(1, (4.29) ^ Chapter 4. Inference Network for the All-Pair Shortest Path Problem ^54 gii( k,j ) gkl (k,j). VO < j < k — 1 dki > g kl -1 (k , 1) minfik-1 (k,j),dv+gk_ 1 (1,A} `d0j < k — 1 dki < gkl -1 (k,l) (4.30) A significant amount of operations can be saved when dik > gkl -1 (1, k) or did > is satisfied. While the INFERENCE considers all node pairs and all intermediate nodes in each iteration, DANTZIG only considers node pairs whose indices are less than k in the kth iteration. The former requires at most (log e (N — 1)1 iterations, whereas the latter requires N iterations. 4.5 Unifying Some Established Algorithms Using an Inference Network Many all-pair shortest path algorithms can be implemented in the inference network by defining corresponding unit and site functions. This section will summarize these functions. For more details, refer Appendix D.2. 4.5.1 Floyd's Algorithm The inference network implementation for FLOYD needs N 2 units. The unit and site functions are defined as: ^(k)^..(k-1) (k)1 = min{u13 9 3 1 fic ^S (k)^(k-1)^(k-1) ilk^U s. k^U kj (4.31) (4.32) Initial unit values are u17 1) = dij . After N steps, u!7 -1) gives the shortest path length from node i to node j. Compared with INFERENCE, FLOYD only updates one specific site (indexed k) at each unit for each iteration; but it takes N instead of flog 2 (N — 1)1 iterations to find all the shortest paths. ^ Chapter 4. Inference Network for the All-Pair Shortest Path Problem ^55 4.5.2 Revised Matrix Algorithm Revised matrix algorithm (RMATRIX) updates matrix elements asynchronously to avoid unnecessary operations. Two distinct processes are defined: (i) a forward process c40 =^min^{c4r)^} 0<l<N-1 where p = fl j < 1 j >1 = (4.33) and (ii) a backward process f 1 < j^If 1< i 46) = 0 < /n<linN — 1 {c1(1) + din where p =^q=^(4.34) b 1> j^b l> i where 4.11) is the initial given distance. Yang [93] shows that the shortest paths can be obtained by one forward process followed by on backward process, one backward process followed by one forward process, three forward processes, or three backward processes. The inference network for RMATRIX contains N 2 units. Initially, 49 = dip Site function for both forward and backward processes are the same: sl (k-1)^(k-1) il ) =^ (4.35) Unit functions for a forward process is: min^{4, i} when k = f (i, j) 0<l<N-1 (k) =^ , (k-1) (4.36) otherwise where max{i,j}(m^j} ax{i,^. — 1) . .} + minft,j-1- 1 f (i, i) -^ 2 (4.37) Unit functions for a backward process is: min^{s ok } when k = b(i,j) 0 <1 < N —1 ( (k) ) — - (k-1) tiii otherwise (4.38) Chapter 4. Inference Network for the All-Pair Shortest Path Problem ^56 where b(i, j) (max{N — i N — j} — 1)(max{N — N — j} — 2) 2 , min{N— i,N— j} (4.39) The network updates two units at a time. To obtain all the shortest paths, the units of the network have to be updated in a specific order for N(N — 1) (one forward and one backward process) or 1N(N —1) iterations (three forward or three backward processes). 4.5.3 Dantzig's and modified Dantzig's Algorithms Size of the inference network for DANTZIG is N x N. The unit function for DANTZIG is (2k) = -1) ,^min min{u!r {sr}} j < i = k or i < j = k 0< < k-1 " (2k-1) otherwise Uij (4.40) min{ s ur, r 1)}^i < k and j < k (2k+i) min{0,^min^{42;` +1) }} i j = k ui;^ 0<1<k-1 (2k) u ii^otherwise (4.41) where 0 < k < N —1, 4T 1) = d12 and st:1 = ulT -1) .141 1) V 0 < r < 2N. In the 2k-th , iteration, 2k units are updated, k sites at each unit are involved in the minimisation; in the (2k + 1)-th iteration, a total number of k 2 +1 units with a total of k(k 1) sites are minimized. The total number of synchronous iterations required is 2N. Corresponding to the 2k-th iteration in DANTZIG, the inference network implementation for MDANTZIG requires k iterations. In each of these iterations, 2k units are updated; and at each of these units, only one site is activated. So MDANTZIG requires a total number of 1.N(N + 1) iterations. Chapter 4. Inference Network for the All-Pair Shortest Path Problem^57 4.8 Comparison with Established Algorithms 4.8.1 Parallel Processing with Optimal Hardware A variety of algorithms have been proposed for the shortest path problem, most of which are initially designed for sequential computation. It is interesting to observe that in many cases, an efficient sequential algorithm does not naturally lead to an efficient one in parallel. In this section, we will compare the completion time for various algorithms under the assumption that each algorithm can be executed on the hardware that optimizes its performance. In the next subsection, we will compare the efficiency of some algorithms on a specific hardware — the Connection Machine. It is assumed that our goal is to find all the shortest paths between any two nodes. For single pair algorithms, such as DIJKSTRA and A*, N(N — 1) processors can be used. Each processof works for one shortest path between two distinct nodes. Since they are initially designed for finding the shortest path between a single pair of nodes, no inter-processor communication is needed. The completion time for the all-pair algorithm is determined by the processor which takes the longest time to find the one-pair solution. Single-source algorithms, for example YEN, SPIRA, and MOFFAT, are designed to find the shortest paths from one specific node to all other nodes. N processors can be used to find all the shortest paths. There is no interprocessor communication among the N processors. The completion time is determined by the processor which has the heaviest work load. All-pair algorithms are designed to find all shortest paths simultaneously. N2 processors can be used. Rich communication between the processors is necessary. For FLOYD, MATRIX, RMATRIX, DANTZIG, MDANTZIG, in each iteration, each processor either performs an addition and a comparison or does nothing. The completion times is determined by the number of synchronized iterations they require. Chapter 4. Inference Network for the All-Pair Shortest Path Problem ^58 Systolic algorithms, for example ROTE and ROBERT, are to be executed on their own systolic structures. Both the number of processors required and the number of time units required are determined by the systolic array. The synchronous inference network algorithm requires N(N — 1) processors. Each processor has 2(N — 2) outgoing and 2(N — 2) incoming links, and performs the unit function and the simultaneous site functions. At most [log 2 (N — 1)1 iterations are required. Table 4.3 shows the numbers of processors required, number of links for each processor, and theoretical completion times for various algorithms. t o is used to stand for the time to perform a basic operation (addition or comparison), and t c is the time for a processor to send or receive a basic datum from another processor. Under the assumption that each algorithm can have its optimal hardware, t o and t c are the same for all the algorithms and are invariant with problem size N. The inference network algorithm has the shortest completion time since it requires the least number of synchronous iterations. The inference network can also solve the shortest path problem in continuous-time. The neural network approach [19], the Hopfield network approach [83], and the connectionist approach [25] all use gradient descent in decreasing their network energy, therefore do not guarantee the solutions to be optimal. On the contrary, the inference network algorithm (Algorithm 3 on page 45) is derived from the simple intuitive procedure of finding the shortest paths, and it is an analog algorithm that guarantees an optimum solution. 4.6.2 Performance on Connection Machine Now let us compare the efficiencies on a specific kind of computer — the Connection Machine. Table 4.4 shows the number of time units required by some all-pair shortest path algorithms when using 0(N 2 ) processors, assuming there are only four links on each Chapter 4. Inference Network for the All-Pair Shortest Path Problem ^59 Algorithm DIJKSTRA A* YEN SPIRA MOFFAT MATRIX RMATRIX FLOYD DANTZIG MDANTZIG ROTE ROBERT Algorithm 1 (page 44) # of Proc. N(N — 1) N(N —1) N N N N2 N2 N2 N2 N2 (N + 1) 2 N(N + 1) N(N — 1) Link/Proc. 0 0 0 0 0 4N 4N 4N 4N 4N 6 4 4(N — 2) Execution Time 0(1N2t0) < 0(1N 2 t.) 3 oN to N2log2 Nt,, Net, rlog2N1(2Nto + tc) 2N2(2Nto + tc) N(2to + tc) 2N(2to + 4) < 2N(2to + tc) (7N — 2)(2t. + t o ) (5N — 2)(24, + tc) < flog2N1(2t0 + tc) Table 4.3: Execution time for shortest path algorithms executed on their optimal hardware configuration. a in RMATRIX is 2 or 3. processor. Each time unit is approximately the time to perform one addition and one minimisation and to propagate two data. Algorithm ROBERT INFERENCE-CM time units 5N — 2 < Nlog 2 (N — 1) MATRIX FLOYD N2 N2 DANTZIG N2 Table 4.4: The number of time units required to obtain the shortest paths when implemented on the Connection Machine. ROBERT, FLOYD and DANTZIG algorithms consider one specific node at each iteration, hence their processor operations are time-dependent. INFERENCE-CM (mapping of INFERENCE on the Connection Machine, refer Appendix D.1) and the MATRIX share the same kind of time-independent PE operations. All but ROBERT occupies a square processor array on the Connection Machine. ROBERT occupies a rectangular geometry which is twice as large as the other's square array. Considering all these factors and the Chapter 4. Inference Network for the All-Pair Shortest Path Problem ^60 number of time units required, the most efficient algorithms should be INFERENCE-CM or ROBERT. The performance of INFERENCE-CM, in comparison with ROBERT, is given in Table 4.5. For a randomly generated distance matrix and N up to 512, INFERENCECM gives shorter CM busy times except for N = 4. The worst-case CM busy times for the inference network (iterate exactly (log 2 (N — 1)1 times), are close to but still shorter than that of ROBERT. Within the problem size we tested, INFERENCE-CM outperforms ROBERT. There are three reasons for this: processor operation for INFERENCE-CM is time-independent; INFERENCE-CM has a smaller virtual processor ratio; and INFERENCE-CM usually takes less than iN log 2 (N — 1)1 time units. When N increases further, however, according to the theoretical prediction, ROBERT will eventually be faster than INFERENCE-CM. 4 N 8 random 0.0874 0.1131 worst 0.0874 0.1181 Robert 0.0828 0.1688 128 16 32 64 0.1759 0.2958 0.7012 1.8825 0.1759 0.3248 0.7971 2.1143 0.3753 0.8965 2.3678 8.0302 256 512 8.2125 37.1443 8.8626 46.9628 36.9367 209.620 Table 4.5: CM busy time for INFERENCE-CM (random), the worst case of INFERENCE-CM (worst), and Robert and Trystram's algorithm (Robert) to solve the shortest path problem given random distance graphs. 4.7 Discussion Although many algorithms have been proposed for the shortest path problem, the inference network algorithm has shown a special feature — it works in synchronous discretetime, asynchronous discrete-time, and continuous-time domains. In synchronous discrete-time, the algorithm was shown to take the least number of iterations and was mapped to the Connection Machine efficiently. For a large systolic Chapter 4. Inference Network for the All-Pair Shortest Path Problem^61 array, it is a technical headache to synchronize all the processing elements. The inference network algorithm was proved to converge in asynchronous updating as well as in synchronous mode. Therefore, global synchronization was not necessary. Simulation of the asynchronous algorithm was conducted and convergence was guaranteed as long as each unit has a chance to update. The inference network was also proved to work in continuous-time using an analog circuit. The analog parallel implementation was shown to solve the N-node problem in a speed determined by the time constants of the first order system. A physical continuous time inference network for a 10-city shortest path problem is being built. Although neural network algorithms can also solve the problem using analog techniques [83, 19], they do not guarantee an optimal solution, due to the use of gradient descent. The inference network algorithm was proved to produce the optimal solution in both discrete and continuous time domains for all problem sizes. Chapter 5 Inference Network for the Transitive Closure Problem This chapter outlines the application of the discrete and continuous-time inference network to the transitive closure problem. Simple binary and operation is required at each site; and each unit performs multi-input or operations. The time complexity of the algorithm was shown to be bounded by log 2 N. The algorithm was proved to converge in continuous-time domain as well in discrete-time. It is demonstrated that the convergence is independent of the problem size, and the optimal solution is guaranteed. Theoretical analysis and numerical simulation of the circuit implementation was conducted. The algorithm was shown efficient and easy to implement. Using basic logic gates, the solution was obtained in nanoseconds range. 5.1 The Problem The transitive closure of a graph indicates whether two nodes in the graph are connected by a path. A graph G has a set of nodes V, and a set of arcs E connecting some of these nodes. A directed graph G can be represented by its connection matrix A = [a;J] in which = 1 or 0 means there is or is not an arc from node i to node j. The graph G*, which has the same node set as G, but has an arc from v to w if and only if there is a directed path from v to w in G, is called the transitive closure of G. G* can be similarly represented by its connection matrix A*. Many systolic algorithms were proposed for the problem [71, 70, 52, 40, 39, 46, 65] which require about N 2 processors synchronized with a clock to perform operations in 62 Chapter 5. Inference Network for the Transitive Closure Problem^63 pre-defined orders. Their time complexities are usually around 5N to 7N time units. Tamassia [82] proposed an algorithm to solve the problem in O(log N) time usinglogNN processors in exclusive-read exclusive-write PRAM model. Toptsis [84] introduces a parallel algorithm for highly scalable multiprocessors. If a perfect hashing function is available, the speedup over sequential algorithm is proportional to the number of processors. Wang [88] proposes a constant time algorithm which solves the problem on N6 processors. Transitive closure is a basic problem in artificial intelligence, and have many applications in database management systems, redundancy elimination, image labeling, and graph theory, etc. The inference network algorithm for the problem can be executed in both discrete and continuous time domain. In its discrete time version, the algorithm is straightforward and the time complexity is bounded by log 2 N. Compared with other discrete-time transitive closure algorithms, the inference network one is not superior. However, the power of the algorithm is in its continuous-time implementation. For the continuous-time version, the network is composed of simple logic gates, and its completion time is in nanoseconds range. The network can be implemented using NAND gates. Simulation of this NANDgate circuit is given. The network is easy to assemble; clock and global synchronization are not needed. 5.2 Unit and Site Functions The inference network for the transitive closure of a directed graph has N(N — 1) units, each unit has N — 2 sites. All unit and site values fall into range [0,1]. u(i, j) = 1 indicates that a path from node i to node j has been found; u(i, j) = 0 otherwise. Similarly, s(i, j, 1) = 1 (or s(i, j,1) = 0) shows a path from node i to node 1 then to node j has (or has not) been found. An initial unit value is 1 if the two corresponding nodes Chapter 5. Inference Network for the Transitive Closure Problem ^64 are connected by a direct arc in the graph. All link weights are 1. The sites and units can be updated asynchronously, synchronously with a clock, or in continuous-time. In the discrete version, the site and unit functions can be as simple as the following: each site performs binary and s (k) (i,j, I) ^VO < i j 01 < N^(5.42) and a unit performs (N — 1)-input or u (k) ( i j ) = u (k-1)(i , , j) (11.9 (k) i , j ^ ( , (5.43) In the continuous time version, the following site and unit functions may be used (refer Figure 5.12): 1^u t (i,^> Vh,th(1, j) > Vh,t > tO^Ts st(i,i,^= t—tn tit(i,^ 1) > vh, tti(1,i) >^t < to (5.44) 0^u t (i,l) < vh or ut(1 ,i) < vh where 0 < vh < 1 and t o is the time when both u t (i, I) and u t (/, j) just exceed Vh• 1^ut-6t(i,i) = 1 or t > t o + ru ut(i,i) = 1=41^31 u s t (i,j, 1) > vh, t < to + Tu 0^V1 St(i 1 :7, 1 ) (5.45) .5_ vh where t o is the time when the first site value exceeds vh. 5.3 Convergence Consideration Since both unit and site functions are non-decreasing, and their values are upper-bounded, the network must converge eventually. For the discrete version, if we define an N x N matrix U (k ) = [u(k)(i, j)] and u (k )(i, i) E 1 Vi, k, Eq. (5.42-5.43) can be effectively expressed by U(k) = U(k -1 ) 0 U(k-1) = [^u(k-1)(i, 0 A u (k-1) (1, j) )^(5.46) ^ Chapter 5. Inference Network for the Transitive Closure Problem^65 to^ 4T , ^ I^ to 1*— Tli I^I I Vt.-U(0) = 1 I tit(i, 1) < Vh I Vt(i, 1) > Vh^Vi St(i li' l) < Vh1 and / or 1^and^and u t (i, j) = 0 1 th g (/, j) < vh 1 ut(1,j) > Vh (a)^ 31 s t (i, j,1) > vh (b) Figure 5.12: (a) Site and (b) unit functions for the transitive closure problem. which is equivalent to u( k )(i, j) = u (k-1) (i, j)^V^ (u (k-1) (i, I) A u (k-1) (1, in VI0i0j If i ---■ j is an arc in the original graph G, define u (° )(i, j) = 1. After the first iteration, if u( 1 )(i, j) = 1, there is a path from i to j composed of one or two arcs in G. In the second iteration, all directed paths containing at most 4 arcs are found and the corresponding unit values are updated to 1. Repeating this procedure p times, all the paths containing at most 2 1' arcs can be found. In other words, a path containing q arcs can be found in at most rlog 2 (q)1 iterations. Since the longest open path in an N-node graph contains at most N — 1 arcs, all the paths between any two nodes can be found in at most n iterations, where n = rlog 2 (N — 1)1^ (5.47) and the corresponding unit values in U(n) contain the solution — U(n) equals to the connection matrix of transitive closure G*. Therefore, the number of iterations required for a synchronous discrete network to arrive at the solution is upper bounded by Eq.(5.47). Eq.(5.44-5.45) for the continuous-time version are similar to Eq.(5.42-5.43), except that the transition periods from value 0 to 1 are taken into account. The convergence rate is now determined by the turn-on time r, and ru as well. Since r, and ru for different Chapter 5. Inference Network for the Transitive Closure Problem ^66 sites and units can be different, unit values do not change simultaneously as they do in the discrete version. To measure the 'distance' between a network state and the final stable state, we can define the following network energy E(t): E(t) = EE(1- max fu t (i, j), u t (i, * u t (1,j)}) 2^(5.48) joi^V/00j The solution corresponds to the global minimum state, where E(t) equals to the number of 0's in the connection matrix of the transitive closure G*. Since 0 < u t (i, j), u t (i,1)* u t (1,j) <1 E(t) EE(1-u t (i,j))2^(5.49) i0i since dE(t)^OE(t) dut(i,i) EL ^5. ^dt^. . . au t (z 1)^dt^ JO. () 50 ago < ^a EE(1-ut(i,i))2 ^aut(i,i)^out(i j)^i#i = —2(1 — ut(i,i)) 5. 0^(5.51) , hence EE dE(t)^ dut(i,j) —2(1 — ut(i,j))^ dt dt^ (5.52) -dude• -i-Ez > 0 — unit Eq.(5.52) shows that the network energy does not increase as long as au values do not decrease, which is guaranteed by Eq.(5.44-5.45). Now we will see that the network has to arrive at the correct solution after Eq.(5.445.45) have been applied. At the minimum state of the network, dE(t) 0 dt < holds. In Eq.(5.50), since 8Elt) dt ^— ^atte(i,j) Vi,j (5.53) 0 for all i and j, Eq.(5.53) is equivalent to OE(t) = 0 or du t (i,j) dt Out(i,..7)^ 0 Chapter 5, Inference Network for the Transitive Closure Problem^67 that is, from Eq.(5.51), Vi,j u t (i,j) = 1 or dui(i,j) dt At this stable state, a unit either keeps its original 0 value or has become 1. All the site values are also either 0 or 1. Since Eq.(5.44-5.45) were applied before the stable state was achieved, the following statements are true for the stable state (t = oo): 1. u o( i,i) = 1^ucc.( i ,j) = 1; 2. u co (i, /)^1 A u.„(1, j) = 1^(i, j) = 1; 3. uo (i, j) = 0 A V1, t (ut(i, < vh or u t (1,j) < vh)^u.(i,j) = 0. From the first two statements, by mathematical induction, we know that for all i and j with a directed path in between, u,,(i,j) = 1; from the last statement, we can conclude that if there is no directed path from i to j,^= 0, or in other words, for all i and j with u c,o (i,j) = 1, there is a path from i to j. According to the definition, u cc (i, gives the solution to the transitive closure problem. The above analysis also shows that unit function can be any non-decreasing monotonic functions. A computational framework for solving the transitive closure problem can be formulated as follows: 1. Construct the N(N-1)-unit continuous-time inference network, where the dynamic behavior of each individual unit and site is governed by Eq.(5.44-5.45). 2. The network has a complex but regular interconnection structure, where each unit sends its output to a set of 2 * (N — 2) units, and receives at the same time the outputs from this same set of units at its N — 2 sites. 3. Initialize the network units with u o (i, j) = 1 if there is a direct arc from i to j in graph G. Otherwise, set uo(i,j) = 0. Chapter 5. Inference Network for the Transitive Closure Problem ^68 4. The network will converge to a global minimum at a rate depending on the transition time of each unit. The converged output j) of unit (i,j) corresponds to the element ct7i in the connection matrix A*. 5.4 Implementation Using Logical Circuits Although the Connection Machine can be used to implement discrete-time inference network algorithms, a simple special purpose inference network can be built for the transitive closure problem, with NAND gates as its major components. From Eq.(5.425.43), we can see a unit performs u(i,j) =^j) V (Vs(i, j,1))^u(i,j) V/^ A (As(i,j, VI /)) where s(i,j,1) u(k-1)(i,l) A u(k -1 )(1, j) A 2-input NAND gate, for example the kind in a 74LS00, can be a site, refer Figure 5.13(a). It has a simplified mathematical model of Eq.(5.54) with turn-off time Ts , see Figure 5.14(a). An (N-1)-input NAND gate with a mathematical model of Eq.(5.55) can be used for each unit, see Figure 5.13(b) and 5.14(b). For a small-size problem with N < 15, a 13-input NAND gate 74S133 can be used as a unit. If N is large, a unit may consist of several m-input NAND gates, based on u = si V s2 V • ••Vsm=siA• • •As„, Vs m +1A-••As2 m V• • •V e• •Asm = :11A•-•Ag m 3m +1A-••A 79- 2 m • • • • • • A .3 m Since each site has two inputs coming from two distinct units, the fan-in capability of the gates is sufficient. However, the output of a unit has to go to 2(N — 1) sites, a regular NAND gate may not have large enough fan-out for a unit. If this is the case, additional driving gates may be used at the output end of units. Chapter 5. Inference Network for the Transitive Closure Problem^69 ut(i, j) s t (i, j, 1) l7ij tit (b) (a) Figure 5.13: Essential components for a unit. The following equations describe the state of the inference network. Site gates perform s t (i, j, 1) = 1 u t (i, I) > vh,ut(1,i) > vh, t > to -I- rs 0 u t (i,^ I)^t^ > vh,ut(1,j) > vh,^<^ to + 1 tit(/ ' ^< Ts (5.54) vh or ut(1,i) < where t o is the time when both u t (i, 1) and u t (/, j) just exceed vh — the minimum value 1 to be considered as 'high', and governed by Ts is the turn-off time of the NAND gate. Each unit is 1^u t_st (i, i) = 1 or t> t o + ru ut(i, j) =^1.-1-. - 3/ s t (i, j, /) < vi, t < to + T. (5.55) 0^V1 st(i,i,l) > where t o is the time when the first input value goes below vi — the maximum value to be considered as 'low', and ru is the total effective turn-on time of the entire unit. Eq.(5.54-5.55) are very similar to Eq.(5.44-5.45) except sites produce s t (i,j, 1) instead of s t (i,j, /), which enables the use of popular NAND gates. With a procedure similar to that in the last section, it can be proved that the solution given by this implementation is also correct. The transition time for each unit and site may be different. The only requirement to guarantee a valid solution is that during the transition period, the output of a gate changes monotonically. In order to give an idea of the completion time, let us assume the transition time for each site and unit are Ts and r. respectively. A site gate takes (1 — vl)T, time to turn Chapter 5. Inference Network for the Transitive Closure Problem ^70 off, and a unit gate takes vh ru time to turn on, refer Figure 5.14. If Ts or ru are identical for all sites and units, the total completion time for the network to get the solution is about flog 2 (N — 1)1((1 — v1 )r, vhru). Use the inference network described above for an N = 10 problem, with 180 74LS00 (a total of 720 2-input NAND gates), 90 inverters in 74LS02, and 90 74S133 (13-input NAND gates, only 9 inputs of each gate are used), the network is expected to arrive at the solution within 5Ons. Here we use the typical transition time for 74LS00 and 74S133 as 9.5ns and 3ns respectively, and vh = 0.6 and vi = 0.2 respectively. This implementation overcomes a major limitations of systolic approaches — global synchronization — by updating the units and sites in continuous-time domain. Compared with the processing elements in a systolic array, each unit in the inference network only consists of a number of NAND gates. Physical implementation of the continuous-time inference network is much simpler than a systolic array. For comparison, a systolic algorithm running on the Connection Machine takes about 10 seconds CM elapsed time, including 0.2 second CM busy time for an N = 10 problem. The potential problem with this implementation, limited fan-out of a gate, may be overcome by using additional driving gates. r- 1 0 0 Tu t2 (b) 1 Vh = Vhru Figure 5.14: Site and unit time delay. Besides the existence of a path between two nodes, if we are also interested in the path itself, a path indicator can be attached to each site. Figure 5.15 shows the circuit for such a path indicator. Under the condition that none of the site or unit value decreases (the Chapter 5. Inference Network for the Transitive Closure Problem^71 sufficient condition for convergence), an indicator at a site will turn on only when the site value becomes 'high' while the unit value is still 'low'. Therefore, when the network is stabilized, if the value of unit (i,j) is 'low', there is no path between the two nodes. If unit (i,j) has 'high' value, and none of the path indicator is on, there is a direct path from i to j. If indicator of site 1 at unit (i,j) is on and unit (i, j) has 'high' value, there is a path from node i to node j through node 1. The path from i to 1 and from 1 to j can be derived recursively based on the path indicator at the corresponding units. Table 5.6 shows how the path indicator works. IND^IN ski/^ ui; Figure 5.15: A path indicator for the transitive closure problem. sui 0 0 0 1 t —8 uii IND 0 0 1 0 1 1 0 1 IND 1 1 1 0 siii i 0 1 1 . t+8 ui; IND 0 T 1 T 1 0 0 1 note IND 0 1 1 0 indicator turns on indicator turns off indicator keeps off indicator keeps on Table 5.6: States of unit, site and path indicator. Chapter 5. Inference Network for the Transitive Closure Problem ^72 5.5 An Example (a) ^ (b) ^ (c) Figure 5.16: An example of a 4-node graph (a), the intermediate result (b) and the transitive closure (c). To illustrate the algorithm and implementation of the network, let us go through the following N = 4 example. The connection matrix of a 4-node graph, Figure 5.16(a), is 11 1 0 0 0 1 1 0 1 0 1 1 (5.56) ‘0 1 0 1i An N = 4 problem requires 12 units, each with 2 sites. A unit is a 3-input NAND gate and a site is a 2-input NAND gate. Figure 5.17 shows the complete circuit. Initial values are applied to the inputs of the inverters at t = 0. It takes the transition time of an inverter and a 3-input NAND gate for 5 unit outputs become 'high' — corresponding to the 5 arcs in Figure 5.16(a). These high unit values are then sent to other units. 5 other unit outputs become high after approximately the transition time of two NAND gates. The paths found so far are show in Figure 5.16(b). After another transition period of two NAND gates, the last 2 unit outputs become high, and the transitive closure, Figure 5.16(c), is obtained. Figure 5.18 shows the unit outputs as functions of time. It Chapter 5. Inference Network for the Transitive Closure Problem^73 takes about 10 nanoseconds to obtain the final result. In Fig. 5.18, unit outputs go up to 'high' in the sequence predicted by Fig. 5.16. However, due to the capacitance and inductance distributed among circuit elements, the waveforms are not as smooth as an ideal output from a digital system. No over-shooting of unit outputs is observed. Down-shooting of less than 0.25V occurs at some units which does not affect the overall updating of the network. Smoother waveforms can be achieved by optimizing the layout of the units within the chip and wiring of elements in each unit. Additional built-in diodes at the output of units can regulate the downshooting of waveforms to be within an acceptable range. 5.8 Discussion The inference network was shown to provide a platform to solve the transitive closure problem. The discrete version algorithm was shown to produce a solution within at most (log(N — 1)1 synchronous iterations. The analog network was composed of logic gates which operated in non-synchronized fashion. The completion time for the analog implementation was shown in nanoseconds range. A potential problem for a large size network is the fan-out limitation of the gates. Additional driving gates may be used to increase the fan-out ability. Since a logic gate is formed by some analog circuit, and the logic circuit solution to the transitive closure problem does not require a clock, analog circuits (or analog simple neurons) can be used to substitute the logic gates for sites and units. Therefore, the circuit for the transitive closure problem becomes indeed an analog circuit (or analog neuron network). The transitive closure problem can also be solved using systolic array or constanttime algorithms. They are synchronous approaches: the calculation at each processing Chapter 5. Inference Network for the Transitive Closure Problem^74 element has to follow some predesigned orders which are derived through tedious analysis of the data dependencies among processors. The inference network structure is obtained from simple optimization procedures, therefore, network algorithm for transitive closure can be derived directly from the problem formulation. Chapter 5. Inference Network for the Transitive Closure Problem ^75 •I^ •^ •^ ri) I •t^ r) Figure 5.17: Circuit for an N = 4 transitive closure problem. Chapter 5. Inference Network for the Transitive Closure Problem^76 i /Out, ... e /out i r — %Oil volts Itonn volts volts /out /out out in volts in volts in volts (n volts volt /out lout /out • Isola, template control fits tx in seconds; n" von n volts in volts 5 4.5 3.5 2.5 1.5 B.5 n Figure 5.18: Hspice simulation of the inference network for a 4-node transitive closure problem. Chapter 6 Inference Network for the Assignment Problems This chapter discusses the behavior of the inference network when it is used in optimization through gradient descent of an energy function — an approach used by many neural network algorithms. The application of the inference network to the assignment problem was discussed. The dynamics of the network was shown similar to that of the Hopfield net for the traveling salesman problem (TSP). The inference network has less links than the Hopfield net, however, the eigenvectors for the two networks were proved similar. Due to the nature of the assignment problem and the use of a decreasing weight for the objective function, the network was shown to converge for a much larger range of problem size than the Hopfield net for the TSP. For all the test cases used with size up to 128, it was demonstrated that the network converges to a near-optimal valid solution. 6.1 The Assignment Problem 6.1.1 The Problem The assignment problem is a classical combinatorial optimization problem. Consider N objects to be assigned to N persons. Both persons and objects are denoted by an integer 0, • • • , N — 1. Given (0 < i j < N) as the benefit of assigning object j to person i, , the objective is to assign each person a distinct object so that the resulting one-to-one correspondence maximizes the total benefit. The Hungarian method, proposed by Kuhn in 1955 [38], is a traditional solution to 77 Chapter 6. Inference Network for the Assignment Problems^ 78 the problem and a special case of linear programming [66]. It requires 0(N 4 ) operations. In 1979, Bertsekas [5] proposed the auction algorithm which guarantees a global optimal solution with a worst-case sequential complexity of 0(NM log(NM)), where M is the largest integer benefit. Wein and Zenios [89] developed a hybrid implementation of the auction algorithm on the Connection Machine to solve large assignment problems. Kosowsky and Yuille [37], motivated by statistical physics, solve the problem using a continuous dynamical system. Kim [34] uses a modified Hopfield's model to solve the assignment problem, and claims achieving better stability and accuracy. (The paper is in Korean.) The assignment problem has many applications. In VLSI layout, it mostly occur in semi-customer design styles such as gate-array or standard-cell design. In gate assign- ment, the slots are library cells and the circuit elements are gates. Supposed each cell can accommodate exactly one gate and each gate has a set of legal locations to preserve the function of the circuit, the assignment is to minimize cost. Similarly, optimal assignment is also used for determining pin locations in global and area routing. Refer [51] for details. 8.1.2 Optimization Formulation The assignment problem can be formulated into a constrained optimization problem. A valid solution satisfies the constraint of one object for each person. The valid solution which maximizes the total benefit is defined as an optimal solution. Let us define 0 < uji < 1 for all 0 < i, j < N. Ili; =1 means person i is assigned to object j. u i; = 0 means person i does not get object j. Based on these definitions, Ei Doi %pa = 0 guarantees that person i does not get two distinct objects. Similarly, Ei that object i is not assigned to two distinct persons. D oi ujitth = 0 guarantees E i ti q = 1 and Ei uij = 1 ensure that only one person gets object j, and person i only gets one object. If ti q 's are put Chapter 6. Inference Network for the Assignment Problems ^ 79 into an N x N array, then a valid solution is an array with exactly one 1 in each row and column, all the rest elements are 0. The objective function to be minimized is: E = AELEuijuil -^F CE[(Etiii — 1 ) 2 + (Etsii — 1) 2 ] j 10.i^ i i^(6.57) —D E E u, r, where 0 < ujj < 1, 0 < i j,1 < N, A, B, C, D > 0 and ri; > 0 Vi, j. In Eq.(6.57), the , sum of term A, B, C gets a minimum of 0 when the constraint is satisfied. Term D is the total benefit. Due to the symmetry of the constraint, the first two terms should be equally weighed. Therefore A = B. A, B and C must not be zero. If A and B are zero, Eq.(6.57) may get a minimum state where all unit values are close to N. If C is zero, the minimum state of Eq.(6.57) is ui = 0 Vi, j, which is not a valid solution either. ; The assignment problem can have some further constraints. For example, a certain person must or must not get a certain object. If person i has to get object j, equals to 1 constantly, which reduces the problem dimension by one. If person i must not get object j, uu = 0 can be set initially and be kept unchanged for the entire calculation. Neural network approach is also used in other optimization applications. Sriram [74] introduced a neural network solution to an NP-complete two dimensional assignment problem. Its energy function is the sum of three terms. Two of the terms represent the constraint, and the third one measures the quantity to be optimized. Constants in the energy function are determined experimentally. Wacholder [86] introduced a neural network-based optimization algorithm for the static weapon-target assignment problem. The similarity of all neural network approaches for constrained optimization is that their energy functions are composed of terms for constraints and the quantities to be optimized. However, distinct energy functions correspond to different dynamic behaviors of the network. Chapter 6. Inference Network for the Assignment Problems ^ 80 6.2 Unit and Site Functions for Optimization The inference network for the assignment problem consists of N 2 units, and N sites on each unit; unit value ui; has the same interpretation as in Eq.(6.57). The algorithm is based on gradient descent — to decrease the value of the objective function Eq.(6.57) monotonically. The following site and unit functions can guarantee that AE = E(k+1) — E (k ) < 0 is satisfied for small changes. The site 1 at unit (i,j) has a site function of sijl (6.58) =^U:1 For all units (i,j), the unit function is defined as AO' + KdAun (6.59) where 1^x> 1 i(x)= (6.60) x 0<x<1 0 x<0 Aut! ) = (A + B)u;r i — EgA^C)te^(B C)411 ) ) + 2C + 2Aufri —^(A^+ 2C + r11 (6.61) (6.62) A, B, C, D are the constants in Eq.(6.57), Kd > 0 is a constant determining the magnitude of the change. B = A is used in Eq.(6.62). D < 0 is also allowed. Summarize Eq.(6.59-6.61), it can be observed that each unit, together with its sites, act very similar to a neuron in the Hopfield net. See Figure 6.19. Each neuron in the Hopfield net is connected with all other neurons. However, each unit in the inference network is only connected with the units in the same row or column. Chapter 6. Inference Network for the Assignment Problems ^ 81 Figure 6.19: Unit and sites in the inference network act like a neuron. Continuous-time unit and site functions for the assignment problem can be similarly defined: uii = f( Kcuij)^ (6.63) where vij satisfies d vi• d t = ( A + B)ui; — E((A + C)ui/ + (B Outi) 1 = 2Au i; — > 2(A -I- 2C — ri;^(6.64) 2 C)siii^—ri •^ (6.65) and f(x) and sift are defined the same as in Eq.(6.60) and Eq.(6.58). 6.3 The Decreasing Objective Function The value of the objective function decreases monotonically for Kd > 0 and any values of A, B, C, and D. According to the definition of function f(x) in Eq.(6.60), if 0 < u (i.11! ) KdAu(ii ) < 1, ur ) — u!" ) = KdAun if 14.11 ) KdAu!" ) > 1,^u!I1+1) — u!, ) = 1 — u!" ) > 0 & KdAt4.11 ) > 1 — u!, ) > 0; if 4 ) KdAuT < 0,^uLk+1) — use) = _ u lie) < 0 & KdAuT < —ufri < 0. Chapter 6. Inference Network for the Assignment Problems ^ 82 In any of the cases, 4+1) — ttn dAUT 0 (6.66) holds. The derivative of the objective function Eq.(6.57) can be derived: dE = —2 ((A + B)ui; — E((A C)uit (B + C)uu)+ 2C + rii ) d uij^ 1 (6.67) ii^— u•SJ • ii s sufficiently small) When uji changes slowly, (u (k+1)^(k) E(k+i) — E (k)^v-■^dE (ti!+. 1) t dui; Notice that dE d u i; Kd Au! k) = —2Kd ((A + B)utj — E((A 2 uti) + 2C + —ri i ) 5_ 0 nua +^n^ 2 Using Eq.(6.66), we can get d Eku, (k+i) n (9) < 0 V i,.i i; —^ — d ui; we thus proved that E(k+ 1 ) - E(k) < 0 (6.68) It can also be proved that network energy decreases for continuous-time updating as well. According to the definition of function fix) in Eq.(6.60), h. = K LL .h when 0 < < 1 u,, = Kcvsi Lt_t d t^c d t when vii > 1^uii = 1 ( duIL g- = 0 when vu < 0^uii = 0 d ti-• dt — 0 therefore dE dt = L'Ll d^d :=LIc4" 1-ddu•• s7 dt __ using Eq.(6.67) and Eq.(6.64), we have dE <0 dt Chapter 6. Inference Network for the Assignment Problems^ 83 The sigmoid function 1 ex - eX) 2^er e - r (6.69) - (1 -I- can be used instead of the piece-wise linear function Eq.(6.60) and still keep Eq.(6.68) hold. However, Eq.(6.60) makes the analysis of the network dynamics easier and results similar performance as using the sigmoid function, as long as Kd is kept small. 8.4 TSP, the Hopfield Model and Comparison 8.4.1 The Hopfield Model for the TSP The traveling salesman problem (TSP) has the following simple description: given a complete digraph of N nodes with an N x N matrix II4 > 0 defining the length of arc (i,j), find a minimum length circuit (or tour) which goes through each node exactly once. Distances dii ire symmetric, that is, dii = di ;. Also assume cli ; = 0. The length d(T) of a tour T is given by d(T) = E(1,i)ET • The problem has been studied extensively for the past few decades and many algorithms have been proposed for its exact solution. Since it is an NP-complete problem, one is therefore also interested in approximate algorithms which take less time to get a good, but not guaranteed optimal, solution. Hopfield and Tank [30] gave 0 < vii < 1 the following interpretation for all 0 < i,j < N: vij = 1 means city i is the j-th stop in the salesman's tour; and vki = 0 means city i is not the j-th stop. A valid tour is one with exactly one 1 in each row and column of matrix [vii]. The energy function to be minimized is B c (EEv,-Ny E = jEEEvxivzi + E E Evz,v,+ _ 2 x i i0i^2 , x vo x^x D i dzliv.i_ vv , i+ 1+ v o _ i ) + 2EEE — — X if*T ( • (6.70) Chapter 6. Inference Network for the Assignment Problems ^ 84 where A, B, C, D > 0. The first and second term indicate that only one 1 is allowed in each column and row. The third term means the total number of l's must be N. The fourth term is the tour length. 8.4.2 Comparison of the Assignment Problem with the TSP Although a lot of studies and improvements have been made to the original Hopfield net, we will compare the inference network with it due to the similar appearance of the energy functions. From Section 6.1.2 and Section 6.4.1, we have observed some similarities and differences between the assignment problem and the TSP, and their optimization formulations; they can be summarized as follows: Similarities: • Constraints can be formulated into the same restrictions: one 1 in each row and each column; • The A and B terms in objective function Eq.(6.57) is the same as those in energy function Eq.(6.70); • Both Eq.(6.57) and Eq.(6.70) are to be minimized; • Both use gradient descent which may end up in local minimum state; • When D term is ignored, the eigenvalues of the connection matrices for both problems are similar, as we will see later and in [2]. Differences: • Each valid assignment has one and only one distinct set of ?A i; values; however, a single tour corresponds to 2 different sets of vi values, since a circular tour can be ; interpreted in two opposite directions; Chapter 6. Inference Network for the Assignment Problems ^ 85 • The C term in Eq.(6.70) restricts the total number of neurons on; C term in Eq.(6.57) specifies the number of on-neurons in each row and column; • D terms for total benefit and total tour length are different due to the interpretation of neurons. Because of the similarities of the two problems, many dynamical behaviors of the network for the assignment problem are similar to those of the Hopfield model for the TSP. Using a negative D, the objective function Eq.(6.57) for assignment can also be interpreted for the TSP, in which u ij = 1 means city j is immediately after city i in the salesman's tour. Since this formulation often yields disconnected sub-circles, additional constraints prohibiting sub-tours must be used. Xu [92] studied this approach. The major reason responsible for the Hopfield model's invalid tours does not exist in the similar model for assignment. A fundamental problem with the Hopfield TSP model is its degeneracy. Since a tour can be clockwise or anti-clockwise, it has 2 distinct tuj valid states. The network aims at one tour but proceeds towards the 2 corresponding states, and finally it often ends up with an invalid state. In the assignment problem, each valid assignment corresponds to a unique u jj array, hence the network evolves towards a particular state. The constraint that there are exactly N l's in the matrix [uij ] is represented differently in the neural models. The Hopfield model has a tendency of getting two or zero l's in a row or column [90] while the total number of l's is still N. C term of Eq.(6.57) specifically restricts the number of l's in each row or column to be one. This reduces the chance of assigning one object to two persons or two objects to one person. Chapter 6. Inference Network for the Assignment Problems^ 86 6.5 Dynamics of the Network 6.5.1 Connection Matrix Let u be an N 2 x 1 vector and ui; be its (iN + j)-th element. Eq.(6.59-6.62) can be represented by the following matrix operation: u (k+1) = f(u(k)^KdPu (k) + KdQ)^(6.71) where P is an N 2 x N 2 symmetric matrix. Its element at row iN + j and column rN + is (when B = A) PiN-1-jo"Ni-s = 2,46(i — r)6(j — s) — (A + C)45(i — r) — (A + C)O(i — s) (6.72) Q is an N 2 x 1 vector, its element at row iN + j is qiN+j D =^ + —ri; 2 (6.73) and f(x) = f((zi,x2,• • • ,x.)t)= (f (xi), 45(x) = f (x2), • ••, f(x.)Y {1 x=0 0 x# 0 (6.74) function f (x) is defined in Eq.(6.60). The determinant of IsI — PI, though lengthy, can be derived (see Appendix E.2 for details): IsI — PI = (s + 2(N — 1)A + 2NC)(s — 2A) (N-1)2 (.9 + (N — 2)A + NC) 2(N-1) Therefore, P has three distinct eigenvalues: = —2(N — 1)A — 2NC^ (6.75) Chapter 6. Inference Network for the Assignment Problems ^ A2 = 2A^ A3 = —(N — 2)A — NC^ 87 (6.76) (6.77) since A, C > 0, we have A l < 0, A2 > 0 and A3 < 0. The eigenvector corresponding to A l is e1 = (1, 1,^, 1) t^(6.78) Or 1 1^1 el= (— — •• • — Y N' N' (6.79) 'N The eigenvectors corresponding to A2 and A3 form (N — 1) 2 - and 2(N — 1)-dimensional sub-spaces 32 2 and 323 respectively. For any A, C > 0, A l A2 A3, therefore, e i 1 32 2 1 323 The objective function Eq.(6.57) can be written using matrices P and Q as: E = —u t Pu — 2Q t u + 2C ^ (6.80) 8.5.2 Eigenvalues Move the Network Towards the Valid Sub-space Since Eq.(6.71) contains non-linear function f', it is hard to analyze its dynamic behavior. However, when Kd is small enough to keep J ' (x) x, Eq.(6.71) can be simplified into . u(k+1) = u(k) KdP11(k) KdQ ^ (6.81) Let u be decomposed into three components u 1 , u 2 and u 3 corresponding to direction e l , sub-space 32 2 and R3 respectively. U = U 1 -F U 2 + U 3 ^ (6.82) where u 1 = (u • é l )e i . Since A 1 , A2, and A3 are eigenvalues, we have p u (k) = A iu l(k) A2u 2(k) A3u 3(k)^ (6.83) ^ Chapter 6. Inference Network for the Assignment Problems^ 88 Q in Eq.(6.81) has component of^hence, Q can not be pre-decomposed into the three sub-spaces. In order to analyze network dynamics, let us assume D = 0 for the moment, hence Q = 2Ce 1 = 2C/Ve 1 . With these assumptions and decomposition, we have, from Eq.(6.81), u 1(k+1) = u l (k) + Kd u l (k) + 2CNKde 1 = (1 + Kd A i )u l(k) + 2CNKda 1^(6.84) u 2(k+1) = u 2(k) + KdA2u 2(k) u 3(k+1) = u 3(k) + KdA 3 u 3(k) ^ ^ (6.85) (6.86) On the other hand, according to Eq.(6.80), we have, E = —Ailu 1 1 2 — A21u 2 1 2 — A 3 1u 3 1 2 — 4CNIu 1 1 + 2C 2CN 4C2N2 =^ )2 A21u 2 1 2 A31u312 Al + 2C^(6.87) Recall that A i , A3 < 0, and A2 > 0, in order to minimize E, what are required are 2CN ^1.111 1 1 = _2CN^ — lAil 2. 1u 2 1 2 increases with the increase of k, and 3. 111 3 1 2 decreases with the increase of k From Eq.(6.84), we can get ^i (k+i)^ 2CN el) 2C N u^= (1 + KdA i ) k (u 1(°) + Ai^Ai el for small Kd, we can have —1 < KdA i < 0, hence i ( c..)^2C N u = —e1 Ai therefore, lui (c.) 2CN I . lall Chapter 6. Inference Network for the Assignment Problems^ 89 which ensures that the first requirement will eventually be satisfied. Since A2 > 0, Eq.(6.85) ensures 1u 2 I increasing with time. Similarly, the negative A3 ensures lu 3 1 decreasing with time. For a valid solution, E i Ei ti,7 = N holds, hence, ( ) u 1( c°) • e l = u ( 00 ) • e l = — (u ( c° ) • el) = N NN = 1 — and u 2(°°) will be very big (without the non-linear function f (x)) and u 3(°°) will be zero. The sub-space formed by eigenvectors corresponding to A2 contains valid solutions, and the sub-space formed by eigenvectors corresponding to A3 contains invalid solutions. 6.5.3 The Role of the Non-Linear Function When Kd is set sufficiently small, KdAtdjk. ) can be very small. If 0 < uLk. ) < 1 also holds, Eq.(6.81) will describe the exact behavior of the network. In this case, vector u moves in the direction towards the valid solution space. Only when u is sufficiently close to a corner of the N-dimensional hypercube, the non-linear function has its impact. But since KdAu is kept small, the result f(u KdLu) can only be slightly different from u KdLlu. Therefore the result vector still points very close a hypercube corner. hi other words, the solution is almost determined before the non-linear function starts to affect the change; the non-linear function only keeps lui within a certain range. 6.5.4 The Complete Q Term D = 0 was assumed in the last two subsections. However, D term is a major part of vector Q. Since this term is related to r ii , it may have components in all of el, 322 and 323 subspaces. For a positive D, this term moves the vector u in a direction in favor of large rii , although this movement can be either towards or away from the valid solution space. However, if the speed of moving the vector towards the valid space is properly rated with Chapter 6. Inference Network for the Assignment Problems^ 90 the speed towards a large benefit, this term can move the vector approximately within the valid space in order to arrive at a near-optimal solution. Like all the other gradient descent algorithms, vector u will fall into local minimum if constants A, B, C and D are not properly given. 6.5.5 Initialization, Updating Rate and Termination Ideally, initial values u!P do not affect the final solution, since the solution is determined by I u 1 I ±L , 1u 2 1 —4 00, 1u 3 1 0, non-linear function f(x), together with the D term implication. A simple and unbiased initialization is 49 = k for all i,j. This initialization can keep Du;, ; roughly invariant with N, so that Kd can be almost the same for different N's. A uniform initialization may cause problems for some specific benefit matrices. For example, when the benefit matrix [r 12} has two rows or two identical columns, the problem has multiple symmetric solutions. In order to break the symmetry, small noise may be added to the initial values. For example, (o)^1 ui• _ — — N *s (6.88) where s is a random number distributed between 0.9 and 1.1. A large value of Kd can speed up the converging process. However, a too large Kd may introduce the non-linear effect too early and cause oscillation. Kd should keep KdAu ii within a small percentage of the initial values 49 so that non-linearity does not occur too early. Three different criteria are used to determine when a solution is achieved or whether the calculation should be terminated. The first of these is the discovery of a valid solution. Since uki represents assignment, a valid solution is obtained if and only if there is exactly one neuron in each row and column with high value and all the rest have low values. To determine whether a particular neuron is high or low, a threshold rule can be applied. Chapter 6. Inference Network for the Assignment Problems ^ 91 • For example, those neurons with values greater than Th = 0.65 are considered high, and those below Ti = 0.30 are considered low. The second criterion is to check whether the network is still changing. If the total change of the uij values between two successive iterations is less than 7'2 = 10 -4 , the neurons are considered stabilized, and the network reaches a stable final state. The final termination criterion is a time-out test. If after T3 = 1500 iterations, neither of the above criteria has been satisfied, the system is considered divergent. 6.6 Determining Constants From Eq.(6.57-6.62), we know that constants A, B, C, and D are related. B can be set equal to A due to the symmetry of the constraint. In Eq.(6.62), if we assume B = A, uki = N vi, j, Du , — 2AV- can be obtained. In order to get a D invariant with ; N, we choose A = B = NN 1 . From the previous discussion, we see that constants A and C determine the eigenvalues, Eq.(6.85-6.86) show that vector u moves towards the valid sub-space with a speed IA 2 Kdi = 2AKd, and moves away from the invalid sub-space with a speed of IA3 Kd I = ((N — 2)A + 2NC)Kd. If D = 0 and Kd is sufficiently small, any positive A and C lead to a valid solution. From Eq.(6.57), if 5 is too large, then C term overweighs A term, hence the network may arrive at a stable state where all u ii 's are close to k. On the other hand, if 5 is too small, the network may converge to the all zero state, where A and B terms are minimized and C term is not. Simulation shows that the ratio 2 can vary in a certain range to obtain a valid solution. However, the range is approximately anti-proportional to N. Term D moves the vector u towards the direction in favor of the large rij's. However, this direction may or may not be the direction of the valid sub-space. If rij are given Chapter 6. Inference Network for the Assignment Problems ^ 92 such that the direction of large ru's is close to the direction of invalid sub-space, a large D may force the vector stay in the invalid space. If the direction of large ru is close to the direction of valid sub-space, but still with some difference, a not-to-large D leads to a solution at a corner of the hypercube; whereas a too-large D bends the vector away from the corner. The larger D is, the farther the vector is from the corner. However, the optimization problem is meaningless if D is too small. From simulation we noticed that the direction of the solution vector u is usually determined at the early stage of the calculation. In the beginning of the calculation, the large D term contributes by moving the vector towards larger total benefit state; when the calculation is near the end, it often tends to draw the vector away from a hypercube corner. Based on this observation, we start with a large D o and reduce its value by a small percentage in each iteration: D := pD, where 0 < p < 1. With a decreasing D, the network can always arrive at a valid solution, since When D term is negligible, the minimum state of Eq.(6.57) is always a valid assignment. The value of D is also related with the distribution of ru. In a valid solution, NN-1 of the elements are expected to settle down at 0. Suppose benefits ru have probability distribution of g(r), and satisfies f f .g(r)dr = NN-1 , suppose further E uit = 2, and uu = k, the solution D o from 1 Do ... = 0 r Au k; = 2A -1Tr 2(A + C) + 2C + can be a guideline in determining initial value Do. 8.7 Simulation Results The overall performance of the algorithm is satisfactory. For all the examples we used with N varies from 2 to 128 and even distribution of benefits ru, the network always converges to a valid assignment as long as the constants are chosen based on the above Chapter 6. Inference Network for the Assignment Problems^ 93 guidelines. For a particular set of benefits rij, the solutions corresponding to different values of 5, 1,(4--4 are usually the same. The value w sometimes changes the solution for large problems, but the total benefit does not change significantly. For all the examples we tested with N < 12, the inference network algorithm gives the same result as exhaustive search. 8.7.1 Small Size Examples The following example with N = 10 illustrate some properties of the network. The benefits^are evenly distributed in [0, 1]: 0.053 0.914 0.827 0.302 0.631 0.785 0.230 0.011 0.567 0.350 0.307 0.339 0.929 0.216 0.479 0.703 0.699 0.090 0.440 0.926 0.032 0.329 0.682 0.764 0.615 0.961 0.273 0.275 0.038 0.923 0.540 0.443 0.837 0.368 0.746 0.469 0.505 0.328 0.480 0.424 0.678 0.139 0.763 0.959 0.707 0.242 0.663 0.759 0.332 0.455 0.685 0.716 0.136 0.720 0.832 0.751 0.681 0.106 0.379 0.719 0.381 0.919 0.163 0.219 0.639 0.261 0.040 0.144 0.941 0.872 0.569 0.972 0.364 0.684 0.931 0.423 0.927 0.594 0.182 0.611 0.401 0.868 0.680 0.538 0.940 0.512 0.289 0.621 0.970 0.668 0.693 0.352 0.940 0.208 0.571 0.579 0.821 0.963 0.724 0.762 Using exhaustive search (takes about one hour on Sun/3), the optimal assignment with total benefit of 9.053 was found: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 Chapter 6. Inference Network for the Assignment Problems ^ 94 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 The inference network algorithm gives the above optimal solution almost instantly. For this example, we used initial value 49) E V], and D decreases by 0.6% in each iteration (p = 0.994), the ranges of C, Do , Kd that keep the network to converge are: 0.05 < C < 9.9 with A = B = 0.5 < Do < 11 with A = N 1 ,Do = 4A, p = 0.994, Ka = 0.01 75 Kd = 0.01 C = p=0.994, Tv- , p B =N 1' 0.0005 < Kd < 0.014 with A = B = N C= 75 Do = 4A, p = 0.994 With A = B = NN 1 , C = N, Do = 4A, and Ka = 0.01, D's decreasing rate p can be within the range of p E [0.75,1] to keep the solution optimal. Ka is the main factor determining converging speed. Figure 6.20 shows the number of iterations required for the network to settle down with respect to different values of the constants. Additional constraints, such as somebody has to get a certain object or somebody must not get a certain object, can be introduced to the algorithm without much extra efforts. The same non-zero Kd is used for all the neurons which do not involve any additional constraints; and Kd = 0 is used for all the neurons with additional constraints. If person i has to get object j, set u• 9 = 1; if person i must not get object j, set tet? ) = 0. In the above N = 10 example, if we specifically require u m = 0, u6,8 = 0 and u 3 , 2 = 1, u8 ,5 = 1, our algorithm gives the following result with total benefit of 7.886: Chapter 6. Inference Network for the Assignment Problems ^ 95 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 If the benefit matrix [rii] has two identical rows, the assignment for the two corresponding persons can be exchanged without changing the total benefit. These two assignments are considered symmetric. Similarly, if there are two identical columns in the benefit matrix, there will also be two symmetric assignments. Even two rows or columns have only part of them identical, it is still possible to have symmetric optimal assignments. For example, the following matrix (a) has two optimal assignments (b) and (c), both having total benefits of 3.0. 0.4 0.6 0.9 0.1 0 0 1 0 0 1 0 0 0.4 0.6 0.9 0.1 0 1 0 0 0 0 1 0 0.1 0.2 0.4 0.8 0 0 0 1 0 0 0 1 0.7 0.4 0.2 0.9 1 0 0 0 1 0 0 0 (a) (b) (c) If initial values are all identical, we will get u 01 = u 1 ; for all j and all iterations, therefore, the network will not arrive at a valid solution; instead, it will stay at a state with 0 < 71 < uoi = un,u02 = u12 < Th < 1, U23 = 1.0, U30 = 1.0, and all the rest are Chapter 6. Inference Network for the Assignment Problems^ 96 0. However, if small amounts of noise are applied to ut? with Eq.(6.88), the network is ) observed to arrive at either solution (b) or (c) quickly. The number of iterations required for the neural algorithm to arrive at its solution is mainly determined by the distribution of benefits ri; 's rather than their range (as for the auction algorithm). If larger ri; values are scatted in different rows and columns, the network arrives at the solution fast. On the other hand, if large benefits are concentrated in some rows and columns, the convergence speed is slower. For the following random benefit matrix (a) 0.053 0.914 0.827 0 1 0 0.302 0.631 0.785 0 0 1 0.230 0.011 0.567 1 0 0 (a)^(b) with A = B = N 1 , C = Do = 4A, p = 0.994, and Kd = 0.01, the optimal assignment (b) is obtained in 70 iterations. Using the same constants for another benefit matrix (c), 0.1^0.8 0.11 0 0 1 0 1 0 0 1 0 1 0 0 0.9 0.85 0.9 1 0 0 1 0 0 0 0 1 0 0 1 0.16 0.87 0.17 0 1 0 0 0 1 1 0 0 0 1 0 (c)^(d)^(e)^(.0^(g) it takes 168 iterations to arrive at the optimal assignment (d). Since matrix (c) is a tricky example with three near-optimal valid assignments (e), (f), and (g). If the noise on the initial values are too big, the network may stabilized at a sub-optimal state. The number of iterations required increases gently with N. Figure 6.21 shows the results. The range of C is also related with problem size N. The larger N is, the smaller the C range will be to keep proper ratios of the A, B, C terms. Figure 6.22 shows the maximum C values for different N's which yield valid solutions. Chapter 6. Inference Network for the Assignment Problems ^ 97 8.7.2 Solve Large Problems on the CM Simulations had been conducted for the cases of N = 64 and N = 128 on the 8K and 16K Connection Machine respectively. The algorithm is outlined in Algorithm 4. Table 6.7 and Table 6.8 gives the CM time and total benefits obtained. All benefits r ii are evenly distributed in [0,1]. By using a largest converging Ka, the CM time can be made shorter. Changing D's decreasing rate p may result in a different assignment with slightly different total benefit. Statistically, given i evenly distributed numbers in [0, lb the mathematical expectation of the largest number is If we pick the largest number in the first row and then delete the column it is in, pick the largest number in the second row and delete the column it is in, and so on, the mathematical expectation of the total NTI:T. i benefit is EN =^ For N = 64 and N = 128, the expectations are E64 = 60.24 and E128 = 124.56, respectively. Although the inference network algorithm does not guarantee an optimal solution, the average total benefits we obtained are greater than these expectations. For comparison, optimal total benefits from the Hungarian method are also given. Algorithm 4 : Assignment on the CM for all PE, u[i,j]:= uo, where u0 E while (terminating criteria not met) (refer Section 6.5.5) BEGIN PARALLEL for all i,j h[i, (—i — j)mod(N)] •= u[i, v[(—i — j)mod(N), j]:= u[i,j] FOR k = 0 TO k = N —I BEGIN update u[i, j] using Eq.(6.58-6.62) propagate: 14i, j] := h[i, (j — 1)mod(N)] v[i,j]:= v[(i —1)mod(N),j] END Chapter 6. Inference Network for the Assignment Problems^ 98 END PARALLEL random problem CM busy time benefit-Inference benefit-Hungarian random problem CM busy time benefit-Inference benefit-Hungarian random problem CM busy time benefit-Inference benefit-Hungarian 1 2 3 4 5 31.5 37.7 37.9 31.4 37.7 61.832 62.367 62.047 62.232 61.168 62.450 62.367 62.693 62.443 62.666 8 9 11 10 12 41.7 32.4 38.6 32.0 45.8 61.717 61.656 61.109 61.598 60.093 62.653 62.551 62.220 62.509 62.594 16 15 17 18 19 41.5 30.5 39.3 36.5 35.0 61.417 61.614 60.900 61.065 61.263 62.294 62.460 62.357 62.443 62.486 6 7 33.8 40.6 61.634 61.461 62.294 62.559 13 14 43.4 38.0 62.127 61.492 62.613 62.633 20 average 31.2 36.8 62.002 61.5317 62.659 62.4957 Table 6.7: CM busy times (in seconds) and assignment results for 20 random N = 64 problems using inference network (A = B = N i , C = ri,Do= 4,p = 0.996, Kd = 0.008) and corresponding result from the Hungarian method. random problem CM busy time benefit-Inference benefit-Hungarian random problem CM busy time benefit-Inference benefit-Hungarian random problem CM busy time benefit-Inference benefit-Hungarian 1 118.8 124.846 126.503 8 109.6 124.131 126.323 15 120.4 124.493 126.293 2 112.4 125.232 126.478 9 120.6 124.851 126.227 16 135.3 124.825 126.464 3 94.7 125.295 126.318 10 99.4 125.742 126.368 17 117.1 124.880 126.362 4 121.3 124.723 126.509 11 106.7 125.293 126.486 18 111.8 124.584 126.411 5 101.6 124.165 126.276 12 117.3 124.835 126.465 19 113.5 124.104 126.386 6 115.1 124.697 126.463 13 133.9 125.710 126.478 20 124.6 125.109 126.587 7 99.2 124.983 126.187 14 114.4 124.180 126.517 average 114.4 125.0321 126.4051 Table 6.8: CM busy times (in seconds) and assignment results for 20 random N = 128 problems using inference network (A = B = A,c=i,,D0= 4,p = 0.998, Kd = 0.007) and corresponding result from the Hungarian method. Chapter 6. Inference Network for the Assignment Problems ^ 99 8.8 Discussion The auction algorithm can be parallelized. Different from the Hungarian method and the inference network algorithm, the complexity of the auction algorithm depends on the range of the benefits: its complexity is O(NM log(NM)), where M is the largest integer benefit. The auction algorithm is efficient at the beginning when many people bid for assignments. However, it usually has a long 'tail' when only a few people are left unassigned. This tail results in a relatively long completion time. Wein etc. [89] proposed a hybrid auction algorithm for the CM in which tails are truncated and processors are dynamically assigned to the active bidders. The algorithm can shorten the tailing effect when N is large (for example N > 256) and virtual processor ratio is greater than 1. It can solve an N = 1000 or N = 2000 problem with maximum benefit M = 1000 on the 16k CM in 22.2 and 96.3 seconds respectively. Therefore, the hybrid auction algorithm is more efficient than the inference network algorithm for a large size assignment problem. In all our simulations with N up to 128, the inference network algorithm was shown to produce valid assignments and the results were usually close to optimal; the smaller the problem size was, the more likely an optimal solution would be reached. 20 different benefit sets for each problem size N were tested. The results obtained are given in Table 6.9. The optimum solutions in the table were obtained from the Hungarian method. problem size N 8 16 32 64 128 4 0 # of optimal solutions obtained 20 20 13 2 1 average difference from optimum 0.0% 0.0% 0.5% 1.1% 1.5% 1.1% Table 6.9: Solution quality vs. problem size. 20 distinct random examples were used for each N. Chapter 6. Inference Network for the Assignment Problems ^ 100 The computational complexity of the Hungarian method is 0(N4 ). It takes about two minutes for the Hungarian method to produce an N = 128 assignment on a Sparc station. With a much slower clock rate (7 MHz), the inference network algorithm on the CM also takes about two minutes. The Hopfield's model for the traveling salesman problem presented an approach for constrained optimization different from traditional AI methods. Unfortunately, it often yields invalid solutions and does not work well for large problems (for example N > 15) [90]. Despite the similarities, the inference network algorithm for the assignment problem did not yield invalid solutions as the Hopfield model did for the TSP, due to the formulation of the problem and the setting of the constants (Simulations were conducted for random problems with size up to 128). Both the assignment problem and the TSP are formulated to find an on-off neuron pattern to optimize an objective function. In the inference network, each valid assignment corresponds to a single state satisfying the constraint — exactly one neuron on in each row and column. However, the Hopfield model for the TSP is degenerated: in the Hopfield net, each valid tour corresponds to two distinct states, and both satisfy the constraint. Therefore, although the Hopfield net is a great idea for combinatorial problems, the TSP may need to be remodeled into some other constraints on neurons. The D term in the objective (energy) function measures the quantity to be optimized. Therefore this term must be significantly large to achieve the goal of optimization. On the other hand, large D terms lead to divergence; a small D is advantageous for the network convergence. A compromise between these two aspects is to use a decreasing D term. Our simulation has shown that using a decreasing D, the network converges for all the test cases with problem size up to 128. Since the Hopfield net has similar eigen characteristics to the inference network, it can be expected that the Hopfield net can also have better convergence if a decreasing D is used. However, a decreasing D will not Chapter 6. Inference Network for the Assignment Problems ^ 101 solve the degeneracy problem of Hopfield's model for the TSP. Chapter 6. Inference Network for the Assignment Problems ^ 102 cf0 1 4. 200.00 E o 1:1 i 180. * E 3.00 o z 2. B o d 160.00 Z 4i4 0 0 Z 0. 0.00 4.00^8.00 12.00 Kd*10-3 (a) 140.00 2.00^6.00 10.00 c (b) 320.00 E 280.00 o Is V 240.00 •P, t 200.00 d Z 160 00 . 120.00 2.00^6.00 10.00 D (c) Figure 6.20: (a) A = B = NN i , C = R, Do = 4A, p = 0.994, the larger Kd is, the faster the network converges; (b) A = B = A, Do = 4A, p = 0.994, Kd = 0.01, the network converges slowly when C is small, and converging speed is almost the same within a certain range of C; (c) A = B = -f is i , C = R, p = 0.994, Kd = 0.01, the larger Do is, the faster it converges. Chapter 6. Inference Network for the Assignment Problems^ 103 800 600 0 .1= g 400 200 0 10^20^30^40 Figure 6.21: Number of iterations vs. problem size. All benefits r e; are randomly distributed in [0, 1]. A = B = N 1 C^Do = 4A, Kd = 0.01, u!`) E [V, kl ]• , ) Figure 6.22: Maximum C for the algorithm to converge vs. problem size. All benefits are randomly distributed in [0, 1]. A = B = #7 , Do = 4A, Ka = 0.01, tiT E^W-]. Chapter 7 Conclusions 7.1 Summary This thesis introduced a novel parallel computing platform — the binary relation inference network, which embodies time-efficient parallel computing structures to perform a class of optimization problems. Candidates competing for the optimum were explicitly represented by the sites at a unit; the selection of the optimum was conducted by the unit = functions. The infere nce network algorithms were derived directly from suitable problem formulations. Sites in the inference network perform binary deduction and produce candidates. Units collect the candidates and make decisions based on the optimization criteria. The complete inference network is composed of N 2 units, each unit has a fan-in of N and a fan-out of 2N. The network has logarithmic convergence rate for the graph problems discussed. Each individual unit is a simple computing device. The inference network is a uniform structure with a high degree of redundancy, since the nature of the optimization problem is to make decisions based on multiple choices. The connectivity of the network units grows with problem size N. Inference network algorithms in both discrete and continuous-time domains for the shortest path problem, the transitive closure problem, and the assignment problem were discussed. The synchronous inference network algorithm for the shortest path problem was mathematically equivalent to the divide and conquer version of the sequential matrix 104 Chapter 7. Conclusions^ 105 algorithm for the problem. Compared with all other shortest path algorithms discussed in Chapter 4, the inference network was shown to solve the problem with the least number of synchronous iterations. Therefore, it will be the fastest if every algorithm runs on the kind of hardware that optimizes its performance. Implementing the discrete algorithm on the Connection Machine resulted in a shorter completion time than one of the fastest systolic algorithms for N < 512 (the maximum size investigated). It was also demonstrated that the network can unify many existing all-pair shortest path algorithms by defining different unit functions. The inference network algorithm, which was based on simple procedures for finding the shortest path, was proved to produce optimal solution for synchronous, asynchronous, and continuous-time updating, and its convergence was independent of the problem size. The inference network algorithm for the transitive closure problem was derived directly from the problem formulation. The solution was obtained efficiently using combinational logic without a clock. No gradient descent was involved. Theoretical analysis proved that the convergence of the algorithm to the optimal solution was independent of the problem size. The structure of the continuous-time network and simulation of its performance were given. The dynamic behavior of the inference network and the impact of its energy constants were discussed through the example of the assignment problem. The constants in the energy function (corresponding to link weights) were determined for the problem. Compared with results from exact solutions, the assignment results from the inference network were found to be near optimal for all the test cases with N up to 128. Compared with the Hopfield model for the TSP, the inference network was shown to converge for a larger size of problems, due to the use of a decreasing weight for the objective function as well as the formulation of the problem. The inference network was also applied to data consistency checking (Appendix A), Chapter 7. Conclusions^ 106 consistency labeling (Appendix C), and matrix inversion (Appendix B). For data consistency checking or consistency labeling, the inference network finds or eliminates all inconsistency. Among the applications discussed, the assignment problem was solved using a different strategy from the other problems. For the assignment problem, the inference network aims at decreasing the network energy through gradient descent, which is very similar to other neural network optimization approaches. For other problems (shortest path, transitive closure, data consistency checking, and consistent labeling), the inference network performs simple optimization procedures for the problem, and the decreasing of the network energy was just a result of optimization. For all the applications discussed except matrix inversion, synchronization of the units was not require. In matrix inversion, the inference network was used as a systolic array. 7.2 Contribution The major contribution of this thesis is the architecture of the inference network. The network was designed to implement the optimization procedure in a direct manner. Each decision making unit in the network always has access to all candidates competing for the optimum and these decision-making units can operate asynchronously. Once a computation has been started, the decision making units actively optimize their values until an optimum is reached. For the class of graph problems considered in this thesis (shortest path, transitive closure, consistent checking), the inference network competes with previously formulated neural network models. The inference network has the advantage of requiring no training or weight determination. Additionally, the inference network units have much smaller fan-in and fan-out than the neurons of a typical neural network. These features are fundamental to cost-effective hardware implementation. Chapter 7. Conclusions^ 107 The inference network has shown some of the advantages of a neural network and some of the advantages of a typical parallel processing array. Like many neural networks, the inference network does not require a rigorous synchronization and its units are simple computational engines. When applied to the assignment problem, the inference network behaves like a Hopfield-type neural network. Similar to a parallel processing array, the inference network can achieve optimal solution bounds for many problems (for example, shortest path, transitive closure, and consistent labeling) and the inter-processor links carry no weights. It can also be used synchronously to perform matrix inversion like a systolic array. The inference network was shown to work in both synchronous and asynchronous modes, in both discrete-time and continuous-time domains. Traditional AI algorithms, systolic array algorithms, and dynamic programming algorithms concentrate on the detailed recurrence formulations. They are usually discrete-time algorithms pursuing exact solutions. A systolic array must have a clock to synchronize all the processing elements to ensure the correctness of the results. The inference network for most of the problems discussed was shown not to require rigorous synchronization, therefore a clock was not necessary. It was shown that the inference network can be built using simple integrated circuits (for shortest path), logic gates (for transitive closure) and simple neurons (for consistency labeling). The analog inference network operating in a continuous-time domain was shown to be easy to implement since it does not need a clock and it does not need one set of circuits for each bit of data. Many neural network optimization approaches define an energy function in the form of a weighted sum and pursue the solution by decreasing the energy in gradient descent. Optimal solutions are not guaranteed and, if the energy function has more than one weight, determination of the weights in the energy function (and consequently the link weights in the network) is not a easy job. For all the applications discussed in the Chapter 7. Conclusions^ 108 thesis except the assignment problem, no gradient descent was involved in the inference network. The inference network architecture was derived from simple optimization procedures. For problems in which the optimization of a binary relation is always advantageous to the optimization of other binary relations, (for example the shortest path, transitive closure, data consistency checking, and consistent labeling), the network remains active until all the units reach their optimum and the global optimal solution is obtained. In these cases, network links carry no weights. The detailed structure difference between the inference network and neural networks or other parallel computing models is that the inputs to an inference network unit are grouped into sites, and the evaluation of each site is independent of the evaluation of other sites. This gi'ves a unit the freedom to update the unit value whenever a new candidate for the optimum is produced. 7.3 Open Issues and Future Work One of the open issues about the inference network is the number of units it requires. For example, classical neural network approaches for the N-city traveling salesman problem require N 2 neurons [30]. The maximum neural network model [49] solves the K-partite subgraph problem with N * K neurons. The neural network approaches for the shortest path problem also use N2 neurons [19,83]. A question that arises is whether it is necessary for the inference network to have N 2 units for a problem of size N (e.g. a graph with N nodes). Each unit in the inference network represents a binary relation. If the objective is to obtain a value for each of the N 2 binary relations concerning N nodes, having less than N 2 units means that the final results must be obtained in some order and some kind of synchronization is required. Therefore, it is not very likely that the number of Chapter 7. Conclusions^ 109 units in the inference network can be reduced while preserving its asynchronous feature. When the problem size is large, the divide and conquer scheme may be used to solve a large problem in a number of iterations. The transitive closure problem is an example that can be tackled by a divide and conquer scheme. After applying the inference network to a subgraph of the original problem, all the connected nodes can be merged and treated as one single node in the rest of the graph. Therefore, a smaller inference network can be applied to a larger graph and the transitive closure of the original graph will be obtained after using the inference network iteratively. Another open issue is what other problems can be solved on the inference network? To shed some light on this issue, the problems of the shortest and longest path will be examined. The shortest path and the longest path problems can be similarly represented by a set of binary rdiations. The shortest path problem can be solved on the inference network while the longest path cannot. The basic reason is not that the former is a polynomial-complexity problem and the latter is an NP-complete [21] problem. In the shortest path problem, the shortest path length from node i to j always equals to the total of the shortest path lengths from i to k and from k to j for all k's. However, the longest path from i to j is often not the concatenation of the longest path from i to k and from k to j for any k. The question is which problem can be naturally represented by binary relations and solved through binary operations on these relations (as discussed in Section 2.8). Can the inference network be applied to an NP-complete problem? Although none of the applications discussed in this thesis are NP-complete, this does not exclude the possibility of applying the inference network to an NP problem. However, since the inference network consists of a polynomial number of processors, if a polynomial completion time is expected to solve an NP-problem, the solution can not be guaranteed optimal. The author has attempted to implement the algorithm proposed by Press [68] for the Chapter 7. Conclusions^ 110 traveling salesman problem on the inference network. The algorithm combines simulated annealing with route rearrangement. The inference network supports the communication requirement for the kind of route rearrangement used. However, since the algorithm often converges to a local minimum and route rearrangement is sequential in nature, the performance of the inference network for this algorithm is not outstanding. It remains open to find efficient mapping of NP-complete problems on the inference network. Is it possible for the inference network to produce an optimal solution to an NPcomplete problem if non-polynomial time is allowed? This also remains open for further study. For the polynomial-complexity shortest path problem, Section 4.3.3 has shown that with our definition of unit function and network energy, the energy function is a decreasing concave function with only one minimum. This allows the inference network to produce the optimal solution to the shortest path problem in asynchronous or continuous-time domains. For an NP-complete problem, the shape of the search space is so complicated that it is very hard to define the unit functions of the inference network so that the network energy has only one minimum. The inference network can be applied to dynamic programming, often a process of sequential decision making. Computing shortest path is a typical problem that can be solved by dynamic programming [11]. Since the shortest path problem can be solved efficiently on the inference network, this opens up the possibility of applying the nonsequential inference network to some dynamic programming problems. However, whether and how a specific problem can be solved in a non-sequential way requires further study. Some future work which may be done to the inference network includes: • Use the inference network platform for logical analysis that can be represented by an AND/OR graph. Chapter 1 discussed the relation between the inference network topology and an AND/OR graph. In artificial intelligence, many logical Chapter 7. Conclusions^ 111 reasoning problems can be represented by an AND/OR graph. Therefore, if the reasoning procedures can be parallelized, it is possible to use the inference network to carry out the analysis. • Derive other optimization applications for the inference network, for example: — Weapon-Target Association: The costs of using any of N weapons for any of N targets are given. The objective is to minimize the total cost of associating the weapons to the targets. The problem is similar to the assignment problem except that an advanced weapon may be able to kill several targets and a particularly dangerous threat may require targeting by several weapons; — Minimum Spanning Tree: It is a fundamental problem in artificial intelligence and intimately related with the assignment and traveling salesman problems; • Map the inference network onto a 3-dimensional (instead of current 2-dimensional) processor array in the Connection Machine, where the third dimension corresponds to the sites at each unit. This will increase the number of virtual processors required, however, site calculation will be parallelized. Refer Section 3.3; • Solve the traveling salesman problem by adding additional constraints to the assignment algorithm to eliminate sub-tours, refer Xu [92]; • Design the general or special purpose inference network using VLSI circuits and test its performance; • Relate the algorithms to practical applications in robotics, image processing, VLSI layout, and control etc. Bibliography [1] A.V. Aho, J.E. Hoperoft, and J.D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley Pub. Co., 1974 [2] S.V.B. Aiyer, M. Niranjan, and F. Fallside, 'A Theoretical Investigation Into the Performance of the Hopfield Model', IEEE Trans. on Neural Networks, Vol. 1(2), pp. 204-215, 1990. [3] H.M. Alnuweiri, 'Constant-Time Parallel Algorithms for Image labeling on a Reconfigurable Network of Processors', Technical report CICSR-TR92-004, Dept. of Electrical Engineering, Univ. of British Columnbia [4] D.H. Ballard and C.M. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, NJ, 1982 [5] D.P. Bersekas, 'The Auction Algorithm: A Distributed Relaxation Method for the Assignment Problem', Annals of Operations Research, vol. 14, pp. 105-123, 1988 [6] G. Broome11 and J. Robert Heath, 'Classification Categories and Historical Development of Circuit Switching Topologies' Computing Surveys, 15(2) 95-133, June 1983 [7] W.K. Chen, Theory of Nets: Flows in Networks, John Wiley & Sons, Inc. 1990 [8] Cormen, Leiserson and Rivest, Introduction to Algorithms, McGraw-Hill, 1990 [9] G.B. Dantzig, 'All Shortest Routes in a Graph', Operations Research House, Stanford Univ. Tech. Rep., pp. 66-3, (1966) [10] E.W. Dijkstra, 'A Note on Two Problems in Connexion with Graphs', Numer. Math., 1, (1959) pp. 269-271 [11] S.E. Dreyfus and A.M. Law, The Art and Theory of Dynamic Programming, Academic Press, Inc., 1977 [12] A. El-Amawy, 'A Systolic Architecture for Fast Dense Matrix Inversion', in IEEE Computer, Vol. 38, No. 3, 1989 [13] B.A. Farley, A.H. Land and J.D. Murechland, 'The cascade algorithm for finding all shortest distances in a directed graph', Management Sci., 14, pp. 19-28 (1967) 112 Bibliography^ 113 [14] J.A. Feldman and D.H. Ballard, `Connectionist Models and Their Properties', Cognitive Science, Vol. 6, pp. 205-254, 1982 [15] J.A. Feldman, M.A. Fanty, N.H. Goddard, and K. Lynne, 'Manual of Rochester Connectionist Simulator', 1986. [16] J.A. Feldman, M.A. Fanty, N.H. Goddard, and K. Lynne, 'Computing with Structured Connectionist Networks', IEEE Trans. Computer, pp. 91-102, March 1988. [17] R.W. Floyd, 'Algorithm 97, Shortest Path', Comm. ACM, Vol. 5 (1962), pp. 345 [18] S. Fortune and J. Wyllie, 'Parallelism in Random Access Machines', Proc. of 10th Annual ACM Symp. on Theory of Computing, pp. 114-118, Mar. 1978 [19] T. Fritsch and W. Mandel, 'Communication Network Routing using Neural Nets Numerical Aspects and alternative Approaches', in IEEE, 1991 [20] Tutorial text book, The 2nd Int'l Conf. on Fuzzy Logic and Neural Networks, July 17-22, 1992, IIZuka, Japan [21] M.R.Garey and D.S. Johnson, Computers and Intractability: A guide to the Theory of NP-Completeness, W.H. Freeman and Company, 1979 [22] N.H. Goddard, M.A. Fanty, and K, Lynne, The Rochester Simulator, Tech. Rep. 233, Computer Science Dept., Univ. of Rochester, 1987. [23] L.J. Guibas, H.T.Kung, and C.D. Thompson, 'Direct VLSI Implementation of Combinational Algorithms', in Proc. Caltech Conference on VLSI, California Institute of Technology, Pasadena, 1979, pp. 509-525 [24] S. Hammering, 'A Note on Modifications to the Given's Plane Rotation', in J. Inst. Math. Appl. Vol. 13, pp. 215-218, 1974 [25] R. Helton, 'A self-adaptive connectionist shortest-path algorithm utilizating relaxation methods', Proc. Int'l Joint Conf. on Neural Networks, vol. 3, pp. 895-901, 1991 [26] W.D. Hillis, The Connection Machine, The MIT Press, 1985 [27] , `Connectionist Learning Procedures', in Artificial Intelligence, Vol. 40, pp. 185-234, 1989 [28] J.J. Hopfield, 'Neural Networks and Physical Systems with Emergent Collective Computational Abilities', Proc. Nat'l Acad. Sci. USA, Vol. 79, pp. 2554, 1982. Bibliography^ 114 [29] J.J. Hopfield, 'Neurons with Graded Response Have Collective Computational Properties Like Those of a Two-State Neurons', Proc. Nat'l Acad. Sci. USA, Vol. 81, pp. 3088, 1984. [30] J.J. Hopfield, and D.W. Tank, 'Neural Computation of Decisions in Optimization Problems', Biolog. Cybern., Vol. 52, pp. 1-25, 1985. [31] J.J. Hopfield, and D.W. Tank, 'Computing with Neural Circuits: A Model', Science, Vol. 233, pp. 625-632, 1986. [32] T.0 Hu, 'Revised matrix algorithms for shortest paths', SIAM J. Applied Math., 15 (1967), pp. 207-218 [33] Y.N. Huang, J.P. Cheiney, 'Parallel computation of direct transitive closures', Proc. of 7th Intl Conf. on Data Engineering, pp. 192-199, 1991 [34] G-H Kim, H-Y Hwang, and C-H Lee, 'A modified Hopfield network and its application to the layer assignment', Trans. of the Korean Institute of Electrical Engineers, vol. 40, Iss. 2, pp. 234-237, 1991, in Korean [35] S. Kirkpatrick, C.D. Gelatt, Jr., and M.P. Vecchi, 'Optimization by Simulated Annealing', Science, 220, pp. 671-680, 1983 [36] H. Kita, 0. Sekine and Y. Nishikawa, 'A design Principle of the Analog Neural Network for Combinatorial Optimization: Findings from a Study on the Placement Problem', poster presentation during the Int'l Joint Conf. of Neural Network, July, 1991, Seattle. [37] J.J. Kosowsky and A.L. Yuille, 'Solving the Assignment Problem with Statistical Physics', Int'l Joint Conf. on Neural Networks, vol. 1, pp. 159-164, July 1991, Seattle [38] H.W. Kuhn, 'The Hungarian Method for the Assignment Problem', Naval Research Logistics Quarterly, 1955 [39] S.Y. Kung, and S.C. Lo, 'A spiral systolic architecture /algorithm for transitive closure problems', IEEE Int'l Conf. on Computer Design, ICCD'85, New York, USA, pp. 622-626 [40] S.Y. Kung, P.S. Lewis, and S.C. Lo, 'On Optimally Mapping Algorithms to Systolic Arrays with Application to the Transitive Closure Problem', IEEE Intl. Symposium on Circuits and Systems, vol. 3, 1986 [41] K.P. Lam, 'Using Hybrid Techniques for a Continuous-Time Inference Network', Proc. of the IEEE Symposium on Circuits and Systems, pp. 1721-1724, 1991. Bibliography^ 115 [42] K.P. Lam, 'A Continuous-Time Inference Network for Minimum-Cost Path Problems', Int'l Joint Conf. on Neural Networks, July 1991, Seattle, pp. 367-372 [43] K.P. Lam, and C.J. Su, 'A Binary Relation Inference Network', Proc. of the Fifth Int'l Parallel Processing Symposium, Anaheim CA, April 30 - May 2, 1991, IEEE Press, pp. 250-255 [44] K.P. Lam, 'Time-range Approximate Reasoning for Microprocessor Systems Design', CICSR Technical Report TR 90-4, The University of British Columnbia, Feb. 1990, 33 pages, accepted for publication, Proc. of IEE, Part E, 1992 [45] K.P. Lam and C.J. Su, 'Inference Network for Optimization Applications', Proc. of the second Int'l Conf. on Fuzzy logic and Neural Networks, July 1992, pp. 189-192, Iizuka,, Japan [46] H. Lang, 'Transitive Closure on an Instruction Systolic Array', Systolic Array Processors, Proc. of Int'l Conference on Systolic Arrays, 1988 [47] Lawler, Combinatorial Optimization: Networks and Matroids, Holt, Rinehart, and Winston, 1976 [48] K.C. Lee, Y. Takefuji, and N. Funabiki, 'A Maximum Neural Network for the Max Cut Problem', Intl Joint Conf. on Neural Networks, July 1991, Seattle, pp. 379-384 [49] K.C. Lee, N. Funabiki and Y. Takefuji, 'A parallel Improvement Algorithm for the Bipartite Subgraph Problem', IEEE Trans. on Neural Networks, vol. 3, No. 1, 1992 [50] F.T. Leighton, Introduction to Parallel Algorithms and Architectures, MorganKanfman, 1992 [51] T. Lengauer, Combinatorial Algorithms for Integrated Circuit Layout, John Wiley & and Sons Ltd, 1990 [52] P.S. Lewis and S. Y. Kung, 'Dependence Graph Based Design of Systolic Arrays for the Algebraic Path Problem', Proc. of Ann. Asilomar Conference on Signal, System and Computers, pp. 13-18, 1986 [53] V. Li, Knowledge Representation and Problem Solving for an Intelligent Tutor Advisor, M.A.Sc. thesis, Univ. British Columbia, 1990. [54] W-M. Lin and V.K. Prasanna, 'Parallel Algorithms and Architectures for Discrete Relaxation Technique', in IRIS No. 274, Institute for Robotics and Intelligent System, Univ. of Southern California, Los Angeles, California Bibliography^ 116 [55] A.K. Mackworth, 'Consistency in Network Relations', Artificial Intelligence, 8, pp. 99-118, 1977 [56] A.K. Mackworth, 'Constraint Satisfaction', Encyclopedia of Artificial Intelligence, Ed. by J. Shapiro, Wiley Publishers, vol. 1, pp. 205-211, 1987 [57] B. Marh, `Semiring and Transitive Closure', in Technische Universitlit Berlin, Fachbereich 20, report 82-5, 1982 [58] B. Marh, 'Iteration and Summability in Semirings', in Algebraic and Combinational Methods in Operations Research (Proc. of the Workshop on Algebraic Structure in Operations Research) Ed. by R.E. Burkard, R.A. Cuninghame-Green, and U. Zimmermann. Ann. Discrete Math. vol. 19, pp. 229-256, 1984 [59] D. Marr, Vision, W.H. Freeman, San Francisco, 1982 [60] J.S.B. Mitchell and D.M.Keirsey, 'Planning Strategic Paths thought Variable Terrain Data', SPIE applications of artificial Intelligence, Vol. 485, pp. 172-179, 1984 [61] C. Mead, Analog VLSI and Neural System, Addison-Wesley, 1989 [62] J.J. Modi, Parallel Algorithms and Matrix Computation, Clarendon Press, Oxford, 1988 [63] A. Moffat and T. Takaoka, 'An all pairs shortest path algorithm with expected time 0(n 2 logn)' , SIAM J. Computing, 6, pp. 1023-1031, 1987 [64] N.J. Nilsson, Principles of Artificial Intelligence, Spinger-Verlag, 1980 [65] F.J. Niiiiez and M.Valero, 'A Block Algorithm For the Algebraic Path Problem And Its Execution On A Systolic Array', Systolic Array Processors, Proc. of Int'l Conference on Systolic Arrays, pp. 265-274, 1988 [66] C.H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity, Prentice Hall, Inc., 1982 [67] J. Pearl, Heuristic: intelligent search strategies for computer problem solving, Addison-Wesley Pub. Co., 1984 [68] William H. Press, Brian P. Flannery, Saul A. Teukolsky, and William T. Vetterling, Numerical Recipes, Cambridge Univ. Press, 1986 [69] M.J. Quinn, 'Analysis and Implementation of Branch-and-Bound Algorithms on a Hypercube Multicomputer', IEEE trans. on Computers, Vol. 39 No. 3, 1990 Bibliography^ 117 [70]Y. Robert and D. Trystram: 'Systolic Solution of the Algebraic Path Problem', in Systolic Arrays: Papers Presented at the First Int'l Workshop on Systolic Arrays, Ed. W. Moore, A. McCabe and R. Urquhart, 1986 [71] G. Rote, 'A Systolic Array Algorithm for the Algebraic Path Problem (Shortest Paths; Matrix Inversion)', in Computing Vol. 34, pp. 191-219 [72]L. Shastri, Sementic Nets:Evidential Formalization and Its Connectionist Realization, Morgan-Kaufman, Los Altos/Pitman Pub., London, Feb, 1988 [73]P.M. Spira, 'A New Algorithm for Finding All Shortest Paths in a Graph of Positive Arcs in Average Time O(n 2 log 2 n)),' SIAM J. Comput. 2 (1973), 28-32 [74]K.B. Sriram and L.M. Patnaik, 'Neural Network Approach for the Two-Dimensional Assignment Problem', Electronics Letters, vol. 26, No. 12, 1990, pp. 809 [75]H.S. Stone, High-Performance Computer Architecture, Addison-Wesley Pub. Co., 1987 [76]C.J. Su and K.P. Lam, 'Implementation of All-Pair Shortest Path Algorithms with a Binary Relation Inference Network', 10 pages, submitted for publication, February 1991 [77]C.J. Su and K.P. Lam, 'Evaluation of an Inference Network Algorithm for the Shortest Path Problem', 15 pages, submitted for publication, May. 1992 [78]C.J. Su and K.P. Lam, 'A Wavefront Array for the Algebraic Path Problem', 16 pages, submitted for publication, June. 1991 [79]C.J. Su and K.P. Lam, 'Systolic Mapping of the Inference Network for Shortest Path Problems', in Proc. of hit 'l Joint Conf. of Neural Network, Singapore, Nov., 1991, pp. 917-922 [80]C.J. Su and K.P. Lam, 'Dense Matrix Inversion on Connection Machine', 18 pages, to be submitted. [81] Y. Tabourier, 'All shortest distances in a graph. An Improvement to Dantzig's Inductive Algorithm', Discrete Math., 4 pp. 83-87 1973 [82) R. Tamassia, J.S. Vitter, 'Optimal parallel algorithms for transitive closure and point location in planar structures', Proc. of ACM Symposium on Parallel Algorithms and Architectures, pp. 399-408, 1989 [83] S.C.A. Thomopoulos, L. Zhang, and C.D. Wann, 'Neural Network lnplementation of the Shortest Path Algorithm for Traffic Routing in Communication Networks', in Proc. of Int'l Joint Conf. on Neural Networks, vol. 2, pp. 591, 1989 Bibliography^ 118 [84] A.A. Toptsis, 'Parallel transitive closure computation in highly scalable multiprocessors', Proc. of Intl Conf., Advances in Computing and Information, pp. 197-206, 1991 [85] Technical Summary or User's Guide of the Connection Machine from the Thinking Machine Corporation, 1991 [86] E. Wacholder, 'A Neural Network Based Optimization Algorithm for the Static Weapon-Target Assignment Problem', ORSA Journal of Computing, Vol. 1, No. 4, 1989, pp.232 [87] D.L. Waltz, 'Waltz Filtering', Encyclopedia of Artificial Intelligence, Ed. by J. Shapiro, Wiley Publishers, vol. 2, pp. 1162-1165, 1987 [88] B.F. Wang, and G.H. Chen, 'Constant Time Algorithms for the Transitive Closure and Some Related Graph Problems on Processor Array with Reconfigurable Bus Systems', IEEE Trans. on Parallel and Distributed Systems, TPDS-1, 4, 1990, pp. 500-507 [89] J.M. Wein, and S. Zenios, 'Massively Parallel Auction Algorithms for Assignment Problem', 1990, Third Symposium On the Frontier of Massively Parallel Computation, IEEE Computer Society Press, pp. 90-99 [90] G.V. Wilson and G.S. Pawley, 'On the Stability of the Travelling Salesman Problem Algorithm of Hopfield and Tank', Biological Cybernetics, vol. 58, pp. 62-70, 1988 [91] Kenny Wu, 'Analog Implementation of a Continuous-Time Inference Network for 5 cities Shortest Path Problems', Elec 475 project report, University of British Columbia, April, 1991 [92] X. Xu and W.T. Tsai, 'Effective Neural Algorithm for the Traveling Salesman Problem', Neural Networks, Vol. 4, pp. 193-205, 1991 [93] L. Yang and W.K. Chen, 'An extension of the revised matrix algorithm', IEEE Proc. Int. Symp. on Circuits and Systems, pp. 1996-1999 (1989) [94] J.Y. Yen, 'Finding the Lengths of All Shortest Paths in N- Node NonnegativeDistance Complete Networks Using 1N 3 Additions and N 3 Comparisons', J. Assoc. Comput. Mach, 19 (1972), pp. 423-424 [95] U. Zimmermann, 'Linear and combinational optimization in ordered algebraic structures', (Chapter 8), in Discrete Math. Vol. 10, 1981 Appendix A Data Consistency Checking Using the Inference Network The inference network can be used to check consistency of a set of relations, derive binary relations from a consistent relation set, infer binary relations from an inconsistent relation set based on the reliability of the given relations. We will consider symmetric relations — R(i, j) = R(j, i) V i j — since these algorithms are mainly used in time or location , referencing applications. Because of the symmetry of the binary relations, the inference network has N(N — 1)/2 units. For N nodes, there are up to N *(N — 1) binary relations in existence representing the relations between any two of them. The number of independent binary relations required to uniquely define all the binary relations in the network is (N — 1). Given some of the binary relations, the number of iterations required for the inference network to obtain all the remaining ones is determined by the structure and consistency of the relations. A set of binary relations can be kept in the inference network, unit (i, j) for relation R(i, j). A set of relation is consistent if the inference network representing the relations is consistent; a network is consistent if for all the units, all site outputs for its different sites are identical to the unit output. Given inference rules, The consistency of the relations can be checked. Consistency may be declared only when all the units and sites are defined. In checking the consistency, 119 Appendix A. Data Consistency Checking Using the Inference Network ^120 unit output is evaluated according to: Previous unit output if unit output is defined and all site outputs are identical to it. Unit output = Site output ^if unit output is undefined and^(A.89) site outputs are all identical. Inconsistent Message all other cases. A site output is defined as the deduction of the two inputs from it's incoming links. If no Inconsistent Message is obtained after [log 2 (N — 1)1 iterations, the relation set is consistent. Here N is the number of nodes involved in the relation set. When the given relations are inconsistent, at some units, sooner or later, the outputs from its (N — 2) sites will be different. The network must have a conflict resolution mechanism in order tb proceed further inference. There are various possibilities on conflict resolution, among which some reasonable ones are: • using an average operator(minimizing high frequency noise), • using the max operator(corresponding to ORing the inference results), • using the min operator(corresponding to ANDing the inference results), • using a weighted average operator. Given initial relations and their corresponding certainty factor (reliability, hardness of the constraint), an inference network can 1. arrive at a stable consistent state, with each unit output being identical to all its site outputs; 2. arrive and stay at a metastable inconsistent state, with at least one unit output different from some of its site outputs; ^ ^ Appendix A. Data Consistency Checking Using the Inference Network^121 3. become unstable with its degree of inconsistency growing bigger and bigger when the conflict resolution schemes are used. As an example, we define site and unit functions as: (k+1)^()^(k)^ Siii^= Utik (A.90) and = (1 — rii)* atr rij *ur )^(A.91) in which 1^N-1^(k) ^(k) avi — _^ , 9 E^(A.92) , where rii is the certainty factor of relation R(i, j) (represented by ui1). If R(i,j) is fully constrained, r, i = 1. Otherwise, ri; varies within [0,1). A global energy function can be defined to measure the network consistency. For the inference network involving N nodes (N > 3), unit (i,j) (0 < j < i < N — 1) is one of the C(N, 2) units in the network and site 1 (0 </Oij<N-1) is one of the (N — 2) sites attached to unit (i,j), the energy function is defined as: N-1 i-1 N-1 ^E (k) = E E E (s!'", ")— un 2^(A.93) - 1=1 j=0 /=0,10i0j where E( k ) is the network's global energy after k iterations, sr ) is the output of site 1 at unit (i, j) after (k + 1) iterations, 49 is the output of unit (i,j). The value of sr l) is the deduction from te and 49. The value of 47 ) is the result of conflict resolution operated on its site outputs. Series {E} shows the state of network: 1. The network converges, if {E} decreases monotonically; 2. The network diverges, if {E} increases monotonically; Appendix A. Data Consistency Checking Using the Inference Network ^122 3. The network converges to a metastable inconsistent state, if {E} converges to a non-zero constant; 4. The network converges to a stable consistent state, if {E} converges to zero. When units are updated asynchronously, the network always converges. Given a single change at any unit (x, y) in a stabilized network, using the definitions (A.93 - A.92), it can be derived that AE (k) = E(k) — Elk-i) —3 * (N — 2) * (1 — rx2 v )* (avikv) — ti(kv-1) ) 2 < 0 (A.94) holds only when (1) r zy = 1, i.e. R(x,y) is fully constrained or (2) aviky) = ulky-1 ), i.e. no change. For synchronous updating, after the k-th iteration, the change of network energy can be derived as: AE (k) N-11-1^N-11-1 j-1 = E E AE1,9 + 6 * E EE(4,)60) + (4;0 4:0 + ce)40) i=1 j=0^1=2 j=11=0 N-1 i-1 E E —3 * (N — 2) * (1 — 71) * ( 1=1 i=o a 4k.) _ 4-0)2 N-1 1-1 +6* E EE(49d!iik) ctiiodfk.) dfs. )49) 1=2 j=1 1=0 (A.95) AE < 0 does not always hold for general synchronous updating. AE can be greater than 0, which means for some given combinations of initial relations and certainty factors, the network may diverge. Appendix B Inference Network Algorithm for Matrix Inversion The inference network algorithm for matrix inversion is developed based on Grout's algorithm [68]. The processor array is composed of N x N PEs connected in a two dimensional grid, as shown in Figure 3.11. There are three types of PEs in the array. Type A, type B, and type C refer to PEs whose coordinates satisfy i > j, i = j and i < j respectively. The matrix elements are copied to the processing elements in a pre-designed way so that when data are propagated in the array, the effect are just like the data organization in Figure B.23. It takes 5N — 4 time units to complete the inversion. Each processing element requires a local memory 'c' to calculate a summation and a control counter l' to trigger and complete the calculation and propagation. Initially, the control counter for PE [i, j] is set to k := —i — j. Algorithm 5 gives the detail. Result of the inversion is kept in c[i, j] after 5N — 4 time units. For more in-depth discussion about an N x N systolic array for matrix inversion, see author's paper [80, 78]. Algorithm 5 : Matrix Inversion on the CM: BEGIN PARALLEL for all i,j (initialization) h[i, (—i — j) (mod N)]:. v[(—i — j) (mod N), A:. ai ; k[i, j] := —i — j^c[i, j] := 0 END PARALLEL FOR count =1 TO count = 5N — 4 DO BEGIN PARALLEL for all i,j h[i,j]:= h[i, (j —1) (mod N)]^v[i,j]:= v[(i — 1) (mod N), j] 123 Appendix B. Inference Network Algorithm for Matrix Inversion^124 lc[i, j] := k[i, + 1 IF (0 < k[i, j] < min{i, j}) c[i, j] •= c[i, j]^h[i, j] * v[i, j] ELSE IF (k[i, j] =^j}) IF (type = A)^h[i, j] := c[i, j] := (h[i, j] — c[i, ELSE IF (type = B) h[i, j]:= v[i, := c[i, j]:= v[i, j] — c[i, j] ELSE^v[i, := c[i, j] := v[i, j) — c[i, j] ELSE IF (k[i, = max{i, j}) IF (type = A)^v[i, j] := c[i, j] ELSE IF (type = C) h[i, j] := c[i, j] ELSE IF (k[i, j] = N + min{i, j}) IF (type y B)^j] := v[i,j] := c[i, j] := 11v[i, j] ELSE IF (type = C) c[i, j] := h[i, j] * v[i, j] ELSE IF (N + min{i, j} < k[i, j] < N + max{i, j}) c[i, j]:= c[i,^h[i, j]* v[i,j] ELSE IF (k[i, j] = N + max{i, j}) IF (type = A)^v[i, j] := c[i, j] := —c[i.j] ELSE IF (type = C)^c[i,j]:=^v[i, ELSE IF (k[i, j] = 2N + min{i,j}) IF (type = A)^h[ij] := c[i, j] ELSE IF (type = C) v[i, j] := c[i, j] ELSE IF (k[i, j] = 2N + max{i, j}) IF (type = A)^c[i, j] := h[i, j]* v[i, j] ELSE IF (2N + max{i, j}) < k[i, j] < 3N) c[i , j] •= c[i , j]^hEi , ji * v[i,j] END IF Appendix B. Inference Network Algorithm for Matrix Inversion^125 END PARALLEL Figure B.23: Data flow in calculating A -1 . Each parallelogram contains data for one phase. * is an element in the original matrix; * is an element in L or U; o is an element in L -1 or U-1. Appendix C A Weightless Neural Network for Consistent Labeling Problems The objective of a consistency labeling problem is to label a set of N objects using a set of M labels which satisfy some compatibility constraints. Only unary or binary constraints are considered here. A unary constraint prohibits an object to have a certain label. A binary constraint restricts the combination of the labels for two objects. A weightless neural network is proposed here to eliminate all inconsistent labels for each object based on unary and binary constraints. The inconsistency elimination algorithm is based on the sequential algorithm discussed in [54]. Unary constraints are used to determine candidate labels for each object. Binary constraints are used to set up initial restriction of labels between object pairs. These restrictions are passed to other related object pairs and subsequently reduce candidate labels for objects. The proposed neural network consists of 5 i-IP" clusters of neurons, denoted as G(i, j) (1 < j < i < N). Each cluster G(i, j) with i > j detects inconsistent labels between object i and j. Initial unary and binary relations are used to set up initial states of the nuerons in cluster G(i, i) and (G(i, j) (for all i and j) respectively. If any inconsistent label is found for object i (or j) in cluster G(i, j), a signal is sent to cluster G(i, i) (or G(j, j)). When cluster G(i, i) receives the signal indicating an inconsistent label for object i, it activates G(i, j) (j = 1,2, • • • , i — 1) and G(j,i) (j = i 1,i + 2, • • • , N) to eliminate the inconsistent label from the candidate labels for object i. There are M independent neurons in cluster G(i, i). The initial state of neuron m 126 Appendix C. A Weightless Neural Network for Consistent Labeling Problems ^127 (1 < m < M) is determined by the unary constraints for object i. If 1„, is a unaryconsistent label for object i, neuron m in cluster G(i,i) is initially set 'on'. Neuron m will turn off if label 1„, is later found inconsistent for object i. If all neurons in a cluster G (i , i) are off, the labeling process will be terminated since there is no consistent label for object i. When all the inconsistent labels are eliminated, if neuron m in cluster G(i, i) is on, object i can be labeled lm . Each neuron cluster G(i, j) (i > j) consists of two layers of neurons. The lower layer has M x M neurons. Neuron (p, q) (1 < p, q < M) has two inputs from neuron p in cluster G(i, i) and neuron q in cluster G(j, j). Neuron (p, q) is initially off if simultaneous labeling of 11, for object i and /9 for object j is prohibited by a binary constraint. After the network starts to update, neuron (p, q) remains 'on' if and only if both neuron p in cluster G(i, i) and neuron q in cluster G(j, j) are on. The upper layer has 2M neurons, each has M inputs from a row (or a column) of neurons in the lower layer and an output to a neuron in cluster G(i , i) (or G(j, j)). If all neurons in a row (or a column) in the lower layer are off, the corresponding neuron in the upper layer will be turned off, and a signal will be sent to cluster G(i, i) (or G(j, j)) indicating an inconsistent label for object i (or j). The network will settle down when all inconsistent labels are eliminated. No oscillation can occur since neurons can only switch from 'on' to 'off' after initialization, or remain in an 'on' state or an 'off' state. The correctness and completeness of the elimination process is ensured by the corresponding sequential algorithm [541. Links in the network only transmit an on/off state, therefore, all link weights can be set to one. Appendix C. A Weightless Neural Network for Consistent Labeling Problems ^128 Figure C.24: Interconnected clusters form a weightless neural network for consistency labeling problems. Figure C.25: A diagonal cluster G(i, i) in the weightless neural network. Appendix C. A Weightless Neural Network for Consistent Labeling Problems ^129 Group G(i,j)i. > j G(j,j) Wn n rt-i^p,q M neuron (PA)^RVp CVq Figure C.26: A non-diagonal cluster G(i, j) (i > j) in the weightless neural network. Appendix D Algorithms Related with the Shortest Path Problem D.1 Mapping the Shortest Path Algorithm onto the Connection Machine Chapter 3 has shown that the Connection Machine is an efficient platform available to map the inference network algorithm on to. After initializing the processor array according to Eq.(4.13), the shortest paths can be obtained in at most {log 2 (N — 1)1 iterations, or, Nilog 2 (N — 1)1 time units. As discussed in Section 3.3, each iteration consists of updating the data flows, evaluating site values, and calculating unit values. Mapped onto the CM, INFERENCE becomes: Algorithm 6 : Shortest Path on the CM (to be referred to as INFERENCE-CM) for all PE, u[i,j]:= u(°)(i,j) count := flog 2 (N — 1)1 while (count > 0) BEGIN BEGIN PARALLEL for all i, j h[i, (—i — j)mod(N)J := u[i,j], v[(—i — j)mod(N), •= u[i, j] yold[i,j]:= u[i,j] FORk=0 TOk=N-1 BEGIN update: u[i, j] := min fu[i , j], h[i, j] v[i, j]} propagate: h[i,j]:= h[i, (j — 1)mod(N)J v[i,j]:= v[(i —1)mod(N),A 130 Appendix D. Algorithms Related with the Shortest Path Problem^131 END END PARALLEL if Vi,j u[i,j] = u old[i,j], break; count := count —1 END It is not difficult to prove the equivalence of INFERENCE-CM and INFERENCE by comparing the corresponding unit values after each iterations. D.2 Unifying Some Established Algorithms Using an Inference Network All-pair shortest path algorithms can be implemented by defining proper unit and site functions in an inference network. In the following, we shall describe several algorithms and derive their inference network implementation for finding all shortest paths in a general directed graph. D.2.1 Floyd's Algorithm Floyd's [17, 11] algorithm (FLOYD) constructs optimal paths by inserting nodes when appropriate, into more direct paths. After k iterations, function fk(i, j) evaluates to the optimal path length between i and j, where the intermediate nodes belong to the set of nodes {1, 2, ... , k}. Its recurrent relation is =^ fk_1(k,j)}^(D.96) with boundary condition fo (i, j) = clii. The total number of basic operations required is Appendix D. Algorithms Related with the Shortest Path Problem ^132 Since FLOYD requires fk (i, i) to detect negative cycles, its inference network implementation needs N 2 units. The unit and site functions are defined as: = min {u (k-1) (i, j), 3 (k)^k)}^(D.97) u k-1 (i, k) u (-i)( k, j ) =^ )^k^ (D.98) Initial unit values are u ( °)(i, j) = dij . When there is no negative loop, the network settles down after k = N, and u( N) (i,j) gives the shortest path length from node i to node j, and diagonal units (1, i) have zero values. In the presence of negative loops, there will be at least one 'diagonal' unit having a negative value. Iteration has to stop after exactly N steps, in order to obtain shortest paths which do not traverse the same arc for more than once. Compared with INFERENCE, this algorithm only needs to update one specific site (indexed by k) at each unit for each iteration; but it takes N instead of n < Fog2 (N — 1)1 iterations to find all the shortest paths. D.2.2 Matrix Algorithms Matrix algorithm (MATRIX) defines a matrix binary operation 0 called minaddition: W = A B = [w ii = min(aik bki)] where A = [a id ], B = [kJ] ^ (D.99) If DI = D is the length matrix [4], then the elements of DN gives the shortest path lengths, where D N is the minaddition of N D's: D N =D0D0D0•0D0D^(D.100) This algorithm is valid when there are no negative loops in the network. Minaddition is associative, for example, D 4 =D 0D0 D0D = D 2 D 2^(D.1 01) Appendix D. Algorithms Related with the Shortest Path Problem ^133 Eq.(4.14-4.15) uses this property to make the inference network converge in no more than (7og 2 (N —1)1 iterations. In fact, the INFERENCE algorithm is intimately related with the MATRIX algorithm, although the former is derived directly upon an inference network platform. It is not difficult to observe that, if we define another matrix binary operation 1 as W = AIB = [wij = minfa ii ,ba where A =^B = [bii ]^(D.102) then Eq.(4.14-4.15) can effectively be represented as D (k) D (k-1) 1 ( D (k-1) D (k-i))^(D.103) Revised matrix algorithm (RMATRIX) updates matrix elements asynchronously to avoid unnecessary operations. Two distinct processes are defined: (i) a forward process j<1^{1 i<i ^{1 = mm Ice dij? } where p =^q = p'^<l< 1 N^ j>1^f i>1 ^f ( ) (D.104) and (ii) a backward process db) = min 1<l<N le + di;) } where p = If 1<j b 1>j q = (D.105) [93] shows that the shortest paths can be obtained by one forward process followed by on backward process, one backward process followed by one forward process, three forward processes, or three backward processes. So the total number of basic operations required is 4N(N — 1)(N — 2) or 6N(N — 1)(N — 2). The inference network for RMATRIX contains N(N —1) units (diagonal units are not necessary). The matrix elements are updated in an order shown in Figure D.27. Site function for both forward and backward processes are the same: 8 (k)(i , 0 = u (k-1)(i 0 + u (k-1)(i , j)^ D.106) , ( Appendix D. Algorithms Related with the Shortest Path Problem ^134 1247 1 358 236 4 5 6 10 7 8 9 10 10 9 8 7 10 6 9 632 853 1 7421 Figure D.27: The order of updating length matrix in a forward (left) and backward (right) process Unit functions for forward process is: minfu(k-00,i),^min^.{s(k)(i,j,1)}} when k = f(i,j) u(k)(i,i)=^l< 1 < N,10 id 03 j)^ otherwise (D.107) where y(i,i)= max{i,j}(max{i,j} — 1) + miniim 2 (D.108) Unit functions for backward process is: U (k) (i, j) = 1 min{u(k -1 )(i,j),^min^.{s(k)(i,j,l)}} when k = b(i,j) 1<l<N,i0i,10) ^ u (k--1)(i , j) otherwise (D.109) where max{N — i,N — j}(max{N — i,N — j} +1)^ 1-min{N—i,N—j}1-1 (D.110) b(i,j) —^ 2 The network updates one or two units at a time. To obtain all the shortest paths, the units of inference network have to be updated in a specific order for N(N-1) (one forward and one backward process) or IN(N — 1) iterations (three forward or three backward). Appendix D. Algorithms Related with the Shortest Path Problem ^135 D.2.3 Dantzig's Algorithm Dantzig's algorithm [9] (DANTZIG) constructs shortest paths using arcs whose nodes numbered between 1 and k for the k-th iteration. gk (i,j) (i < k,j < k), the shortest path length from i to j in the k-th iteration, is obtained from gk-i(i,j), dik and dki. [11] gives the algorithm in the recurrent form: gk(i,k) =^min^fgk _ 1 (i,1)-1- go (1,k)} (1 = 1,2,...,k -1),^(D.111) l<1<k-1 min {go (k, 0 + gk_ i (I,j)} (j = 1, 2, ... , k - 1),^(D.112) 9k(k,j) = l< 1<k -1 gk (k, k) = min{0, 1 < rirtnk _ l igk (k,l) + gk (1,k)}}^(D.113) =^ gk(k,j)} (i,j = 1,2,...,k -1) (D.114) where go (i, j) =^for all i and j. The total number of basic operations required is 2N 2 (N - 1). The structure of inference network for DANTZIG is the same as that for FLOYD, which has N 2 units. The unit function for DANTZIG is u (2k -1- 1) (i , j) = 1 min foo( i, A i < m i nk -1 { 3 (2k+00 , j, l)}} j < i = k or i < j = k u (2k) j)^ otherwise (D.115) m i n { u (2k-1-1)(i , j) , s(2k+2)(i, ^k)}^i < k and j < k u (2k+2)(i , j) = { ^(D.116) =k otherwise min{° ' 1 < Ti nnk - 1 1.5(2k+2)(i ' j ' 1)}}^= j j)^ ^ where 2 < k < N, u( 4 )(i,j) = dii , and s(r)(i,j, = + ti('' -1 )(/, j) V 5 < r < 2N + 2 In the (2k +1)-th iteration, 2(k -1) units are updated, k - 1 sites at each unit are involved in the minimization; in the (2k + 2)-th iteration, a total number of (k -1) 2 +1 units with a total of k(k - 1) sites are minimized. The total number of synchronous iterations required is 2(N - 1). ^ Appendix D. Algorithms Related with the Shortest Path Problem^136 Modified Dantzig's algorithm [81] (MDANTZIG) avoids unnecessary operations by further decomposing the first two recurrent relations into k — 1 recurrent steps. Define 22(i, k) = g(k, j) = oo, gk(i, k) = (i, k) and gk(k, j) = -1 (k, j) can be obtained from: k)^ V1 < i < k —1 dik > gt -1 (1, k) k) =^ k), gk _ 1 (i 1) + dik} V1 i < k — 1 dik < 4-1 (1, k) di-1 (k, gkl (k, j) = { j) (D.117) Vi < j < k — 1 dk, ? gt-1 (k, 1) min {gkl -1 (k, j), dki + gk_1(1, j)} V1 <j<k-1 dm<gki -1 (k, 1) (D.118) A significant amount of operations can be saved when dik > gkl -1 (/, k) or dk, > gi-1 (k, 1) is satisfied. Corresponding to the (2k + 1)-th iteration in DANTZIG, the inference network implementation for MDANTZIG requires k — 1 iterations. In each of these iterations, 2(k — 1) units are updated; and at each of these units, only one site is activated. So MDANTZIG requires a total number of 2(N + 2)(N — 1) iterations. Appendix E Determinants and Eigenvalues E.1 The Determinants of Some Matrices The following matrix are all n x n. a b b ••• b a b a b ••• b b a—b b b a ••• b b 0 ••• b b ••• b—a 0 ••• 0 a— b ••• 0 • •• 0 b—a b—a • •^• b ••• a 0 b a(a—br -1 a—b 0 - b(b — a)(a — b)n -2 (n — 1) (a + (n — 1)b)(a — b) n-1 8 2 — a l^0^0 •• • al Si al al •• • a2 82 a2 •• a2 0 82 — a2 0 a3 a3 s3 •• • a3 0 0 s3 — a 3 •• • a3 an an an • • • Sn a n — sn an • • • Sn an — sn — •• • a 2 sn n-1^n-1 Sn H (s i — ai)^a;^(si — ai) 1=1^=1 j=1 11(Si — ai)^ai II (si — ai) i=1^j=1 137 138 Appendix E. Determinants and Eigenvalues ^ s — al —al —a 1 • • • —al s — al 0 0 •• • al —a 2 s — a2 —a 2 • • • —a 2 0 s — a2 0 •• • a2 —a 3 —a 3 s — a3 —a3 0 0 .9 — a 3 .- - a3 an —a n —an —Sn —Sn • • • — — ( Sn s a o s n-1 E ss 3 - an n-2 ai i=1 (s — Eai)srl i =1 E.2 Eigenvalues of the Inference Network Connection Matrix Let P be an n 2 x n 2 symmetric matrix. Its element at row iN + j and column rN + s is pi;,,., = ub(i — r)b(j — s) — vb(i — r) — vb(j — s) u,v 0 where b(x) = {1 x = 0 0 x# 0 Matrix P can be divided into n 2 n x n submatrix: P = ••• Pin Pll P12 P13 P21 P22 P23 P2n P31 P32 P33 P3n Pnl Pn2 Pn3 ••• Pnn Appendix E. Determinants and Eigenvalues ••• —u — 2v —v —v —v —u — 2v —v —v —v —u — 2v ••• —v —v —v —v • •• —u — 2v where Pii = and Pi.; = 139 —v 0 0 ••• 0 0 —v 0 ••• 0 0 0 —v 0 0 0 Si V V Si —v —v •• • V V 0 0 • • • V 0 v 0 • •• 0^•^• V V • •• • 0 IsI — PI^= 0 v Si 0 0^• • • • 0 • 0 0 • v 0 v •• • v v S ^• • • v 0 • • v^• • • v 0 0 0 0 v •• • • • 0 •• • 0 •• • v •• • v •• • si • •• 0 0 si v v si • • • 0 0 0 0 v 0 0 • •• si v v 0 • •• • • • 0 • •• 0 0 • • • • • • 0 0 • • • • •• 0 V1 <i<n V 1 < i 0j < n 0 • —v • •• • •• • • • 0^• • • •• ••• v v Appendix E. Determinants and Eigenvalues^ 32 V V 32 • • • V 32 V • • • V • • • v V 32 •• • V • • • • • • = • • • V V • •• 32 V V v 0 • • • 0 si v 0 v . • • 0 v si • • • 0 0 •• • • • • v v • • • • •• • 32 • • • • •• v 0 • 0 v • •• 0 v • • 0 • •• si • •• • •• 0 0 -• v si v • •• v v si • •• v • •• si • • • 0 • •• 0 v 0 0 v 0 • •• • • • 0 V v 0 0 • • • v v • • V 0 • • • 32 V 0 0 V V .•• v 0 V 32 • • • • • • 32 • • • • • • v 140 • •• v • •• • •• v v Appendix E. Determinants and Eigenvalues^ 141 0 0^0 82 V ^ 0 0^0 ^ • 0 0^0^• • •^0 0 • • 0 32 • • • • •• • •• ^ v • • • .52^0^0^0^• • •^0^0 • • • 0 ^ 0^•^0^33 V^•^V^• • •^0^0 • • • 0 0^v^0^v 3 3 • • • v^. • •^0^0 • • • 0 • •• 0^0 • • • v^v^v • • • •• 0^0 • • • 0 0^0 0^0 • •• 33 V • • • V 0 v • • • 0^0 0^0 • •• ^ 33 • • • V • •• ^ V • • • 33 • 33 ••• ^ 0 • • • ...^• • • 0^0 • • • v^0^0 • • • 0 n-1 82 V • • • V^Si V •^V ^ 32 • • • V^V Si • • • V V^V • • • 32 where .s i = s u 2v, s 2 = 8+U+ V (E.119) V • • • Si (n 1)v, and s 3 = s u v, using the determinants derived in Appendix E.1, — PI =^u 2nu)(8 u) (n-1)2 (8 u nv) 2( ' ) Appendix E. Determinants and Eigenvalues^ 142 P has three distinct eigenvalues: A l = —(u 2nv), its corresponding eigenvector is e 1 = (1 , 1, . . 1) t = ( 1 , 1^1 xt N N" N ) A2 = —u, its (n — 1) 2 eigenvectors form a (n — 1) 2 -dimensional subspace; A3 = —(u nv), its 2(n — 1) eigenvectors form a 2(n — 1)-dimensional subspace. Since A l 0 A2 0 A3, el _L at2 _L R3.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- A binary relation inference network for constrained...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
A binary relation inference network for constrained optimization Su, Crystal J. 1992-12-31
pdf
Page Metadata
Item Metadata
Title | A binary relation inference network for constrained optimization |
Creator |
Su, Crystal J. |
Date | 1992 |
Date Issued | 2008-09-18T16:51:35Z |
Description | Constrained optimization is an essential problem in artificial intelligence, operations research, robotics and very large scale integration hardware design, etc. Many constrained optimization problems are computation-intensive, and yet require fast solutions. This thesis presents a novel parallel computing network designed for some optimization problems which can be represented by binary relations and the calculation upon them. Traditional graph search techniques provide sequential solutions to constraint satisfaction, but their speed is limited by the computational ability of a single processor. Parallel processing offers faster solutions by dividing its work load among many processors. A parallel algorithm can be developed from its sequential counterpart by decomposing the unfinished task dynamically according to the availability of processors. These kind of algorithms usually have difficulty in distributing work load evenly among processors [69]. Many distributed parallel computing models have been proposed for various types of problems. A distributed computation model usually embodies the data structure for a specific type of problem [16]. Many of them work in the discrete time domain [71, etc]. Neural network approaches have been proposed for optimization problems [28], however, optimal solutions are often not guaranteed when the gradient descent scheme is used. The determination of the link weights in a Hopfield-type neural network can be elaborate and convergence for large problems remains a problem [2, 90]. This thesis proposes a new parallel computing network — a binary relation inference network. The network topology is set up to match conventional optimization procedures. The network transforms some optimization problems (shortest path, transitive closure, assignment, etc.) into network convergence problems and divides the work load evenly among all processors. Link weights are known. It can operate in synchronous discrete-time, asynchronous discrete-time, and continuous-time domains. Inference network algorithms are derived directly from problem formulation, and are effective for a large range of problem sizes. The inference network consists of interconnected components; each component is composed of a unit, a number of sites attached to the unit, and links from the unit to sites on other units. Each unit is a simple computational engine. The topology of the inference network matches naturally to many optimization problems, since its deduction sites produce candidates to compete for the optimum and its decision-making units select the optimum from the site values. Either directly or through a transformation mapping to a systolic structure, discrete inference network algorithms can be implemented on a commercially available parallel processing facility, such as the Connection Machine. Analog inference network can be built using integrated circuits which solve problems efficiently in the continuous-time domain. In this thesis, mathematical analysis and numerical simulation were conducted for the synchronous, asynchronous, and analog inference network. Various applications has been discussed. The inference network has shown to solve the all-pair shortest path problem efficiently in various updating schemes. The inference network algorithm for the transitive closure problem was derived straightforwardly from the formulation of the problem and the topology of the inference network. Efficient continuous-time solution was obtained from a non-synchronized logical circuit. The convergence of the network for the shortest path or transitive closure problems was shown to be independent of the problem size. It was demonstrated that the inference network can solve the assignment problem in a way similar to the Hopfield net's solution to the traveling salesman problem. However, the former was shown to converge for all test cases with problem size up to 128. |
Extent | 6572227 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Collection |
Retrospective Theses and Dissertations, 1919-2007 |
Series | UBC Retrospective Theses Digitization Project |
Date Available | 2008-09-18 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0065233 |
URI | http://hdl.handle.net/2429/2258 |
Degree |
Doctor of Philosophy - PhD |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 1993-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- [if-you-see-this-DO-NOT-CLICK]
- ubc_1993_spring_phd_su_crystal.pdf [ 6.27MB ]
- Metadata
- JSON: 1.0065233.json
- JSON-LD: 1.0065233+ld.json
- RDF/XML (Pretty): 1.0065233.xml
- RDF/JSON: 1.0065233+rdf.json
- Turtle: 1.0065233+rdf-turtle.txt
- N-Triples: 1.0065233+rdf-ntriples.txt
- Original Record: 1.0065233 +original-record.json
- Full Text
- 1.0065233.txt
- Citation
- 1.0065233.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Country | Views | Downloads |
---|---|---|
India | 20 | 0 |
United States | 12 | 8 |
Germany | 9 | 1 |
China | 8 | 8 |
United Kingdom | 8 | 0 |
Canada | 6 | 0 |
France | 4 | 0 |
Russia | 4 | 1 |
Indonesia | 2 | 0 |
Poland | 1 | 0 |
Iran | 1 | 0 |
City | Views | Downloads |
---|---|---|
Unknown | 34 | 19 |
Ashburn | 6 | 0 |
London | 5 | 0 |
Shenzhen | 4 | 6 |
Saint Petersburg | 3 | 0 |
Munich | 2 | 1 |
Beijing | 2 | 2 |
Sunnyvale | 2 | 0 |
Jakarta | 2 | 0 |
Montreal | 2 | 0 |
Hangzhou | 1 | 0 |
New Delhi | 1 | 0 |
Guangzhou | 1 | 0 |
{[{ mDataHeader[type] }]} | {[{ month[type] }]} | {[{ tData[type] }]} |
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0065233/manifest