A BINARY RELATION INFERENCE NETWORK FOR CONSTRAINED OPTIMIZATION

By Crystal Jinghua Su
B. A. Sc. and M. A. Sc., Tsinghua University, China, 1985 and 1988

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF ELECTRICAL ENGINEERING

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
Sept. 1992
© Crystal Jinghua Su, 1992

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical Engineering
The University of British Columbia
Vancouver, Canada

Abstract

Constrained optimization is an essential problem in artificial intelligence, operations research, robotics, and very large scale integration hardware design. Many constrained optimization problems are computation-intensive, and yet require fast solutions. This thesis presents a novel parallel computing network designed for some optimization problems which can be represented by binary relations and the calculations upon them.

Traditional graph search techniques provide sequential solutions to constraint satisfaction, but their speed is limited by the computational ability of a single processor. Parallel processing offers faster solutions by dividing the work load among many processors. A parallel algorithm can be developed from its sequential counterpart by decomposing the unfinished task dynamically according to the availability of processors.
Such algorithms usually have difficulty in distributing the work load evenly among processors [69]. Many distributed parallel computing models have been proposed for various types of problems. A distributed computation model usually embodies the data structure for a specific type of problem [16]. Many of them work in the discrete time domain [71, etc.]. Neural network approaches have been proposed for optimization problems [28]; however, optimal solutions are often not guaranteed when the gradient descent scheme is used. The determination of the link weights in a Hopfield-type neural network can be elaborate, and convergence for large problems remains a problem [2, 90].

This thesis proposes a new parallel computing network, a binary relation inference network. The network topology is set up to match conventional optimization procedures. The network transforms some optimization problems (shortest path, transitive closure, assignment, etc.) into network convergence problems and divides the work load evenly among all processors. Link weights are known. It can operate in synchronous discrete-time, asynchronous discrete-time, and continuous-time domains. Inference network algorithms are derived directly from the problem formulation, and are effective for a large range of problem sizes.

The inference network consists of interconnected components; each component is composed of a unit, a number of sites attached to the unit, and links from the unit to sites on other units. Each unit is a simple computational engine. The topology of the inference network matches naturally to many optimization problems, since its deduction sites produce candidates to compete for the optimum and its decision-making units select the optimum from the site values.
Either directly or through a transformational mapping to a systolic structure, discrete inference network algorithms can be implemented on a commercially available parallel processing facility, such as the Connection Machine. Analog inference networks can be built using integrated circuits, which solve problems efficiently in the continuous-time domain.

In this thesis, mathematical analysis and numerical simulation were conducted for the synchronous, asynchronous, and analog inference networks. Various applications have been discussed. The inference network has been shown to solve the all-pair shortest path problem efficiently under various updating schemes. The inference network algorithm for the transitive closure problem was derived straightforwardly from the formulation of the problem and the topology of the inference network. An efficient continuous-time solution was obtained from a non-synchronized logical circuit. The convergence of the network for the shortest path and transitive closure problems was shown to be independent of the problem size. It was demonstrated that the inference network can solve the assignment problem in a way similar to the Hopfield net's solution to the traveling salesman problem. However, the former was shown to converge for all test cases with problem sizes up to 128.

Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgements

1 Introduction
1.1 Constrained Optimization
1.2 Traditional Approaches for Constrained Optimization
1.3 Parallel Approaches
1.4 Scope of this Thesis

2 Topology of the Inference Network
2.1 Inference on Binary Relations
2.2 A Novel Inference Network
2.3 Inference Network Topology
2.4 Updating Schemes
2.5 Transfer Function
2.6 Is the Inference Network a Neural Network?
2.7 The Inference Network Cannot Be Treated as a Hypercube
2.8 Examples and Potential Applications

3 Inference Network Implementation
3.1 Analog Inference Network
3.2 Discrete-Time Inference Network
3.3 Mapping to the CM Using Local Communication

4 Inference Network for the All-Pair Shortest Path Problem
4.1 The Shortest Path Problem
4.2 Inference Network Algorithm for the Shortest Path Problem
4.3 Algorithm Convergence
4.3.1 Synchronous Algorithm Converges in at most ⌈log2(N-1)⌉ Iterations
4.3.2 Convergence of the Asynchronous Algorithm
4.3.3 Convergence of the Analog Algorithm
4.4 Relationship to Other Algorithms
4.4.1 A* Algorithm
4.4.2 MATRIX Algorithm
4.4.3 FLOYD Algorithm
4.4.4 DANTZIG and MDANTZIG Algorithms
4.5 Unifying Some Established Algorithms Using an Inference Network
4.5.1 Floyd's Algorithm
4.5.2 Revised Matrix Algorithm
4.5.3 Dantzig's and Modified Dantzig's Algorithms
4.6 Comparison with Established Algorithms
4.6.1 Parallel Processing with Optimal Hardware
4.6.2 Performance on the Connection Machine
4.7 Discussion

5 Inference Network for the Transitive Closure Problem
5.1 The Problem
5.2 Unit and Site Functions
5.3 Convergence Consideration
5.4 Implementation Using Logical Circuits
5.5 An Example
5.6 Discussion

6 Inference Network for the Assignment Problems
6.1 The Assignment Problem
6.1.1 The Problem
6.1.2 Optimization Formulation
6.2 Unit and Site Functions for Optimization
6.3 The Decreasing Objective Function
6.4 TSP, the Hopfield Model and Comparison
6.4.1 The Hopfield Model for the TSP
6.4.2 Comparison of the Assignment Problem with the TSP
6.5 Dynamics of the Network
6.5.1 Connection Matrix
6.5.2 Eigenvalues Move the Network Towards the Valid Subspace
6.5.3 The Role of the Non-Linear Function
6.5.4 The Complete Q Term
6.5.5 Initialization, Updating Rate and Termination
6.6 Determining Constants
6.7 Simulation Results
6.7.1 Small Size Examples
6.7.2 Solve Large Problems on the CM
6.8 Discussion

7 Conclusions
7.1 Summary
7.2 Contribution
7.3 Open Issues and Future Work

Bibliography

A Data Consistency Checking Using the Inference Network
B Inference Network Algorithm for Matrix Inversion
C A Weightless Neural Network for Consistent Labeling Problems
D Algorithms Related to the Shortest Path Problem
D.1 Mapping the Shortest Path Algorithm onto the Connection Machine
D.2 Unifying Some Established Algorithms Using an Inference Network
D.2.1 Floyd's Algorithm
D.2.2 Matrix Algorithms
D.2.3 Dantzig's Algorithm

E Determinants and Eigenvalues
E.1 The Determinants of Some Matrices
E.2 Eigenvalues of the Inference Network Connection Matrix

List of Tables

3.1 Connection Machine busy time (in seconds) to solve the all-pair shortest path problem. Direct program: the CM is simply programmed as the inference network; Mapping: the inference network is mapped to a processor array using local communication.
3.2 CM busy time (in seconds) for the inference network and a systolic algorithm to solve (I - A)^-1 on a 16K CM.
4.3 Execution time for shortest path algorithms executed on their optimal hardware configuration. a in RMATRIX is 2 or 3.
4.4 The number of time units required to obtain the shortest paths when implemented on the Connection Machine.
4.5 CM busy time for INFERENCE-CM (random), the worst case of INFERENCE-CM (worst), and Robert and Trystram's algorithm (Robert) to solve the shortest path problem given random distance graphs.
5.6 States of unit, site and path indicator.
6.7 CM busy times (in seconds) and assignment results for 20 random N = 64 problems using the inference network (A = B = 1 1, C = 7,4, Do = 4, p = 0.996, Kd = 0.008) and the corresponding result from the Hungarian method.
6.8 CM busy times (in seconds) and assignment results for 20 random N = 128 problems using the inference network (A = B = AA, C = N, Do = 4, p = 0.998, Kd = 0.007) and the corresponding result from the Hungarian method.
6.9 Solution quality vs. problem size. 20 distinct random examples were used for each N.

List of Figures

1.1 Relationship between some optimization problems.
2.2 (a) An AND/OR graph indicating various ways to find the distance from a to b. (b) Substitute the 3-input AND relation in (a) with two 2-input AND relations.
2.3 (a) A complete AND/OR graph for three relations; (b) the corresponding inference network of (a).
2.4 Unit (i, j) in the binary relation inference network.
2.5 (a) A classical sigmoid artificial neuron; (b) a basic component in the inference network.
2.6 (a) Topology of a 3-dimensional hypercube with 8 processors and 3 bidirectional links on each processor; (b) a full N = 3 inference network with 9 units and 4 bidirectional links per unit.
3.7 Examples of inference network components, (a) for the shortest path problem, (b) for the transitive closure problem.
3.8 (a) Torus structure of the inference network; (b) a circular disk.
3.9 (a) A square structure with multiple sets of buses; (b) connection of a unit to bus lines.
3.10 An analog inference network system.
3.11 The systolic mapping of the inference network to the Connection Machine.
5.12 (a) Site and (b) unit functions for the transitive closure problem.
5.13 Essential components for a unit.
5.14 Site and unit time delay.
5.15 A path indicator for the transitive closure problem.
5.16 An example of a 4-node graph (a), the intermediate result (b) and the transitive closure (c).
5.17 Circuit for an N = 4 transitive closure problem.
5.18 HSPICE simulation of the inference network for a 4-node transitive closure problem.
6.19 Unit and sites in the inference network act like a neuron.
6.20 (a) A = B = N1, C = Do = 4A, p = 0.994: the larger Kd is, the faster the network converges; (b) A = B = A, Do = 4A, p = 0.994, Kd = 0.01: the network converges slowly when C is small, and the converging speed is almost the same within a certain range of C; (c) A = B = N1, C = V, p = 0.994, Kd = 0.01: the larger Do is, the faster it converges.
6.21 Number of iterations vs. problem size. All benefits rij are randomly distributed in [0, 1].
6.22 Maximum C for the algorithm to converge vs. problem size. All benefits rij are randomly distributed in [0, 1]. Kd = 0.01.
B.23 Data flow in calculating A^-1. Each parallelogram contains data for one phase. * is an element in the original matrix; * is an element in L or U; 0 is an element in L^-1 or U^-1.
C.24 Interconnected clusters form a weightless neural network for consistency labeling problems.
C.25 A diagonal cluster G(i, i) in the weightless neural network.
C.26 A non-diagonal cluster G(i, j) (i > j) in the weightless neural network.
D.27 The order of updating the length matrix in a forward (left) and backward (right) process.

Acknowledgements

I am grateful to my supervisor Dr. K.P. Lam for giving me inspiration, providing effective direction, allowing me to explore the areas I am interested in, and offering advice even when he is working outside of UBC.

I would like to thank Dr. H. Alnuweiri sincerely for taking over the duties as my research supervisor at a time of need. My thanks also go to my supervisory committee members Dr. P.D. Lawrence and Dr. R. Ward for spending time with me despite their busy schedules.
These people helped greatly in the presentation of this thesis.

I appreciate the Institute for Advanced Computer Studies at the University of Maryland and the Thinking Machines Corporation Network Service for granting access to their Connection Machines.

Chapter 1

Introduction

1.1 Constrained Optimization

A Constrained Optimization Problem [56] has four major components: V, D, C and f, where V is a set of variables {v1, v2, ..., vn}, and |V| = n (the cardinality of V is n); D = D1 x D2 x ... x Dn is the vector of domains for all the variables in V, and Di is the domain of vi, a set of values from which vi can take one; C is the set of constraints for subsets (of arbitrary sizes) of variables in V, and a constraint on a subset of variables is a restriction on the values that they can take simultaneously; f is an objective function to be optimized. Constraints can also be an integral part of the objective function, where the function measures the compatibility between the constraints and the solution.

A valid solution for constrained optimization is an assignment of one value to each of the variables in V such that all the constraints are satisfied. An optimal solution is a valid solution which optimizes the objective function. The goal of constrained optimization is, therefore, to find the optimal solution. However, when there is no efficient way to get the optimal solution (exact solution), a good valid solution (approximate solution) is also acceptable.

Many practical problems can be interpreted as constrained optimization problems. The assignment problem, to be discussed in Chapter 6, is a typical example. Given N persons, N jobs, and the benefit of each person getting each job, a one-to-one person-job assignment is to be found to maximize the total benefit. Here we may take persons as variables and jobs as their domains.
The constraint is that each person gets a distinct job, and the objective function calculates the total benefit.

The all-pair shortest path problem (Chapter 4) can also be formulated as a constrained optimization problem, although not as obviously as the assignment problem. Given a map of N cities and road distances between some of them, how can the shortest path between two cities be found? The variables in this problem are city pairs, and their domains are all the combinations of possible intermediate cities. The objective function is the total distance between these city pairs. An implicit constraint is that if the shortest path from city one to city three passes city two, then this path must be the concatenation of the shortest paths from city one to two and from city two to three [11]. The transitive closure problem is a path-finding problem related to the shortest path problem, and therefore can also be considered as an optimization problem (Chapter 5).

Figure 1.1: Relationship between some optimization problems.

Figure 1.1 outlines some interesting relationships between the problems of shortest path (page 41), transitive closure (page 62), data consistency checking (page 119) and matrix inversion. First, the Algebraic Path Problem (abbreviated to APP) is a generalization of several conventional problems [1, 57, 58, 71, 95]. Substituting the general operators ⊕ and ⊗ with specific ones, an algorithm for the APP [71, 70] can give solutions to the shortest path problem, the transitive closure problem, and computing (I - A)^-1. Second, the transitive closure, shortest path, and data consistency problems are related. The transitive closure problem determines whether a binary relation is defined for two nodes in a graph.
The data consistency problem examines the consistency of a given set of time/location referencing data, and subsequently derives a consistent set of data based on the given data. The shortest path and data consistency problems, in addition to determining the existence of a binary relation, find a specific relation (the shortest or the consistent one) from a number of given or derived relations concerning the two given nodes.

1.2 Traditional Approaches for Constrained Optimization

The procedure of finding a valid solution to a constrained optimization problem is usually called constraint satisfaction [56]. Symbolic Artificial Intelligence (AI) procedures for constraint satisfaction can be used in constrained optimization: generate all valid solutions using constraint satisfaction techniques, evaluate the objective function for each solution, and find the optimal one. Generate and test [56] is the most straightforward approach for constraint satisfaction. It simply generates all possible combinations of the variable assignments and tests their compatibility with the constraints. Generate and test is very inefficient and valid only when the variable domains are discrete and finite.

A predicate is a function that maps an argument into a truth value (TRUE or FALSE). Backtracking [56] is a basic graph search algorithm which systematically explores variable domains by sequentially evaluating predicates in a specific order. As soon as any predicate has all its variables evaluated, its truth value is determined. Whenever a predicate is evaluated as false, the backtracking procedure moves back to the last variable with unassigned values remaining in its domain (if any) and evaluates its next value. When the size of a problem is large, backtracking is inefficient due to the combinatorial explosion of computations [67]. Searching with the help of heuristic information, for example A* search [64], may significantly reduce the computational cost.
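The backtracking procedure described above can be sketched in a few lines. The variables, domains, and constraint predicates below are hypothetical examples chosen only to illustrate the control flow (abandoning a partial assignment as soon as a fully-instantiated predicate fails); they are not taken from the thesis.

```python
def backtrack(domains, constraints, assignment=()):
    """Depth-first search over variable domains. A constraint is a pair
    (arity, predicate); a predicate is tested as soon as its first
    `arity` variables are all evaluated, and a FALSE value triggers a
    move back to the previous variable."""
    i = len(assignment)
    if i == len(domains):              # all variables assigned: a valid solution
        return assignment
    for value in domains[i]:
        candidate = assignment + (value,)
        # evaluate every predicate whose variables are now all assigned
        if all(pred(candidate) for arity, pred in constraints if arity <= i + 1):
            result = backtrack(domains, constraints, candidate)
            if result is not None:
                return result
    return None                        # backtrack: no value works here

# Hypothetical example: three variables over {0, 1, 2}, all pairwise distinct.
domains = [range(3)] * 3
constraints = [
    (2, lambda a: a[0] != a[1]),
    (3, lambda a: a[2] not in (a[0], a[1])),
]
print(backtrack(domains, constraints))   # → (0, 1, 2)
```

The same enumeration without the early predicate tests would be generate and test; the tests simply prune each dead branch at the shallowest point where it can be detected.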
However, good heuristic information is often unavailable. Mackworth [55] and Waltz [87] proposed to systematically reduce the search space by eliminating local inconsistency. Although these algorithms can often significantly reduce the sizes of the variable domains, they usually have to be combined with backtracking to get final solutions (refer [55] and [87]).

Simulated annealing [35] has shown significant potential for solving large-scale combinatorial optimization problems. It is an iterative improvement procedure using controlled uphill steps to achieve a global minimum. This procedure was used in the traveling salesman problem [35, 68]. Linear programming [68] is a well-studied area for optimization problems and has many established algorithms to perform relaxation [9]. Probability relaxation is a labeling procedure used in artificial intelligence and especially in computer vision [4, 59]. When a variable domain is finite and discrete, each variable is assigned a label which is associated with a weight or probability. The relaxation procedure moves the label vector around in order to satisfy all the constraints, as well as optimize the objective function [4]. Dynamic programming [11] provides a general approach for optimization problems that can be broken down into smaller but similar problems. Dreyfus [11] explores the mathematical formulations for problems like equipment replacement, resource allocation, shortest path, and traveling salesman.

All these approaches have limitations. The major difficulties with symbolic AI constraint satisfaction are its sequential nature and the combinatorial explosion. Symbolic AI algorithms usually cannot be parallelized straightforwardly without serious communication overhead (refer [69]). When the size of a problem is large, it is impossible to explore and evaluate all the valid solutions. Simulated annealing is a sequential procedure and does not guarantee the solution to be optimal [68].
Linear programming and dynamic programming are not general methods for all optimization problems; their efficiency depends critically on the recurrent formulation of the problem. Some of the algorithms can be parallelized; others are computation-intensive and have a sequential nature (refer [11]).

1.3 Parallel Approaches

The design of efficient techniques for parallel computers is strongly influenced by the assumed parallel computation model. There are different classifications of parallel machines; one of them divides parallel machines into shared-memory models [75] and distributed-memory models [75]. The main difference between the two models is the way in which communication among processors is handled.

In a shared-memory system, processors write and read data to and from a single shared memory. The memory serves both as a communication medium and as a storage medium. The PRAM (Parallel Random Access Machine) model assumes that all processors can perform a single access to the shared memory in O(1) time, regardless of the data access pattern [18]. Within a single memory access period, all processors can perform either a read operation or a write operation but not both.

A parallel algorithm can be developed from its sequential counterpart by decomposing its computation dynamically to multiple processors. Quinn [69] proposed synchronous and asynchronous models to perform backtracking (page 3). In the synchronous model, all unexamined subproblems are kept in a single priority queue stored in the memory
However,when the number of processors increases, theoretical speedup saturates and experimentalspeedup decreases [69].In a distributed-memory model, each processor has a local memory. Data exchangeis accomplished through inter-processor connections. Communication among processorsis restricted by the network topology, bandwidth, data movement pattern, and otherrelated factors [50]. To obtain good performance, small diameter (the maximum numberof times a message has to be forwarded from one processor to its destination), uniformcommunication, extensibility, short wires, and redundant paths are recommended [26].On the other hand, in order to reduce cost, minimum number of wires, efficient layout,fixed degree, and fitting to available technology are necessary [26].In the attempts to solve problems requiring a large amount of communication, variousinterconnection topologies have been proposed [6]. Some examples are crossbar, ring, tree,grid, torus and hypercube. Reconfigurable networks are used to achieve higher networkthroughput. An example is given in [3].Connectionist networks, neural networks and systolic arrays are some intensively-studied distributed-memory networks. Shastri [72] showed that structured connectionistknowledge representation can handle problems that have proven difficult for logic-basedapproaches. The connectionist approach uses distributed representation which reflectsthe way a human brain works [14]. Constrained optimization problems can be solvedby using such a connectionist network, in which constraints are represented by links andtheir weights [16]. Unit outputs v must be given proper interpretations so that a possiblestable state of the network corresponds to a meaningful state of the problem (refer [28]).Chapter 1. Introduction^ 7The presence and weight of the links, denoted by T, are pre-defined according to theknown convergence requirements such that the network will arrive at a desired stateeventually. 
The computational energy of the network, E(T, v), can be defined to indicate how much the state of the network disagrees with the given constraints (refer [28]). Optimization procedures move the outputs v in the space of possible states representing solutions, progressing in the direction that tends to decrease the energy. When used for polynomial-complexity problems, a connectionist network can be a vehicle to parallelize an otherwise sequential algorithm.

The Hopfield network was used in solving the traditional traveling salesman problem [31, 2]. The objective function was composed of four terms; three of them represent the constraints and one represents the length of the tour to be minimized. The result is reasonably good for small size problems and its speed is acceptable. However, the validity and quality of the solution depend critically on the weight of each term in the objective function [90]. It is very hard to determine these weights for this problem with a large number of cities [90]. Combinatorial optimization can be done in an analog neural network with a modified Hopfield energy function and parameters [36]. The idea of transforming a constrained optimization problem into a network convergence problem was also used to solve the map coloring and Necker cube problems [15, 16].

Winner-Take-All (WTA) is a decision-making mechanism used in distributed computing situations. A WTA network is comprised of neurons connected by mutually inhibitory links [14]. Neurons in a WTA network compute their values based on the following rule: the neuron with the largest value keeps its value, and the values of the rest of the neurons are set to zero. A maximum neural network is composed of a number of interconnected clusters where each cluster is a winner-take-all network. Maximum neural networks have been used for the bipartite subgraph problem [49] and the max cut problem [48]. It has been proven [48, 49] that the maximum neural network always generates valid (optimal
or near-optimal) solutions for these two problems and that the link weights of the network are easy to determine. When the maximum neural network is used for the max cut problem, the solution quality improves as time elapses [48]. Unlike the energy function used in the Hopfield model, which is comprised of four weighted terms, the energy functions for these two problems only contain one term, which covers the constraints as well as the quantity to be optimized. Since there is only one coefficient involved in the energy function, the link weights can be easily determined by letting the energy decrease monotonically. Therefore, the two problems solved by Lee [48, 49] provide examples of a neural network with easy-to-determine link weights. The proposed inference network will be compared with some neural network models in Section 2.6, when the inference network topology is defined.

A systolic array is another approach to deal with some optimization problems. Several systolic architectures and algorithms were proposed for algebraic path problems (including shortest path, matrix inversion, and transitive closure) [71, 70, 12, 39, 23]. The definitions of the shortest path and transitive closure problems can be found on page 41 and page 62 respectively. A systolic architecture is a number of systematically connected processors; each processor calculates and transfers data rhythmically in a pre-defined way. A systolic array is a discrete-time system; processing elements in the array must be properly synchronized to ensure a correct solution. Unlike many connectionist and neural network approaches, systolic algorithms do not involve relaxation.

In distributed-memory models, a network topology can embody specific computational structures needed to solve a certain class of problems [16]. A systolic structure for a given problem is usually derived by analyzing the data dependencies of the corresponding sequential algorithm for the problem (refer [52]).
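The winner-take-all rule quoted above (the neuron with the largest value keeps its value; all others are set to zero) can be stated in a few lines. The sketch below is a schematic illustration with made-up cluster values; it is not the maximum neural network of [48, 49], only the per-cluster decision rule those networks apply.

```python
def winner_take_all(values):
    """Apply the WTA rule to one cluster: the neuron with the largest
    value keeps it; every other neuron's value is set to zero."""
    winner = max(range(len(values)), key=lambda i: values[i])
    return [v if i == winner else 0.0 for i, v in enumerate(values)]

# A maximum neural network applies the rule independently in each cluster
# (hypothetical values for two three-neuron clusters).
clusters = [[0.2, 0.9, 0.4], [0.7, 0.1, 0.6]]
print([winner_take_all(c) for c in clusters])
# → [[0.0, 0.9, 0.0], [0.7, 0.0, 0.0]]
```

Because exactly one neuron per cluster survives, any stable state of such a network is automatically a valid labeling, which is why the link weights are easy to determine compared with a four-term Hopfield energy.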
Grid and torus connections are most commonly used in parallel computers because of their simple hardware implementation. However, parallel algorithms written for these architectures often have to be carefully designed in terms of data delivery and synchronization. The popular hypercube topology has a higher space-compression factor and a lower communication factor than what is naturally needed in optimization problems (refer [43] or Section 2.7). For optimization problems, this thesis defines a parallel computing model in which both the evaluation of the candidates competing for the optimum and the selection of the optimum are explicitly represented by the values of network components. It is a distributed-memory model when it is used in the synchronous discrete-time domain.

1.4 Scope of this Thesis

This thesis presents a novel parallel computing network, the binary relation inference network, which is suitable for some constrained optimization problems, such as the all-pair shortest path problem, the transitive closure problem, and the assignment problem. Constrained optimization is an essential problem in artificial intelligence, operations research, robotics, and very large scale integration hardware design. Many constrained optimization problems are computation-intensive, and yet require fast solutions.

The approach proposed here is based on defining and using an inference network whose topology is derived from a simple intuitive approach to optimization. It will be shown that the inference network algorithms are obtained from the formulation of the problems, and that they can be executed in the synchronous discrete-time, asynchronous discrete-time, or continuous-time domains. It will be shown that the link weights of the inference network are easy to determine. The network will be shown to produce optimal solutions to the shortest path (Chapter 4) and the transitive closure (Chapter 5) problems.
It does not guarantee an optimal solution to the assignment problem (Chapter 6) because gradient descent will be used to decrease the network energy. The trade-off of eliminating synchronization is the large amount of communication between the elements in the network. With these communication channels, each decision-maker (a unit in the network) always has access to all candidates competing for the optimum. Each element of the network is a very simple computational engine.

The thesis has the following contents. Chapter 1 is an introduction. Chapter 2 discusses the topology and operation of the inference network. It also shows several structural versions and updating mechanisms for different instances of optimization problems. Chapter 3 is about implementation issues. It discusses the components of the network and their interconnection. Both analog and discrete implementations are studied. A transformational mapping of the discrete-time inference network onto the Connection Machine is also introduced. The inference network algorithms for some constrained optimization problems are discussed in Chapters 4 to 6, and the results are compared with other established methods. Chapter 4 gives the synchronous, asynchronous, and continuous-time algorithms for the shortest path problem and discusses their convergence. The network can also unify several other algorithms for the problem in terms of different inference network operations. The efficiencies of various algorithms are also discussed. Chapter 5 gives the inference network algorithm for the transitive closure problem and the continuous-time implementation of the algorithm using logic gates. Chapter 6 derives algorithms for the assignment problem and discusses the network dynamics in pursuing approximate solutions. A decreasing weight is used in the energy function for the quantity to be optimized, which results in good network convergence.
Finally, Chapter 7 summarizes the contributions of this thesis and discusses future work.

Chapter 2

Topology of the Inference Network

2.1 Inference on Binary Relations

A node is an item. It can be a node in a graph, a city in a map, or a time instance in a schedule, etc. A relation associates two or more nodes. A binary relation R(x, y) is either a qualitative or a quantitative relation between two nodes x and y. A binary relation is more basic than a general m-ary relation R(x1, x2, ..., xm) for m distinct nodes, which can often be defined by a set of binary relations about the m nodes, together with a set of inference rules on the binary relations. Examples of inference rules are unary and binary deductions, which draw a conclusion from one or two premises respectively.

The relation 'x > y' is a binary relation. The relation 'u is the largest among u, v, w' is a 3-ary relation which can be derived from the binary relations 'u > v' and 'u > w'. Unary deduction obtains y > x from x < y, whereas binary deduction obtains x > z from x > y and y > z. A relation between x and z may also be derived using another intermediate node u, for example from x > u and u > z where u ≠ y. The final inference result of the relation between x and z is determined based on all possible intermediate nodes.

Consider a problem with N nodes denoted as 1, 2, ..., N, where the relations between the nodes can be fully represented using binary relations R(i, j) (1 ≤ i, j ≤ N are nodes). Operations on the relations follow the format: R(i, j) and R(j, k) ⇒ R(i, k) (deducing relation R(i, k) from relations R(i, j) and R(j, k)). There are other ways to deduce R(i, k): from R(i, m) and R(m, k), from R(i, n) and R(n, k), etc. Inference takes place
to determine relation R(i, k) based on the various deduction results.

2.2 A Novel Inference Network

The inference network is a novel topology for solving constrained optimization problems. To the best of the author's knowledge, no research has been reported on networks with the same architecture and operations.

The inference network architecture aims at representing optimization procedures explicitly. Its components are interconnected in such a way that a decision maker always has access to the candidates competing for the optimum.

Figure 2.2: (a) An AND/OR graph indicating various ways to find the distance from a to b. (b) Substitute the 3-input AND relation in (a) with two 2-input AND relations.

The basic inspiration for the inference network topology came from a conventional AND/OR graph [64]. An AND/OR graph is a notation used in artificial intelligence to represent logical relations. As shown in Figure 2.2, arrows are tied up by arcs into groups. The logic between the relations tied up by an arc is AND — all the tied relations are needed to derive a result. The logic between any two groups of relations is OR — each group represents an independent way to derive the result. Figure 2.2(a) shows various ways to calculate the distance from a to b. The distance may be obtained from the distance from b to a alone, OR from the distances a → c AND c → b, OR from the distances a → d AND d → e AND e → b. Since m-ary deduction can often be decomposed into a number of binary deductions, an AND/OR graph with only 2-input AND relations is often sufficient to represent the logical relation among a set of nodes. Figure 2.2(b) is a variation of Figure 2.2(a) and only uses 2-input AND relations.

The AND/OR graph is a scheme to describe logical relations. It is not a parallel computing network; however, it shows how to derive a conclusion from given data, and how the given data are to be associated.
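The unary and binary deduction rules described above can be sketched in a few lines. The following Python fragment is illustrative only (the thesis contains no code, and the function names are invented): it derives y > x from x < y, and chains x > y with y > z to obtain x > z.

```python
# Illustrative sketch of the deduction rules on '>' relations (hypothetical
# helper names; relations are modelled as (node, operator, node) triples).

def unary_deduce(x, op, y):
    """From x < y conclude y > x (and from x > y conclude y < x)."""
    return (y, '>', x) if op == '<' else (y, '<', x)

def binary_deduce(r1, r2):
    """From x > y and y > z conclude x > z; None if the premises do not chain."""
    (x, op1, y1), (y2, op2, z) = r1, r2
    if op1 == '>' and op2 == '>' and y1 == y2:
        return (x, '>', z)
    return None

assert unary_deduce('x', '<', 'y') == ('y', '>', 'x')
assert binary_deduce(('x', '>', 'y'), ('y', '>', 'z')) == ('x', '>', 'z')
```

A full inference combines such deductions over all possible intermediate nodes, which is exactly the role the network's sites and units play below.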
Based on the AND/OR graph, the inference network is proposed, composed of units, sites and links. The correspondence and descriptions of the entries in the inference network and an AND/OR graph are given in Figure 2.3.

Figure 2.3 contains all the binary deductions which can be performed on the three binary relations. In this figure, any distance can be calculated from the other two distances. On the other hand, any distance is used in evaluating another distance. Given a set of N nodes, there will be up to N(N − 1) binary relations between the nodes. Generally, a result R(i, j) can be deduced from any two relations R(i, l) and R(l, j). If all the possible binary deductions for an N-node problem are shown in a graph, the graph will become a complicated, regularly connected, closed network with a large number of units. How can one ensure that such a closed network will not oscillate? The convergence of a parallel computing network can be analyzed by defining and examining the computational energy of the network. The Hopfield model has shown an example of using an energy function in convergence analysis.

The topological distinction between the inference network and neural networks or other parallel computing models is that the inputs to an inference network unit are grouped into pairs (sites), and the evaluation of each pair (corresponding to site calculation) is independent of the evaluation of other pairs. Figure 2.5 shows a unit with its paired inputs, as well as a typical neuron with all inputs connecting to it directly. A typical neuron collects all of its inputs and produces an output accordingly. An inference network unit is able to update its value based on the change of any of its input pairs.
The reason to group the inputs is that the result obtained from one relation pair (for example a → c → b) can be used to update the relation (a → b) without waiting for results from the others (a → d → e → b or b → a).

Another difference between the inference network and some other parallel computing models (for example a systolic array) is that the former can be updated synchronously, asynchronously, or in continuous time. This is a feature related to the structure of the inference network. To be more specific, the grouping of unit inputs into pairs makes it possible to compute the candidates for the optimum independently. An AND/OR graph shows various paths to evaluate a relation using other relations, and the evaluations through distinct paths are independent. Again looking at Figure 2.2, the calculation of the distance a → c → b is independent of the calculation of a → d → e → b. This indicates that the evaluation of sites can be parallelized. For some applications, the evaluation of all relations can be active at the same time. For example, the distances d → b and a → b can be calculated simultaneously. When the distance a → d → e → b is not known, the distance a → b is determined by a → c → b and b → a. After a → d → e → b is obtained, the distance a → b can be updated (if necessary) based on the new information. This analysis shows that for some applications, both the units and the sites in the inference network can be updated synchronously or asynchronously (Section 2.4).

2.3 Inference Network Topology

The binary relation inference network consists of simple computational elements and operates by communication between the elements. Each computational element is composed of a unit, a number of sites attached to the unit, and links from the unit to a site at another unit; refer to Figure 2.4. Unit and site functions are usually obtained from application considerations.
As described in the previous section, a site output derives a result from two relations, and the sites attached to a certain unit provide different ways to derive a relation. A unit determines a binary relation based on its site values. In general, site functions represent deduction rules or calculate factors competing for the optimum, and unit functions resolve conflicts or perform optimization. The dynamic state of the network is determined by its link weights, unit and site functions, initial state, and updating mechanism. The network is named an inference network because conclusions are obtained by binary deduction and multi-input conflict resolution.

We will use u_ij for the value of unit (i, j) and s_ijl for the value of site l at unit (i, j). Figure 2.4 shows unit (i, j) in the inference network. Relations u_il and u_lj arrive at site l of unit (i, j) and produce a result s_ijl — the relation between i and j according to site l. At the same time, other results about i and j — s_ij1, s_ij2, ..., s_ijN — can also be derived. Unit (i, j) is responsible for deriving a relation between i and j based on the site values s_ij1, s_ij2, ..., s_ijN. The key features of this topology which make it suitable for optimization problems are the deduction ability at each site and the comparing ability at each unit. A 2-input site performs binary deduction or calculates a candidate competing for the optimum; for example, it calculates the distance from i to j via a specific intermediate node. A multi-site unit provides the platform for multiple-input decision-making or optimization; for example, it finds the shortest path from i to j.

For an N-node problem, the general inference network contains N^2 units. For all 1 ≤ i, j ≤ N, unit (i, j) has N sites, namely site l (1 ≤ l ≤ N). Unit (i, j) has 2N outgoing links leading to site j at units (i, l) and site i at units (l, j), where 1 ≤ l ≤ N. Site l at unit (i, j) has two incoming links, from unit (i, l) and unit (l, j).
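The unit/site/link bookkeeping just described can be sketched as follows (a hypothetical Python rendering, not part of the thesis): it enumerates the N^2 units of the general configuration and, for each unit (i, j), records the two units feeding each site l.

```python
# Sketch of the general inference-network topology: N^2 units, N sites per
# unit, and site l of unit (i, j) fed by units (i, l) and (l, j).
# (Function and variable names are illustrative, not from the thesis.)

def build_topology(N):
    units = [(i, j) for i in range(N) for j in range(N)]
    # incoming[(i, j)][l] = the pair of units whose values arrive at site l
    incoming = {(i, j): {l: ((i, l), (l, j)) for l in range(N)}
                for (i, j) in units}
    return units, incoming

units, incoming = build_topology(4)
assert len(units) == 4 ** 2                        # N^2 units
assert all(len(s) == 4 for s in incoming.values()) # N sites per unit
assert incoming[(1, 2)][3] == ((1, 3), (3, 2))     # site 3 of unit (1, 2)
```

Counting the entries of `incoming` also confirms that each unit has 2N outgoing links: its value appears once as (i, l) and once as (l, j) for every l.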
A unit is a simple computational engine which is able to calculate its value from its N site values using a simple unit function. The ability required of a site is to evaluate a simple site function of two variables — its two inputs. Like a neuron in an artificial neural network, a unit does not require any sophisticated processor, nor does it require a large memory.

For many applications, the diagonal units (i, i) can be omitted; hence the total number of units is reduced to N(N − 1), the number of sites at each unit is reduced to (N − 2), and the number of outgoing/incoming links from/to each unit is reduced to 2(N − 2). The site label l at unit (i, j) satisfies 1 ≤ l ≤ N and l ≠ i, l ≠ j. For applications with undirected graphs, d_ij = d_ji (symmetric distance matrix), the network size can be cut down by half: only C(N, 2) units are needed in the network¹. Each unit in this case still has (N − 2) sites and each site has two incoming links; unit (i, j) and unit (j, i) will be used to refer to the same unit in the network.

For clarity, Figure 2.4 only shows the sites and links for unit (i, j). However, the other units also have N sites, and each site also has two incoming links. If all the sites and links at each unit are put into Figure 2.4, the complete inference network becomes a closed network with feedback. If the N^2 units of the network are placed into a 2-dimensional array on a plane, with unit (i, j) at row i and column j, then only the units in the same row or column communicate among themselves. There is no direct connection between unit (i, j) and unit (v, w) if i, j, v, w are four distinct nodes in the problem.

¹C(N, 2) = N(N − 1)/2 is the number of different 2-object combinations from N objects without considering the order.

2.4 Updating Schemes

The inference network works in both the discrete and continuous-time domains. The discrete inference network operates rhythmically with a clock.
It can work in synchronous, asynchronous, or partially synchronous modes. In synchronous updating, all unit values are updated at the same time based on the unit values at the previous step. Asynchronous updating changes one unit at a time. The new value of the unit is then used in the updating of the next unit. The partially synchronous mode combines these two schemes — updating some of the units synchronously at a time. The number of iterations required to complete a task shows the speed of a discrete algorithm. An iteration stands for the process that can be performed on the inference network, given the results of the previous iterations. Each iteration comprises the simultaneous binary deduction (evaluation of sites) at all units and the subsequent conflict resolution (evaluation of units).

In the analog inference network, all units constantly change their values until a stable state is reached. Updated unit values are used by other units right away. The time required for the network to arrive at the stable state indicates the speed of an algorithm. A unit value can be calculated after all its sites are evaluated in parallel. It is also possible to update the unit value as soon as one or more sites are evaluated. The reason behind this is that optimization can take place whenever a candidate is generated or updated; the unit value will always be the optimum-so-far. For an analog implementation, it is not difficult to update a unit as soon as one of its sites changes (refer to Figure 3.7).

2.5 Transfer Function

A site function S() is a function of two unit values:

s_ijl = S(u_il, u_lj)    (2.1)

A discrete unit function U() is a function of the previous unit value and all the site values:

u_ij^(k+1) = U(u_ij^(k), s_ij1^(k+1), s_ij2^(k+1), ..., s_ijN^(k+1))    (2.2)

A continuous-time unit function U() is a function of the unit's own value and all its site values:

u_ij(t + δ) = U(u_ij(t), s_ij1(t + δ), s_ij2(t + δ), ..., s_ijN(t + δ))    (2.3)

For an optimization problem, the candidates competing for the optimum are usually independent of each other, which means the unit function is separable:

u_ij = U'(l_0, U'(l_1, U'(l_2, ... U'(l_{N−1}, l_N) ...)))    (2.4)

where l_0, l_1, l_2, ..., l_N is an arbitrary permutation of s_ij1, s_ij2, ..., s_ijN and u_ij. U'() is a 2-variable function related to U(). For example, if U() is a minimization of N + 1 variables, then U'() is a minimization of two variables. For many unit functions U(), the N + 1 variables in U() can be associated randomly and minimized in an arbitrary order.

The state of the inference network can be described by all its unit values, which can be put into an N^2-dimensional vector

u = (u_11, u_12, ..., u_1N, u_21, u_22, ..., u_NN)

Updating of the network can be characterized by a transfer matrix P if: (1) the network updates in the synchronous discrete mode or in the continuous-time domain; (2) the unit function U() and site function S() are linear and are the same for all units and sites; and (3) the unit function is separable. For synchronous discrete updating, the state of the network is determined by the equation

u^(k+1) = P u^(k)    (2.5)

For the analog network, the network state can be determined by the equation

du/dt = P u    (2.6)

where the transfer matrix P is an N^2 × N^2 sparse matrix whose element at row (iN + j) and column (kN + l) is p_{iN+j, kN+l}. p_{iN+j, kN+l} shows whether and how a binary relation concerning nodes i and j is related to another binary relation concerning nodes k and l:

p_{iN+j, kN+l} = a δ(i − k) δ(j − l) + b δ(i − k) + c δ(j − l)    (2.7)

where a, b, c are constants determined from the application and

δ(x) = 1 if x = 0;  δ(x) = 0 if x ≠ 0

The first term of p_{iN+j, kN+l} shows that a unit has feedback from itself; the second and third terms of p_{iN+j, kN+l} indicate that a unit interacts with the units in the same row (i = k) or the same column (j = l).
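Under the linearity assumptions above, the transfer matrix of Eq.(2.7) and the synchronous step of Eq.(2.5) can be sketched numerically. This is an illustrative fragment, not thesis code; the constants a, b, c are chosen arbitrarily here.

```python
import numpy as np

# Build P per Eq.(2.7): p[iN+j, kN+l] = a*d(i-k)*d(j-l) + b*d(i-k) + c*d(j-l),
# where d() is the Kronecker delta, then apply one synchronous step of Eq.(2.5).

def transfer_matrix(N, a, b, c):
    P = np.zeros((N * N, N * N))
    for i in range(N):
        for j in range(N):
            for k in range(N):
                for l in range(N):
                    P[i * N + j, k * N + l] = (a * (i == k) * (j == l)
                                               + b * (i == k) + c * (j == l))
    return P

P = transfer_matrix(3, a=1.0, b=0.5, c=0.5)
u = np.ones(9)          # an arbitrary initial state vector
u_next = P @ u          # one synchronous step: u(k+1) = P u(k)
assert P[0, 5] == 0.0   # different row and column: no direct interaction
```

The zero entries confirm the sparsity pattern discussed in the text: a unit only exchanges values with units in its own row or column.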
When i ≠ k and j ≠ l, p_{iN+j, kN+l} = 0, which means a unit does not communicate directly with units which are neither in the same row nor in the same column.

For the assignment problem, to be discussed in Chapter 6, a transfer matrix P is defined whose elements satisfy Eq.(2.7). Both Eq.(2.5) and Eq.(2.6) hold for that application. The transfer matrix is used to examine network convergence for the problem (Chapter 6). For the shortest path and transitive closure problems, to be discussed in Chapter 4 and Chapter 5 respectively, a transfer matrix can also be defined indicating the link weights. For these two applications, a, b and c in Eq.(2.7) are all one. However, since these two applications use non-linear site and/or unit functions, the updating of the network state cannot be described by the linear equations Eq.(2.5) or Eq.(2.6).

When the inference network is used to solve an optimization problem and is stabilized at a solution, the unit value for each unit (i, j) is denoted as u_ij^exp ∀i, j. Although the u_ij^exp are unknown before the problem is solved, they can sometimes be expressed in a mathematical expression. For example, in the shortest path problem, u_ij^exp = min{ u_ij, min_l (u_il + u_lj) }. When the expression for u_ij^exp is known as a function of unit values, the energy function of the network can have the following form:

E = Σ_i Σ_j (u_ij − u_ij^exp)²    (2.8)

where u_ij is the current value of unit (i, j), and u_ij^exp is the expression for the expected value of unit (i, j) when the network is stabilized. The energy function indicates the distance of the current network state from its stable state. When the network arrives at the expected solution, u_ij = u_ij^exp ∀i, j and E = 0. When u_ij changes in the continuous-time domain, and lim_{t→∞} u_ij = u_ij^exp, the energy E will be sufficiently close to zero when the time t is sufficiently long.
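For the shortest path case, the energy of Eq.(2.8) can be sketched as follows. This is an illustrative fragment (not code from the thesis) using the stable-state expression u_ij^exp = min{ u_ij, min_l (u_il + u_lj) } quoted above.

```python
import numpy as np

# Energy of Eq.(2.8) for the shortest path problem: the squared distance
# between the current unit values u and the expected stable-state values.

def energy(u):
    N = len(u)
    u_exp = u.copy()
    for i in range(N):
        for j in range(N):
            via = min(u[i][l] + u[l][j] for l in range(N))
            u_exp[i][j] = min(u[i][j], via)      # u_ij^exp
    return float(((u - u_exp) ** 2).sum())

d = np.array([[0., 1., 5.],
              [1., 0., 1.],
              [5., 1., 0.]])
print(energy(d))      # 18.0: the direct distance 5 exceeds the path 0->1->2

star = np.array([[0., 1., 2.],
                 [1., 0., 1.],
                 [2., 1., 0.]])
print(energy(star))   # 0.0 at the stable (all-pairs shortest path) state
```

E > 0 whenever some unit still holds a non-optimal path length, and E = 0 exactly at the expected solution, mirroring the discussion in the text.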
Plugging the expression for u_ij^exp into the energy function, the derivative of the energy can be used to analyze the convergence of the network.

In some applications, it is impossible to know u_ij^exp before solving the problem. However, the constraint of the problem can be put into a mathematical term E_c, which equals zero when the constraint is satisfied and is greater than zero otherwise. Meanwhile, the optimization problem is to minimize an objective function E_o under the constraint. Under this circumstance, the energy function can be defined as E = λ_c E_c + λ_o E_o, where λ_c and λ_o are constants. The objective is to decrease the energy until arriving at a state at which E_c = 0 and E_o is minimized. The Hopfield model is an example. Giving each term a proper weight is critical here if gradient descent is used to decrease the energy. For some graph partition problems, Lee shows [48, 49] that both the problem constraint and the objective function can be expressed in a single term E_co; therefore the energy function E = λE_co contains only one coefficient, and there is no need to adjust the weights between the constraint and the objective function.

When linear unit and site functions are used in synchronous or continuous-time updating, the energy function has the form

E = u^t P u + Q u + R    (2.9)

where u is the vector of all unit values, P is the above transfer matrix, and Q and R are constant vectors. Refer to Chapter 6 for the derivation of Eq.(2.9). Although the units in the network are connected with feedback, oscillation will not occur in the electronically implemented network as long as the energy function decreases monotonically, that is, dE/dt < 0.

2.6 Is the Inference Network a Neural Network?

An artificial neural network is an abstract computational model based on the structure of animal brains.
Individual neurons do not transmit large amounts of symbolic information. Instead, they compute by being appropriately connected to a large number of similar neurons [14]. A number of neurons form a state and represent a concept (distributed representation; refer to the Necker cube example in [16]). The operation of each neuron is very simple. A neural network often has a uniform structure. The change of the state of a neuron aims at reducing the network energy, and is often in the direction of gradient descent (for example, the Hopfield model for the traveling salesman problem). Neural network approaches focus on parallelism, robustness, adaptation, and learning. Neural networks usually solve an optimization problem right from its definition (define an energy function which is composed of the constraints and the quantity to optimize). Algorithms for some types of neural networks involve a lot of learning and training to determine the link weights [27]. The objective of network updating is to decrease its energy, and the link weights are set up to achieve this goal. Artificial neural networks have both digital and analog versions.

The inference network is a parallel computation model. When it is used synchronously, it is a distributed-memory model. When it is used in continuous time (without a clock), it is a circuit without the kind of memory found in a von Neumann machine. Some electronic components in the circuit (capacitances, etc.) act like memories to keep voltage values. Its topology is a reflection of an optimization process — evaluating candidates and determining the optimum. Each processor has a specific mission — an explicit reason why it performs the operation. Unit and site functions are defined to be essential procedures in optimization. Proper definition of the unit and site functions results in a decreasing energy function. The network has a uniform structure and the units are identical. The network does not involve training and learning.
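The contrast drawn in this section can be sketched as follows (illustrative Python with invented function names, not from the thesis): a sigmoid neuron must collect all of its inputs before producing an output, while a separable inference unit can fold in one new candidate at a time, keeping the optimum-so-far.

```python
import math

# A classical sigmoid neuron: weighted sum of ALL inputs, then squashing.
def sigmoid_neuron(inputs, weights):
    return 1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(weights, inputs))))

# An inference-network unit with a separable unit function (Eq.(2.4)): the
# optimum-so-far is combined with one changed site value, without waiting
# for the other sites to report.
def update_unit(current, changed_site_value, unit_function=min):
    return unit_function(current, changed_site_value)

u = 7.0                   # optimum-so-far at some unit (i, j)
u = update_unit(u, 5.0)   # one site produced a better (shorter) candidate
u = update_unit(u, 9.0)   # a worse candidate leaves the unit unchanged
assert u == 5.0
```

The asymmetry is the point: the neuron's output depends jointly on every input, whereas the unit's minimization can be applied incrementally in any order.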
All link weights are known. Algorithms are developed right from the problem formulation. The network can operate in both the discrete and continuous-time domains.

Based on the above information, the inference network can be regarded as a neural network with known link weights. However, in the inference network, the sites at a unit are independent of each other. Changes at any site (or sites) can be used to update a unit value. In other words, a unit does not have to collect all its site values to calculate its own value. When a new candidate arrives, the optimum-so-far is compared with the new candidate to find a new optimum. In a neural network, a neuron usually collects all the inputs and then produces an output (for example, a weighted sum). The inputs are not explicitly divided into groups. Figure 2.5 compares an inference network component with a classical sigmoid artificial neuron.

It is interesting to compare the inference network with a specific type of neural network, namely the maximum neural network [49]. In a maximum neural network, the clusters are responsible for decision-making, just like the units in the inference network. However, all neurons in a cluster, except the one with the highest value, are set to zero once the decision is made; whereas a unit in the inference network chooses the optimum from the site values, but it does not reset the site values. Each unit in the inference network has one output; whereas in a general maximum neural network, all neurons in a cluster send out their values. Neither the maximum neural network nor the inference network requires a rigorous synchronization procedure. The inference network guarantees optimal solutions to the shortest path (Chapter 4) and transitive closure (Chapter 5) problems. Its algorithm for the assignment problem (Chapter 6) provides valid near-optimal solutions.
The maximum neural network provides valid near-optimal solutions to the max cut and bipartite subgraph problems [48, 49].

An inference network with N^2 units can be considered a sub-net of a maximum neural network with N^2 clusters. The former has fewer links. If the extra links are taken out of an N^2-cluster maximum neural network, and the winner-take-all rule for each cluster is defined as the unit function, the N^2-cluster maximum neural network becomes an inference network. A fully connected N^2-cluster maximum neural network is more powerful than an N^2-unit inference network because the former uses more links.

Despite the similarities between the maximum neural network and the inference network, the inference network topology does not naturally match problems like max cut and bipartite subgraph, and the author is not aware of any solution on a maximum neural network for the shortest path, transitive closure or assignment problems.

In Lee's solution to the bipartite subgraph problem [49], the value change of each neuron is based on gradient descent of the function to be optimized, although a winner-take-all scheme is used in each cluster. Even if the maximum neural network were used to solve the shortest path or the transitive closure problem, as long as the states of the neurons change in the direction of gradient descent, whether the solution is optimal remains questionable.

2.7 The Inference Network Cannot Be Treated as a Hypercube

A hypercube [26, 75] is a topology commonly adopted by many parallel machines. The inference network is a topology derived from simple, intuitive procedures of optimization. Node connectivity is defined as the number of distinct links connected to a hypercube node or an inference network unit, which is a constant for both the hypercube and the inference network.
The ratios between any two of the node connectivity, the total number of nodes (units), and the total number of links represent features of a topology. The inference network cannot be treated as a hypercube because its ratios are quite different from the corresponding ratios of a hypercube. Figure 2.6 shows the topology of a simple hypercube and a simple inference network.

If we define the space-compression factor as the ratio of the total number of nodes (units) to the node connectivity, then the hypercube topology has a higher space-compression factor than the inference network. For a problem with size N, the inference network has N^2 units. If a hypercube with the same number of nodes is to be used (2^M = N^2), the dimension of the hypercube is

M = log_2(N^2)    (2.10)

The node connectivities for the inference network and the hypercube are 2(N − 1) and M respectively. For any N > 2, the M which satisfies Eq.(2.10) is smaller than 2(N − 1). This indicates that the amount of communication required naturally by optimization problems is higher than a hypercube with the same number of processors can provide. The following table shows the connectivities of the inference network and a hypercube. Refer to Figure 2.6 for illustration.

Inference network topology for directed problems:
  problem size (N):             3    4    5    6
  connectivity (2(N − 1)):      4    6    8   10
  total units (N^2):            9   16   25   36
  total links ((N − 1)N^2):    18   48  100  180

Hypercube topology:
  hypercube dimension (M):      3    4    5    6
  connectivity (M):             3    4    5    6
  total nodes (2^M):            8   16   32   64
  total links (M · 2^(M−1)):   12   32   80  192

Nevertheless, it is possible to implement the inference network on a piece of hardware which has a hypercube topology. In this case, a link in the inference network has to be implemented as an indirect path through several processors.
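The entries of the comparison table can be recomputed directly. The following is an illustrative check (not thesis code) of the counts and of the inequality M < 2(N − 1) implied by Eq.(2.10).

```python
import math

# Connectivity and size of the directed inference network vs. a hypercube
# with the same number of processors (2^M = N^2).

def inference_net(N):
    return {"connectivity": 2 * (N - 1),
            "units": N ** 2,
            "links": (N - 1) * N ** 2}

def hypercube(M):
    return {"connectivity": M,
            "nodes": 2 ** M,
            "links": M * 2 ** (M - 1)}

for N in (3, 4, 5, 6):
    M = math.log2(N ** 2)      # Eq.(2.10): dimension of an equal-size cube
    assert M < 2 * (N - 1)     # the hypercube offers less connectivity

print(inference_net(3))   # {'connectivity': 4, 'units': 9, 'links': 18}
print(hypercube(3))       # {'connectivity': 3, 'nodes': 8, 'links': 12}
```

These values reproduce the N = 3 and M = 3 columns of the table above.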
The transformational mapping of the inference network to a systolic architecture, to be discussed in Section 3.3, is an example.

The communication requirement of the inference network is a major challenge to computer technology. Each unit in the N^2-unit network has links to 2N other units. This amount of communication cannot be accommodated by many existing digital parallel processing facilities. However, since modern technology is able to build a fully connected analog neural network chip (for example, with 50 neurons), it is technically possible to build a VLSI chip for the inference network.

2.8 Examples and Potential Applications

Let us look at some application examples of the inference network. The objective of the shortest path problem is to find the shortest path between two cities in a given distance graph. In this application, a unit value represents a distance between two cities. Unit (i, j) looks for the shortest path from city i to city j. Site l on this unit calculates the distance from city i to city j via city l by adding two unit values. Therefore, each site on the unit provides the distance of one path, and the unit obtains the shortest path by minimizing the site values.

Another example is to use the network for data consistency checking. Given N events and the time intervals between any two of them, the objective is to check whether these time intervals are consistent with each other. The inference network for this problem is composed of N(N − 1)/2 units, and each unit (i, j) represents the time interval between event i and event j. At site k, the time interval between event i and event j is obtained from the intervals from event i to event k and from event k to event j. The given data are consistent if the intervals between i and j from all sites are equal.
In this application, the site function is an addition and the unit function checks the consistency of the site values.

The inference network topology can be extended to a weightless neural network for the consistency labeling problem. The weightless neural network consists of clusters of neurons. Each cluster (i, j) is more complicated than a unit (i, j); however, it also works on one binary relation — the consistency between the labels for object i and object j. The interconnection between the clusters is the same as the interconnection between the units: cluster (i, j) communicates with cluster (i, k) and cluster (k, j) for all k's. Cluster (i, j) eliminates inconsistent labels for objects i and j, and whenever the eligible labels for object i (or object j) change, the information is passed to cluster (i, k) (or cluster (k, j)) for all k's. Details about this weightless neural network can be found in Appendix C.

The inference network can be applied to an N-node problem if the problem has the following features:

1. Since a unit only keeps a binary relation, the problem must be fully representable by a set of binary relations u_ij, where i and j are nodes of the problem. The nature of the relations u_ij is the same for all i and j;

2. With the links provided by the inference network, the state of the problem must progress through the interaction of the binary relations, and relation u_ij is only directly used in determining relations u_il and u_lj;

3. Due to the structure of each site, candidates for the optimal u_ij are obtained from a binary function on u_il and u_lj;

4. Deduction rules are defined to derive s_ijl from u_il and u_lj;

5. Decision-making rules are defined to determine relation u_ij from s_ij1, s_ij2, ..., s_ijN;

6. An energy function can be defined for the problem, and through the site and unit updating, it will eventually arrive at a global or local minimum when the network is stabilized.

Among the above features, the first two are the most important, and they can be used to determine whether it is possible to solve a problem on the inference network platform. Although the longest path problem — finding the longest path in a general graph — sounds similar to the shortest path problem, it cannot be solved on the inference network because the second and third requirements are violated. In the shortest path problem, the shortest path length from node i to j always equals the minimum, over all intermediate nodes k, of the total of the shortest path lengths from i to k and from k to j. However, the longest path from i to j often is not the concatenation of the longest path from i to k and the longest path from k to j for any k. Even though the shortest path problem is solved nicely on the inference network, the longest path problem cannot be solved easily on the same network. If a relation between i and j cannot be determined from the relation between i and k and the relation between k and j, that kind of deduction is not supported by the inference network.

The inference network is not suitable for any problem that cannot be fully represented by a set of simple binary relations. For example, if there is more than one salesman in the traveling salesman problem, and the salesmen's destinations are associated with some priorities, it is not easy to define binary relations u_ij ∀i, j such that they can fully represent a state of the problem.
On the other hand, even if it were possible to fully represent the complicated traveling salesman problem using a set of binary relations, the deduction upon the relations would probably be too complicated for the inference network platform to support (the second and third requirements above).

For what kind of optimization problems can the inference network produce globally optimal solutions? In the inference network, each unit performs local optimization based on its inputs and the unit function. The shortest path (Chapter 4), transitive closure (Chapter 5), data consistency checking (Appendix A), and consistency labeling (Appendix C) problems are a few examples with a common feature: local optimization of a binary relation is always advantageous to the optimization of other binary relations. For applications with this feature, the inference network produces globally optimal solutions. For example, in the shortest path problem, any unit with a non-optimal path continues to optimize and activates other units to change accordingly. Once all units have settled down, a solution to the all-pair shortest path problem is found.

Inference Network   | AND/OR graph    | Description
unit                | oval node       | a binary relation to be optimized
site                | arc for AND     | associating two relations for a result
link                | arrow           | showing dependence of relations
sites on a unit     | untied OR logic | independent ways to derive a relation

Figure 2.3: (a) A complete AND/OR graph for three relations; (b) the corresponding inference network of (a).
General configuration:
    1 ≤ i, j, l ≤ N
    N² units, N sites per unit
    2N incoming links per unit

No-diagonal-units configuration:
    1 ≤ i, j, l ≤ N,  l ≠ i, l ≠ j, i ≠ j
    N(N − 1) units, N − 2 sites per unit
    2(N − 2) incoming links per unit

Undirected network configuration:
    1 ≤ i, j, l ≤ N,  l ≠ i, l ≠ j, i ≠ j
    unit (i, j) coincides with unit (j, i)
    N(N − 1)/2 units, N − 2 sites per unit
    2(N − 2) incoming links per unit

Figure 2.4: Unit (i, j) in the binary relation inference network.

Figure 2.5: (a) A classical sigmoid artificial neuron; (b) a basic component in the inference network, whose site and unit functions can be either linear or non-linear.

Figure 2.6: (a) Topology of a 3-dimensional hypercube with 8 processors and 3 bidirectional links on each processor; (b) a full N = 3 inference network with 9 units and 4 bidirectional links per unit.

Chapter 3

Inference Network Implementation

3.1 Analog Inference Network

Each unit in the inference network is a simple computational engine. The following chapters will show that, for a variety of applications, the site and unit functions involve only a few basic arithmetical operations. Therefore, a unit can be built by wiring up some operational amplifiers or logic gates operating in continuous time. An output of a site is sent to a single unit as input; an output of a unit drives a number of sites.

Figure 3.7 shows two units and the sites attached to them. Figure 3.7(a) is a network component for the shortest path problem. An analog unit for a 5-city shortest path problem was built and tested [91]. The value of a unit is the voltage measured at the output of the unit. Each site is an operational amplifier adding two voltages. Each unit is composed of voltage comparators that choose the minimum voltage among the site outputs.
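The counts in Figure 2.4 can be summarized programmatically (a sketch; the three helper functions are our names, not part of the thesis):

```python
# Unit, site, and link counts for the three inference-network configurations
# listed in Figure 2.4 (helper names are illustrative only).
def general(N):
    return {"units": N * N,
            "sites_per_unit": N,
            "incoming_links_per_unit": 2 * N}

def no_diagonal(N):
    # units (i, j) with i != j; sites l with l != i and l != j
    return {"units": N * (N - 1),
            "sites_per_unit": N - 2,
            "incoming_links_per_unit": 2 * (N - 2)}

def undirected(N):
    # unit (i, j) coincides with unit (j, i)
    return {"units": N * (N - 1) // 2,
            "sites_per_unit": N - 2,
            "incoming_links_per_unit": 2 * (N - 2)}

print(general(3))  # the full N = 3 network of Figure 2.6(b) has 9 units
```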
The inference network algorithm for this application will be given in Chapter 4. Figure 3.7(b) shows a unit and its attached sites for the transitive closure problem. A 'high' unit value indicates that there is a path between the two nodes. Each site is an AND gate determining the existence of a specific path between two nodes. The unit is an OR gate determining the existence of any path between the two nodes. A detailed discussion can be found in Chapter 5.

What spatial configuration will the units form if the inference network is to be built physically? Let us answer the question by looking at the connectivity of the units. First, if there is a link from the first unit to a site at the second unit, then there must be a link from the second unit to a site at the first unit (see Figure 2.3). Therefore, bidirectional links can be used physically. Second, not all units are directly connected. If the N² units of the inference network are placed on a 2-dimensional array and unit (i, j) sits at row i and column j, only units within the same row or column have links in between.

Figure 3.7: Examples of inference network components, (a) for the shortest path problem, (b) for the transitive closure problem.

In light of these features, a torus structure, as shown in Figure 3.8, can be used. It is formed by layered circular disks and a cylindrical surface wrapped around these disks. Each unit is located at the center of a layer, and communicates with all other relevant disks through bidirectional links in the cylindrical surface. For a directed problem of size N, there are N(N − 1)(N − 2) bidirectional links on the cylindrical surface.

Since only the units in the same row or column are connected, it is possible to use multiple buses, where each bus only carries the signals of units in the same row or column. Figure 3.9 shows this structure. Each unit is connected with a vertical bus and a horizontal
bus; each bus contains signals from units in a particular row or column. Units get inputs from the two bus lines, calculate outputs, and send the results back to the two bus lines to drive other units. The width of each bus is N and there are 2N buses in total.

Figure 3.8: (a) Torus structure of the inference network; (b) a circular disk.

Despite the feedback in the network, for each of the applications considered in this thesis, it is shown that the inference network has stable dynamic behavior and eventually converges. An energy function can be defined for each application and can be proved to be decreasing. The feedback links in the network will not cause oscillation.

A complete analog inference network system consists of a conventional computer, the inference network formed by circuits or VLSI chips, and D/A and A/D converters, as shown in Figure 3.10. The computer is a monitor as well as a storage medium. The analog inference network is composed of a physical configuration (torus, square, etc.) and units plugged into it. The units can be multi-functional, with switches to determine their functionality for the given application, or they can be simple units for a specific application. Sites at a unit are evaluated in parallel.

Figure 3.9: (a) A square structure with multiple sets of buses; (b) connection of a unit to bus lines.

The advantage of using analog units instead of digital ones lies in the simplicity of the analog units. An analog system uses additional D/A and A/D converters, but it does not need a set of circuits for each bit of the data, whereas a digital system does. Besides, the units in an analog system do not need a clock to synchronize their operations. A potential problem with the above implementation is the limited driving power of the output signals.
When the network size is large, additional driving circuits may be necessary.

Figure 3.10: An analog inference network system.

3.2 Discrete-Time Inference Network

The computational requirement for the components of the inference network is simply the ability to perform basic arithmetical operations. A site produces an output based on two inputs; a unit collects the site values and generates a unit output. Neither a sophisticated CPU nor a big memory is needed. However, each unit does require a fairly big communication capacity: each unit has 2N outgoing links and 2N incoming links.

A discrete inference network can be built using digital components similar to those shown in Figure 3.7. The physical connection of units is also similar. The major difference is that, in order to update the network synchronously, a global clock is required to synchronize the activities of each component. At the beginning of each time step, the sites attached to a unit obtain inputs through its incoming links and calculate the site outputs simultaneously, and then the unit performs its unit function and produces the output for the next time step.

The Connection Machine (CM) [26, 85] can be programmed as the inference network. The Connection Machine is a single-instruction multiple-data-stream computer which consists of a large number of simple processors and is efficient for massively parallel computing. It has 16K, 32K, or 64K processors; each processor has up to 1 Mbit of memory. Its processors are connected logically into an n-cube topology and support highly efficient n-dimensional grid communications. An algorithm can be executed on the Connection Machine efficiently if it mainly requires regular and local communication. Non-local communication can be achieved at the cost of speed.
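The synchronous time step just described — every site combines two unit values, then every unit reduces its old value together with its site values — can be sketched with pluggable site and unit functions (a sketch; synchronous_step, shortest_step, and closure_step are our names, not the thesis's):

```python
# One synchronous time step of a discrete-time inference network (sketch).
# site_fn combines two unit values into a site value; unit_fn reduces the old
# unit value and the list of site values into the new unit value.
def synchronous_step(u, site_fn, unit_fn):
    N = len(u)
    return [[unit_fn(u[i][j], [site_fn(u[i][l], u[l][j]) for l in range(N)])
             for j in range(N)]
            for i in range(N)]

# Figure 3.7(a): shortest path -- sites add, units take the minimum.
def shortest_step(u):
    return synchronous_step(u, lambda a, b: a + b,
                            lambda old, s: min(old, min(s)))

# Figure 3.7(b): transitive closure -- sites AND, units OR.
def closure_step(u):
    return synchronous_step(u, lambda a, b: a and b,
                            lambda old, s: old or any(s))
```

Because the new matrix is built entirely from the previous time step's values, every site sees a consistent snapshot — exactly what the global clock enforces in hardware.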
If an application requires more processors than the CM physically has, virtual processors can be created. The CM uses a conventional computer, such as a SUN or a VAX, as a front-end machine. Several high-level languages for parallel programming are supported, such as *Lisp, C*, and CM Fortran, which are parallel extensions of the corresponding sequential languages. The CM provides utilities to measure computational time: CM busy time is the time the CM is involved in calculation; CM elapsed time includes the time the CM is communicating with its front end, in addition to the CM busy time.

When the Connection Machine is programmed as the inference network, a processor array can be declared in which each processor corresponds to a unit in the inference network. The network can operate in both synchronous and asynchronous modes. Each processor performs the site operations as well as the unit operation. To obtain the site values of a unit, each processor is required to communicate with 2(N − 1) other processors. Unit values can then be calculated within each processor. Although each CM processor has only four hardware links to its neighbors, arbitrary communication can be achieved at the software level. For example, the C* statement

    [i][j]work = [k][l]work;

sends the element [k][l] of the vector variable work to position [i][j]. The programmer does not need to do any hardware reconfiguration. If the two elements are not hypercube neighbors, the message has to travel through other processors to get to its destination. However, since the cost of non-local communication increases rapidly with problem size N, this way of using the CM is very inefficient.

3.3 Mapping to the CM Using Local Communication

When existing hardware does not support the operations of the inference network efficiently, we propose a systolic transformation mapping of the network to the Connection Machine. In the previous section, all the sites of a unit constantly reside on the unit.
However, since there is only one processor for these sites, the site values are evaluated serially, and site evaluation normally involves non-local communication. The high communication cost caused by directly programming the CM can be significantly reduced by a proper arrangement of the site calculations at each unit: each unit value passes through a set of units (processors) and stops at each unit to evaluate one site of that unit; each unit calculates its site values in its own order.

Here are the details. The N × N inference network is mapped onto an N × N processor array on the CM, one processor for each unit; see Figure 3.11. Each processing element (PE) has a local memory keeping the unit value. Two data flows, one horizontal and one vertical, pass through each PE constantly. Each data flow consists of the N unit values from a certain row or column. Since each unit only communicates with the units in the same row or column, a horizontal (or vertical) data flow can be shared by all the units in the same row (or column). The data flows in the two directions are ordered in such a way that the two inputs of a certain site arrive at a PE at the same time. Eq.(3.11-3.12) is the initialization of the data flows:

    h[i, (−i − j) (mod N)] := u[i, j]        (3.11)

    v[(−i − j) (mod N), j] := u[i, j]        (3.12)

where h[k, l] and v[k, l] are the data at unit (k, l) after the initialization. It takes N time units for the data flows to traverse the array. In this mapping, the N sites are not physically attached to a unit all the time, but are attached to the unit one after another rhythmically.
For most optimization problems, the unit function is separable, as shown in Eq.(2.4), hence a PE does not need extra memory to keep any site values.

One synchronous iteration is composed of N + 1 steps on the CM:

• form the vertical and horizontal data flows (prepare updated unit values for the site inputs) according to Eq.(3.11-3.12);

• repeat N times:

  — update local memory (calculate site values and revise the unit value);

  — propagate horizontal and vertical data.

Figure 3.11: The systolic mapping of the inference network to the Connection Machine.

The result of this mapping is that each unit only has to communicate with its four neighbors. Table 3.1 shows that the mapping is far more efficient than directly programming the CM as the inference network, which uses non-local communication.

N               4      8      16     32     64     128    256     512
Direct program  0.062  0.313  0.818  1.828  5.957  15.72  133.04  1029.0
Mapping         0.087  0.113  0.176  0.296  0.701  1.88   8.21    37.1

Table 3.1: Connection Machine busy time (in seconds) to solve the all-pair shortest path problem. Direct program: the CM is simply programmed as the inference network; Mapping: the inference network is mapped to a processor array using local communication.

This mapping of the inference network to a square processor array on the Connection Machine makes good use of the available processors. Because of the restrictions of the CM, a rectangular geometry must be declared on the CM if a systolic algorithm uses a parallelogrammic processor array. Therefore, the rectangular geometry, which completely covers the parallelogrammic array, results in a higher virtual processor ratio (defined as the number of virtual processors over the number of physical processors).
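The schedule above — skewed initialization, then N compute-and-shift steps — can be simulated in software (a sketch of ours, with the shortest-path site and unit functions plugged in; systolic_iteration is an illustrative name):

```python
INF = float("inf")

def systolic_iteration(u):
    """One synchronous inference-network iteration for the shortest path
    problem, executed as N compute-and-shift steps with nearest-neighbour
    moves only (a software sketch of the CM mapping)."""
    N = len(u)
    # Eq.(3.11-3.12): skew the data so that PE (i,j) holds u[i][l] and u[l][j]
    # for the same site index l = (-i-j) mod N at every step.
    h = [[u[i][(-i - j) % N] for j in range(N)] for i in range(N)]
    v = [[u[(-i - j) % N][j] for j in range(N)] for i in range(N)]
    for _ in range(N):
        for i in range(N):
            for j in range(N):
                u[i][j] = min(u[i][j], h[i][j] + v[i][j])  # one site per PE
        # propagate: shift the horizontal flow one column, the vertical one row
        h = [[h[i][(j - 1) % N] for j in range(N)] for i in range(N)]
        v = [[v[(i - 1) % N][j] for j in range(N)] for i in range(N)]
    return u
```

Each shift advances the site index l by one at every PE simultaneously, so after N steps every unit has evaluated all N of its sites using only nearest-neighbour communication; the flows carry the iteration-start values, so the result equals one fully synchronous iteration.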
Table 3.2 gives the Connection Machine busy time for the inference network (see Appendix B) and a systolic algorithm [70] to complete a matrix inversion. The inference network uses a square processor array whereas the systolic algorithm uses a parallelogrammic one. The two algorithms have similar operations. When the problem size is large, the inference network is significantly more efficient. More details can be found in [80, 78].

N                   4     8     16    32    64    128   256    512
Inference network   0.22  0.39  0.88  1.21  1.91  3.86  17.73  115.35
Systolic algorithm  0.09  0.17  0.38  0.89  2.31  8.27  37.29  210.38

Table 3.2: CM busy time (in seconds) for the inference network and a systolic algorithm to solve (I − A)⁻¹ on a 16K CM.

Although the mapping of the inference network to the CM is more efficient than directly programming the CM as the inference network, it must be made clear that this mapping is just a vehicle to utilize a piece of existing hardware; it is not an end in itself. As a matter of fact, the mapping introduces additional restrictions on the inference network: it requires the sites to be evaluated in specific orders and the unit calculations to be strictly synchronized.

Chapter 4

Inference Network for the All-Pair Shortest Path Problem

This chapter discusses the inference network algorithm for the all-pair shortest path problem. For discrete synchronous updating, the algorithm is shown to converge to the global minimum state within ⌈log₂(N − 1)⌉ iterations¹. Unlike most existing discrete approaches and some neural network approaches, the algorithm is also shown to converge to the optimal solution under asynchronous or continuous-time updating. The inference network was proved to converge, independent of problem size, to the optimal solution of the problem.
It was demonstrated that the inference network not only provides an efficient platform for solving the shortest path problem, but can also unify many established algorithms for the problem by defining different unit and site functions.

4.1 The Shortest Path Problem

The shortest path problem has the following description. Consider a densely connected graph with N nodes denoted as 0, 1, ..., N − 1 and a set of directed arcs A. Every arc from node i to node j in A is associated with a distance (or length, cost, time, etc.) d_ij. For those node pairs without an arc in between, define their distance as infinity. Assume further that d_ii = 0 if it is not otherwise specified. The length of a path is the total length of the arcs on the path. The problem is to find the shortest paths between all pairs of nodes. Rote [71] gives a more formal definition of the problem using semiring theory.

¹⌈x⌉ (x > 0) is the smallest integer greater than or equal to x.

Generally, d_ij can be negative as well as positive. However, if a directed loop with negative length exists in the distance graph, the shortest paths will have negative infinite length — because the more a path repeats the loop, the shorter the path can be. In order to have a well-defined problem, we restrict our discussion to graphs containing no negative-length loops. (Negative-length arcs may exist, as long as any closed loop in the graph has a positive or zero length.) A subset of these graphs are those containing undirected arcs with non-negative lengths; that is, in mathematical terms, d_ij = d_ji ≥ 0 for all i, j with 0 ≤ i, j ≤ N − 1.

Finding the shortest paths in graphs or trees is an intensively studied problem in artificial intelligence and operations research. Established discrete algorithms for the shortest path problem fall into three categories.
Dijkstra's algorithm [10] (to be referred to as DIJKSTRA) is the most efficient of the single-pair algorithms, which find the shortest path between one specific pair of nodes at a time. There are several algorithms which can find the shortest paths from a specific source node to all other nodes at the same time. Yen's [94], Spira's [73], and Moffat's [63] algorithms (to be referred to as YEN, SPIRA, and MOFFAT, respectively) are examples of this kind of single-source algorithm; among them SPIRA is the most efficient. The third class of algorithms, all-pair algorithms, find the shortest paths between all the node pairs at the same time. This class includes the matrix and revised matrix algorithms [13, 32, 93], Floyd's [17], Dantzig's [9], and modified Dantzig's [81] algorithms, to be referred to as MATRIX, RMATRIX, FLOYD, DANTZIG, and MDANTZIG, respectively. Systolic algorithms like Rote's [71] and Robert and Trystram's [70] (to be referred to as ROTE and ROBERT, respectively) are also all-pair algorithms. All-pair algorithms are easier to parallelize.

The shortest path problem has also been solved by neural network approaches. Fritsch [19] discussed an eigenvalue analysis method and proposed to use a self-organizing feature map and a modified Hopfield net to solve the problem. Thomopoulos [83] introduced a neural network algorithm run on a network structure similar to the Hopfield net. Like neural network approaches to other problems, these algorithms cannot guarantee an optimal solution. The problem has also been solved by a connectionist approach in a digital environment [25], which obtains the solution through relaxation with a self-adapting step size.

The shortest path problem has applications in communication, transportation networks, integrated circuit layout, control, etc. Applications of the single-pair shortest path problem are common in daily life. Single-source and all-pair solutions are useful in VLSI layout, among other areas.
For example, compaction is a process in VLSI layout aiming to minimize the layout area while meeting all the design rules. It is an optimization problem because masks can move around as long as spacing, minimum-size, and shape rules are met. One-dimensional compaction changes one coordinate of the masks while preserving the design rules and not altering the function of the circuits. In two-dimensional compaction, both coordinates can change simultaneously to minimize the area. Solving a single-source shortest path problem is one of the ways to perform a one-dimensional compaction. The difficulty of two-dimensional compaction lies in determining how the two dimensions of the layout must interact to minimize the area. One way out of this difficulty is to do two one-dimensional compactions under some additional constraints. The test of the constraints involves solving two all-pair shortest path problems. Lengauer [51] gives detailed algorithms for them.

4.2 Inference Network Algorithm for the Shortest Path Problem

By defining proper unit and site functions, the inference network can be used for the all-pair shortest path problem. Each unit (i, j) (0 ≤ i, j ≤ N − 1) in the network works for the shortest path from node i to node j. Its value u_ij is the shortest path length between i and j found so far. Site l at unit (i, j) records a path from node i to node l and then to node j. Its value s_ijl is the sum of the path lengths from i to l and from l to j. All links have weight 1. For an undirected shortest path problem with no negative-distance path, the initial value u_ij^(0) is the given distance d_ij. For a directed graph, the u_ij^(0) are obtained from the distances d_ij using Eq.(4.13). If a path from node i to i exists with positive length, the shortest path has, by common sense, zero length.
If the path from i to i has a negative length, then by repeating this loop the length of the shortest path will approach negative infinity.

    u_ij^(0) = { d_ij    if i ≠ j
               { 0       if i = j and d_ii ≥ 0
               { −∞      if i = j and d_ii < 0        (4.13)

Each site on a unit performs

    s_ijl := u_il + u_lj        (4.14)

and all the units in the network perform

    u_ij := min{ u_ij, min_l {s_ijl} }        (4.15)

Sites and units can be updated asynchronously, synchronously with a clock, or in continuous time. The algorithms for the synchronous discrete-time, asynchronous discrete-time, and continuous-time domains are given in Algorithm 1, Algorithm 2, and Algorithm 3, respectively.

Algorithm 1 : Shortest Path — Synchronous (to be referred to as INFERENCE)

    set initial unit values according to Eq.(4.13)
    k := 1
    while (1)
    BEGIN
        BEGIN PARALLEL for all i, j
            BEGIN PARALLEL for all l
                s_ijl^(k) := u_il^(k−1) + u_lj^(k−1)
            END PARALLEL
            u_ij^(k) := min{ u_ij^(k−1), min_{0 ≤ l ≤ N−1} {s_ijl^(k)} }
        END PARALLEL
        if u_ij^(k) = u_ij^(k−1) for all i, j then break
        k := k + 1
    END

Algorithm 2 : Shortest Path — Asynchronous

    set initial unit values according to Eq.(4.13)
    REPEAT
        randomly pick a unit (i, j)
        BEGIN PARALLEL for all l
            s_ijl := u_il + u_lj
        END PARALLEL
        u_ij := min{ u_ij, min_{0 ≤ l ≤ N−1} {s_ijl} }
    UNTIL no unit value changes when any of the units is picked.

Algorithm 3 : Shortest Path — Continuous Time

• Set initial unit values according to Eq.(4.13);

• All sites at each unit perform the function s_ijl := u_il + u_lj;

• All units perform u_ij := min{ u_ij, min_{0 ≤ l ≤ N−1} {s_ijl} };

• Sites and units will eventually stabilize, and each unit value represents the distance of a shortest path.

4.3 Algorithm Convergence

There are two important considerations in using the inference network for the shortest path problem. First, will the network always converge to the desired solution?
Second, what are the parameters or conditions that affect the convergence rate of the network? The answer to the first question is 'yes', because at the k-th iteration all paths containing at most 2^k arcs take part in the minimization, and the local minimization performed at each distinct unit drives the network to a global minimum state — which is the desired solution. The convergence rate of the network is O(log₂ N).

4.3.1 Synchronous Algorithm Converges in at most ⌈log₂(N − 1)⌉ Iterations

Since all the shortest paths containing at most 2^p arcs can be found in p iterations, and a shortest path in an N-node graph contains at most (N − 1) arcs, the inference network converges to the solution in at most ⌈log₂(N − 1)⌉ synchronous iterations.

In Algorithm 1, the inference network uses the distances, each of which is the length of one arc, in the first iteration to calculate the shortest paths containing at most 2 arcs. In the second iteration, it uses the shortest paths containing 1 or 2 arcs to calculate the shortest paths containing at most 4 arcs. Repeating this procedure p times, all the shortest paths containing at most 2^p arcs can be found. In other words, a shortest path containing q arcs can be found in ⌈log₂ q⌉ iterations. Since it is assumed that there is no negative-length loop in the graph and, by common sense, a shortest path does not contain a closed loop with non-negative length, shortest paths are all open paths. If an open path contains at least one infinite-distance arc, then the length of the path, which is infinity, can be determined right away. If there are a number of open paths between nodes i and j which are composed of finite-distance arcs only, and the k-th of them contains l_k(i, j) such arcs, then the number of iterations n_ij required to find the shortest path between node
Inference Network for the All-Pair Shortest Path Problem^47i and j falls into range:I log2( qn /k(i, j) } ) 1 < ni; 5_I log2( mr {^} ) 1The algorithm completes when the shortest paths between any two nodes are found,therefore, the number of iterations required, n, to find all shortest paths is given bylog2 ( max { min { /k(i,j) } } )1^n < I log2 ( max { max { lk(i,i) } } ) 1j^k j^kFor an N-node problem, an open path consists of at most N —1 arcs, that ismax { max { /k (i,j) } } < N — 1s,^khence all shortest paths can be found in at most ilog 2 (N — 1)1 iterations. Usually, thenetwork requires fewer iterations to converge than this upper bound. The number ofiterations required depends solely on the maximum number of arcs in an open path inthe graph; it is not directly related with graph density.4.3.2 Convergence of the Asynchronous AlgorithmAll the algorithms we found in literature for the shortest path problem are synchronousalgorithms. However, it can also be proved that using the same updating rules asyn-chronously, the inference network Algorithm 2 converges to the global minimum state aswell. For asynchronous updating, the unit which has the least chance to update is mostlikely the one which determines the completion time.In synchronous evolution, utk. ) is the shortest path length after k-th iteration, and itgives the shortest path among the paths from i to j which contain at most 2k arcs each.For asynchronous updating, we need a record nii to indicate that all the paths from i toj which contain no more than^arcs have been considered in minimizing uii. Some,but not all, paths contain more than^arcs may also contribute in the minimisation.Chapter 4. Inference Network for the All-Pair Shortest Path Problem ^48For any n4, u4 in the asynchronous case is less or equal to u; °g2 n'31) in the synchronouscase. Site and unit functions for the asynchronous inference network are the same as insynchronous evolution for each individual unit and site. 
The initial value of n q is always1, and it is updated according to= min{nit nu}In each iteration, a random unit (i, j) is picked to be minimize. In addition tocalculating and updating sip and u4, n12 can be recorded. Paths are all minimized ifVi, j nu > N — 1. If units are updated one by one systematically, and one round ofnetwork updating starts only when all the units are updated in the previous round, thenit takes N(N — 1) flog2 (N — 1)1 iterations to obtain the shortest paths for all node pairs.If updating is in a random order, minimizing a unit (i, j) with its n4 greater than thatof some other units just makes the updating process longer. Hence for random updating,the number of iterations required is at least N(N — 1) [log 2 (N — 1)1, and depends onthe unit (i, j) which increases its n4 value slowest. The asynchronous algorithm willeventually converge, that is, arrive at the state of n 11 > N —1 bpi, j, if each unit has achance to update sometime.It is worth point out that n4 is used here to prove the convergence of the algorithm.However, it is not necessary to calculate them if only the shortest paths are to be found.4.3.3 Convergence of the Analog AlgorithmIn the analog shortest path algorithm, each unit performstz4 := min{u4 , min {skit}} (4.16)o <I<N-1Chapter 4. Inference Network for the All-Pair Shortest Path Problem ^49Taking into account the transient period and assuming that each unit is a first-ordersystem, the behavior of a unit can be described bydd t3 =^ 0 <^——AiJui; + Ai; minfuii ,^min^{soil^(4.17)Nwhere constant^> 0 is the open-loop pole of unit (i,j) which partially determines thespeed at which ui; changes.The network energy for this problem can be defined as2E = E E (u, - min{uij,^min {si./}})0 <1<N-1The global minimum state of the network is signified by E = 0, that is V i, j,^=min^{so}. At this state, unit values will not change any more and the network0 < / < N —1is stabilized. 
The following is a proof that the network will arrive at a state sufficientlyclose to this global optimal state as the elapse of time.• From Eq.(4.16), ui; >^min^{skid;0 <I<N— 1min^{so}, ;Ld• If ui; =^ = 0, the unit keeps its value;0 <I<N-10<l<N-1^--1.1- = —Ai; (ui;^ ) < 0, whichd t^0 <1 < N-1—^min^Isiii}}• If ?Ai; >^min^{so}, d udrives uii towards^min^{sill};0 <I < N — 1• The farther ui; is from^min^{sip}, the faster ui; decreases. Therefore,0 < / < N — 1>0 ; d E < 0 when the behavior of a unit is governed by Eq.(4.17):d td Ed E d ui;dt = du dt,^j^s.,{sill}} d ui;= 9 N-- \"--‘.. ^Ls ttij — MillIttij,^milli j 0<1 <N-1^j d t‘..., 1 ( d ui;1 2 <0= —2 F.^‘--"^A • k d t^i ^4Chapter 4. Inference Network for the All-Pair Shortest Path Problem^50du- E^E. E where Ai; > 0. when ui; >^min^{so}, --!1<dt- JA; dt dt^ dt0 < 1 < N-1Ce ti*^ d2 E0, and 7-, > 0, hence^> 0; d E^ dui^ j,^i= 0 holds only when —at = 0 Vi, , that s, the network has settled down anddt dtui; =^min^{so} Vi, j holds (all the paths are optimized);0 <1<N-1dE• if^= 0, then ui • =^min^{so} Vi, j, which means E = 0;dt 0 < < N-1dE E > n^< 0, and d-:Tif- > 0 show that the network energy is a decreasing convexdt —function which has to approach its stable value zero with the elapse of time. Theenergy decreases quickly at the beginning, and the decreasing rate slows down whenthe value of E is close to zero;• when lirn •51- = 0, we have Jim uij = min{u4,^min^{soil^j, that is,t^t 00^0 < I < N-1the network will eventually settle down at the global optimal solution.4.4 Relationship to other AlgorithmsThis section addresses the relationship between the inference network algorithm and otherexisting algorithms, such as A*, MATRIX, FLOYD, and DANTZIG. The essential differencebetween the inference network algorithm and other algorithms is that the former is de-rived directly from the topology of the inference network. It is a parallel algorithm withthe information of how to be executed on a parallel network. 
Those existing algorithms are sequential algorithms which, by themselves, do not bear information about how they can be parallelized.

4.4.1 A* Algorithm

A* [64] is a heuristic search algorithm. It maintains two lists during the search; the open and closed lists contain the nodes which are to be expanded and which have been expanded, respectively. To determine the candidate to expand, it uses a heuristic guess of the length from the current node to the goal node. In searching for the shortest path from s to t, the algorithm chooses a node with the least estimated length f(s, t, n) to expand, where

    f(s, t, n) = g(s, n) + ĥ(n, t)        (4.18)

and g(s, n) is the length of the partial solution from s to n, and ĥ(n, t) is an estimate of h(n, t) — the shortest path length from n to t.

There is a remarkable degree of computational correspondence between INFERENCE and A*. If n is a node in the open list, the unit values u(s, n) and u(n, t) in the inference network correspond to g(s, n) and ĥ(n, t) in A*, respectively. Of the four values, g(s, n) is the shortest path length between s and n; u(s, n) and u(n, t) are path lengths which may be further minimized; ĥ(n, t) is heuristic information. One unit value in the inference network serves as the heuristic information of another unit. To ensure a correct solution, the heuristic ĥ(n, t) in A* must not overestimate h(n, t), while the heuristic information used in INFERENCE, u(s, n) and u(n, t), always remains an upper bound on the corresponding shortest path length. The difference is that the heuristic information ĥ in A* has to be provided prior to the search, whereas the inference network updates the shortest paths between two nodes in each iteration, and the results of one iteration are used in the next iteration as heuristic information until a global minimum is reached.

4.4.2 MATRIX Algorithm

MATRIX is an all-pair algorithm which can be expressed in matrix operations. It defines a matrix binary operation called minaddition:

    W = A ⊗ B = [ w_ij = min_k ( a_ik + b_kj ) ]   where A = [a_ij], B = [b_ij]        (4.19)

If D¹ = D is the distance matrix [d_ij], the elements of D^(N−1) give the shortest path
It definesa matrix binary operation called minaddition:W A 0 B ['Di; = qn(aik bki)] where A [aij], B = [kJ]^(4.19)If D1 = D is the distance matrix [4], the elements of DN-1 gives the shortest pathChapter 4. Inference Network for the All-Pair Shortest Path Problem ^52lengths, where DN-1 is the minaddition of N — 1 D's:DN-1 =D0D0D0.-•0D0D^(4.20)The divide and conquer version of MATRIX [8, 47] uses the fact that minaddition isassociative:D4 =DODODOD = 0 D2^(4.21)With this feature, only rlog2 (N — 1)1 0-operations are required.INFERENCE is also intimately related with MATRIX, although the former is deriveddirectly from the inference network platform. It is not difficult to observe that, if wedefine another matrix binary operation 1 asW = A_LB = [wi; = min{ a;2, bii}] where A = [ail], B = [kJ]^(4.22)then Eq.(4.14-4.15) can be effectively represented asDk = Dk-1 1(Dk-1 Dk-1 )^where k = 1, 2, ... , flog2(N —1)1^(4.23)Under the assumption of no negative-length loops, dii = 0, therefore Eq.(4.23) is equiva-lent to the divide and conquer version of MATRIX. However, the matrix algorithm itselfdoes not carry information of how to implement it on parallel processing arrays, whereasthe inference network provides a platform to solve the problem in parallel whose updatingequations are mathematically equivalent to the divide and conquer scheme.4.4.3 FLOYD AlgorithmFloyd's algorithm (FLOYD) constructs optimal paths by inserting nodes, when appro-priate, into more direct paths. After k iterations, function fk (i,j) is evaluated to theoptimal path length between i and j, where the intermediate nodes belong to the set ofgk(17 j)Chapter 4. Inference Network for the All-Pair Shortest Path Problem ^53nodes {0,1, ... , k}. 
Its recurrence relation is

    f_k(i,j) = min{ f_{k−1}(i,j), f_{k−1}(i,k) + f_{k−1}(k,j) }    (4.24)

with boundary condition f_0(i,j) = d_ij.

The difference between FLOYD and INFERENCE is that the former considers one distinct intermediate node at a time and iterates N times, while the latter considers all intermediate nodes in each iteration and iterates ⌈log₂(N−1)⌉ times.

4.4.4 DANTZIG and MDANTZIG Algorithms

Dantzig's algorithm (DANTZIG) constructs shortest paths using arcs whose nodes are numbered between 0 and k at the k-th iteration. g_k(i,j) (i,j ≤ k), the shortest path length from i to j in the k-th iteration, is obtained from g_{k−1}(i,j), d_ik and d_kj. Dreyfus [11] gives the algorithm in recurrent form:

    g_k(i,k) = min_{0≤l≤k−1} { g_{k−1}(i,l) + g_{−1}(l,k) }    (i = 0, 1, …, k−1)    (4.25)
    g_k(k,j) = min_{0≤l≤k−1} { g_{−1}(k,l) + g_{k−1}(l,j) }    (j = 0, 1, …, k−1)    (4.26)
    g_k(k,k) = min{ 0, min_{0≤l≤k−1} { g_k(k,l) + g_k(l,k) } }    (4.27)
    g_k(i,j) = min{ g_{k−1}(i,j), g_k(i,k) + g_k(k,j) }    (i, j = 0, 1, …, k−1)    (4.28)

where g_{−1}(i,j) = d_ij for all i and j. The total number of basic operations required is 2N²(N−1).

The modified Dantzig's algorithm (MDANTZIG) avoids unnecessary operations by further decomposing Eq.(4.25–4.26) into k recurrent steps. Define g⁰_k(i,k) = g⁰_k(k,j) = ∞; then g_k(i,k) = g^{k−1}_k(i,k) and g_k(k,j) = g^{k−1}_k(k,j) can be obtained from:

    g^l_k(i,k) = g^{l−1}_k(i,k)                                  ∀ 0 ≤ i ≤ k−1, if d_lk ≥ g^{l−1}_k(l,k)
    g^l_k(i,k) = min{ g^{l−1}_k(i,k), g_{k−1}(i,l) + d_lk }      ∀ 0 ≤ i ≤ k−1, if d_lk < g^{l−1}_k(l,k)    (4.29)

    g^l_k(k,j) = g^{l−1}_k(k,j)                                  ∀ 0 ≤ j ≤ k−1, if d_kl ≥ g^{l−1}_k(k,l)
    g^l_k(k,j) = min{ g^{l−1}_k(k,j), d_kl + g_{k−1}(l,j) }      ∀ 0 ≤ j ≤ k−1, if d_kl < g^{l−1}_k(k,l)    (4.30)

A significant number of operations can be saved when d_lk ≥ g^{l−1}_k(l,k) or d_kl ≥ g^{l−1}_k(k,l) is satisfied.

While INFERENCE considers all node pairs and all intermediate nodes in each iteration, DANTZIG only considers node pairs whose indices are less than k in the k-th iteration.
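The contrast in iteration counts can be made concrete with a small sketch (Python is used here purely for illustration; the function names are not from the thesis): a direct implementation of the FLOYD recurrence of Eq.(4.24) beside repeated min-add squaring in the style of MATRIX/INFERENCE, each reporting how many iterations it performed.

```python
INF = float('inf')

def floyd(D):
    """FLOYD, Eq.(4.24): one intermediate node per iteration, N iterations."""
    N = len(D)
    f = [row[:] for row in D]
    for k in range(N):                      # intermediate node k
        for i in range(N):
            for j in range(N):
                f[i][j] = min(f[i][j], f[i][k] + f[k][j])
    return f, N                             # result and iteration count

def minadd_square(D):
    """Repeated min-add squaring: all intermediate nodes in every iteration."""
    N = len(D)
    W = [row[:] for row in D]
    for i in range(N):
        W[i][i] = 0                         # no negative-length loops assumed
    iters = 0
    while True:
        # with d_ii = 0, squaring subsumes the elementwise-min of Eq.(4.22)
        W2 = [[min(W[i][l] + W[l][j] for l in range(N)) for j in range(N)]
              for i in range(N)]
        iters += 1
        if W2 == W:
            return W, iters
        W = W2
```

On the same distance matrix both return the same shortest-path matrix; the squaring loop stabilizes after about ⌈log₂(N−1)⌉ productive passes (plus one confirming pass), while FLOYD always makes exactly N.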
The former requires at most ⌈log₂(N−1)⌉ iterations, whereas the latter requires N iterations.

4.5 Unifying Some Established Algorithms Using an Inference Network

Many all-pair shortest path algorithms can be implemented in the inference network by defining corresponding unit and site functions. This section summarizes these functions. For more details, refer to Appendix D.2.

4.5.1 Floyd's Algorithm

The inference network implementation of FLOYD needs N² units. The unit and site functions are defined as:

    u_ij^(k) = min{ u_ij^(k−1), s_ijk^(k) }    (4.31)
    s_ijk^(k) = u_ik^(k−1) + u_kj^(k−1)    (4.32)

Initial unit values are u_ij^(−1) = d_ij. After N steps, u_ij^(N−1) gives the shortest path length from node i to node j. Compared with INFERENCE, FLOYD only updates one specific site (indexed k) at each unit in each iteration, but it takes N instead of ⌈log₂(N−1)⌉ iterations to find all the shortest paths.

4.5.2 Revised Matrix Algorithm

The revised matrix algorithm (RMATRIX) updates matrix elements asynchronously to avoid unnecessary operations. Two distinct processes are defined: (i) a forward process

    d_ij^(f) = min_{0≤l≤N−1} { d_il^(p) + d_lj^(q) }    (4.33)

and (ii) a backward process

    d_ij^(b) = min_{0≤l≤N−1} { d_il^(p) + d_lj^(q) }    (4.34)

The two processes differ only in the order in which the elements are visited; in each, the superscripts p and q select, for every l, whichever value (f or b) of the operand was computed most recently, and d_ij^(0) is the initially given distance. Yang [93] shows that the shortest paths can be obtained by one forward process followed by one backward process, one backward process followed by one forward process, three forward processes, or three backward processes.

The inference network for RMATRIX contains N² units. Initially, u_ij^(0) = d_ij. The site function for both forward and backward processes is the same:

    s_ijl^(k) = u_il^(k−1) + u_lj^(k−1)    (4.35)

The unit function for a forward process is:

    u_ij^(k) = min_{0≤l≤N−1} { s_ijl^(k) }   when k = f(i,j)
    u_ij^(k) = u_ij^(k−1)   otherwise    (4.36)

where

    f(i,j) = max{i,j}(max{i,j} − 1)/2 + min{i,j} + 2    (4.37)

The unit function for a backward process is:

    u_ij^(k) = min_{0≤l≤N−1} { s_ijl^(k) }   when k = b(i,j)
    u_ij^(k) = u_ij^(k−1)   otherwise    (4.38)

where

    b(i,j) = (max{N−i, N−j} − 1)(max{N−i, N−j} − 2)/2 + min{N−i, N−j}    (4.39)

The network updates two units at a time. To obtain all the shortest paths, the units of the network have to be updated in a specific order, for N(N−1) iterations (one forward and one backward process) or (3/2)N(N−1) iterations (three forward or three backward processes).

4.5.3 Dantzig's and Modified Dantzig's Algorithms

The size of the inference network for DANTZIG is N × N. The unit function for DANTZIG is

    u_ij^(2k) = min{ u_ij^(2k−1), min_{0≤l≤k−1} { s_ijl^(2k) } }   if j < i = k or i < j = k
    u_ij^(2k) = u_ij^(2k−1)   otherwise    (4.40)

    u_ij^(2k+1) = min{ u_ij^(2k), s_ijk^(2k+1) }   if i < k and j < k
    u_ij^(2k+1) = min{ 0, min_{0≤l≤k−1} { s_ijl^(2k+1) } }   if i = j = k
    u_ij^(2k+1) = u_ij^(2k)   otherwise    (4.41)

where 0 ≤ k ≤ N−1, u_ij^(−1) = d_ij, and s_ijl^(r) = u_il^(r−1) + u_lj^(r−1) for all 0 ≤ r < 2N. In the 2k-th iteration, 2k units are updated, with k sites at each unit involved in the minimization; in the (2k+1)-th iteration, a total of k²+1 units with a total of k(k+1) sites are minimized. The total number of synchronous iterations required is 2N.

Corresponding to the 2k-th iteration of DANTZIG, the inference network implementation of MDANTZIG requires k iterations. In each of these iterations, 2k units are updated, and at each of these units only one site is activated. So MDANTZIG requires a total of (1/2)N(N+1) iterations.

4.6 Comparison with Established Algorithms

4.6.1 Parallel Processing with Optimal Hardware

A variety of algorithms have been proposed for the shortest path problem, most of which were initially designed for sequential computation. It is interesting to observe that in many cases an efficient sequential algorithm does not naturally lead to an efficient one in parallel.
In this section, we compare the completion times of various algorithms under the assumption that each algorithm can be executed on the hardware that optimizes its performance. In the next subsection, we compare the efficiency of some algorithms on a specific piece of hardware — the Connection Machine. It is assumed that our goal is to find all the shortest paths between any two nodes.

For single-pair algorithms, such as DIJKSTRA and A*, N(N−1) processors can be used. Each processor works on one shortest path between two distinct nodes. Since these algorithms are initially designed for finding the shortest path between a single pair of nodes, no inter-processor communication is needed. The completion time for the all-pair algorithm is determined by the processor which takes the longest time to find the one-pair solution.

Single-source algorithms, for example YEN, SPIRA, and MOFFAT, are designed to find the shortest paths from one specific node to all other nodes. N processors can be used to find all the shortest paths. There is no inter-processor communication among the N processors. The completion time is determined by the processor which has the heaviest work load.

All-pair algorithms are designed to find all shortest paths simultaneously. N² processors can be used. Rich communication between the processors is necessary. For FLOYD, MATRIX, RMATRIX, DANTZIG, and MDANTZIG, in each iteration each processor either performs an addition and a comparison or does nothing. The completion time is determined by the number of synchronized iterations they require.

Systolic algorithms, for example ROTE and ROBERT, are to be executed on their own systolic structures. Both the number of processors required and the number of time units required are determined by the systolic array.

The synchronous inference network algorithm requires N(N−1) processors.
Each processor has 2(N−2) outgoing and 2(N−2) incoming links, and performs the unit function and the simultaneous site functions. At most ⌈log₂(N−1)⌉ iterations are required.

Table 4.3 shows the number of processors required, the number of links per processor, and the theoretical completion times for various algorithms. t_o stands for the time to perform a basic operation (addition or comparison), and t_c is the time for a processor to send or receive a basic datum from another processor. Under the assumption that each algorithm can have its optimal hardware, t_o and t_c are the same for all the algorithms and are invariant with problem size N. The inference network algorithm has the shortest completion time since it requires the least number of synchronous iterations.

Algorithm               # of Proc.   Link/Proc.   Execution Time
DIJKSTRA                N(N−1)       0            O((1/2)N²t_o)
A*                      N(N−1)       0            ≤ O((1/2)N²t_o)
YEN                     N            0            (1/2)N³t_o
SPIRA                   N            0            N²log₂N·t_o
MOFFAT                  N            0            N²t_o
MATRIX                  N²           4N           ⌈log₂N⌉(2Nt_o + t_c)
RMATRIX                 N²           4N           (a/2)N²(2Nt_o + t_c)
FLOYD                   N²           4N           N(2t_o + t_c)
DANTZIG                 N²           4N           2N(2t_o + t_c)
MDANTZIG                N²           4N           ≤ 2N(2t_o + t_c)
ROTE                    (N+1)²       6            (7N−2)(2t_o + t_c)
ROBERT                  N(N+1)       4            (5N−2)(2t_o + t_c)
Algorithm 1 (page 44)   N(N−1)       4(N−2)       ≤ ⌈log₂N⌉(2t_o + t_c)

Table 4.3: Execution times for shortest path algorithms executed on their optimal hardware configurations. a in RMATRIX is 2 or 3.

The inference network can also solve the shortest path problem in continuous time. The neural network approach [19], the Hopfield network approach [83], and the connectionist approach [25] all use gradient descent to decrease their network energy, and therefore do not guarantee that the solutions are optimal. On the contrary, the inference network algorithm (Algorithm 3 on page 45) is derived from the simple intuitive procedure of finding the shortest paths, and it is an analog algorithm that guarantees an optimal solution.

4.6.2 Performance on the Connection Machine

Now let us compare efficiencies on a specific kind of computer — the Connection Machine. Table 4.4 shows the number of time units required by some all-pair shortest path algorithms when using O(N²) processors, assuming there are only four links on each processor. Each time unit is approximately the time to perform one addition and one minimization and to propagate two data.

Algorithm    ROBERT   INFERENCE-CM       MATRIX   FLOYD   DANTZIG
time units   5N−2     ≤ N⌈log₂(N−1)⌉     N²       N²      N²

Table 4.4: The number of time units required to obtain the shortest paths when implemented on the Connection Machine.

The ROBERT, FLOYD and DANTZIG algorithms consider one specific node at each iteration, hence their processor operations are time-dependent. INFERENCE-CM (the mapping of INFERENCE onto the Connection Machine; refer to Appendix D.1) and MATRIX share the same kind of time-independent PE operations. All but ROBERT occupy a square processor array on the Connection Machine; ROBERT occupies a rectangular geometry which is twice as large as the others' square array. Considering all these factors and the number of time units required, the most efficient algorithms should be INFERENCE-CM or ROBERT. The performance of INFERENCE-CM, in comparison with ROBERT, is given in Table 4.5. For a randomly generated distance matrix and N up to 512, INFERENCE-CM gives shorter CM busy times except for N = 4. The worst-case CM busy times for the inference network (iterating exactly ⌈log₂(N−1)⌉ times) are close to, but still shorter than, those of ROBERT.

Within the problem sizes we tested, INFERENCE-CM outperforms ROBERT. There are three reasons for this: processor operation for INFERENCE-CM is time-independent; INFERENCE-CM has a smaller virtual processor ratio; and INFERENCE-CM usually takes fewer than N⌈log₂(N−1)⌉ time units.
When N increases further, however, according to the theoretical prediction, ROBERT will eventually be faster than INFERENCE-CM.

N        4        8        16       32       64       128      256      512
random   0.0874   0.1131   0.1759   0.2958   0.7012   1.8825   8.2125   37.1443
worst    0.0874   0.1181   0.1759   0.3248   0.7971   2.1143   8.8626   46.9628
Robert   0.0828   0.1688   0.3753   0.8965   2.3678   8.0302   36.9367  209.620

Table 4.5: CM busy time for INFERENCE-CM (random), the worst case of INFERENCE-CM (worst), and Robert and Trystram's algorithm (Robert) to solve the shortest path problem given random distance graphs.

4.7 Discussion

Although many algorithms have been proposed for the shortest path problem, the inference network algorithm has shown a special feature — it works in the synchronous discrete-time, asynchronous discrete-time, and continuous-time domains.

In synchronous discrete time, the algorithm was shown to take the least number of iterations and was mapped to the Connection Machine efficiently. For a large systolic array, it is a technical headache to synchronize all the processing elements. The inference network algorithm was proved to converge under asynchronous updating as well as in synchronous mode; therefore, global synchronization is not necessary. Simulation of the asynchronous algorithm was conducted, and convergence was guaranteed as long as each unit had a chance to update.

The inference network was also proved to work in continuous time using an analog circuit. The analog parallel implementation was shown to solve the N-node problem at a speed determined by the time constants of the first-order system. A physical continuous-time inference network for a 10-city shortest path problem is being built.

Although neural network algorithms can also solve the problem using analog techniques [83, 19], they do not guarantee an optimal solution, due to the use of gradient descent.
The inference network algorithm was proved to produce the optimal solution in both the discrete and continuous time domains for all problem sizes.

Chapter 5

Inference Network for the Transitive Closure Problem

This chapter outlines the application of the discrete and continuous-time inference network to the transitive closure problem. A simple binary AND operation is required at each site, and each unit performs a multi-input OR operation. The time complexity of the algorithm was shown to be bounded by log₂N. The algorithm was proved to converge in the continuous-time domain as well as in discrete time. It is demonstrated that the convergence is independent of the problem size, and the optimal solution is guaranteed. Theoretical analysis and numerical simulation of the circuit implementation were conducted. The algorithm was shown to be efficient and easy to implement. Using basic logic gates, the solution was obtained in the nanosecond range.

5.1 The Problem

The transitive closure of a graph indicates whether two nodes in the graph are connected by a path. A graph G has a set of nodes V and a set of arcs E connecting some of these nodes. A directed graph G can be represented by its connection matrix A = [a_ij], in which a_ij = 1 or 0 means there is or is not an arc from node i to node j. The graph G*, which has the same node set as G but has an arc from v to w if and only if there is a directed path from v to w in G, is called the transitive closure of G. G* can be similarly represented by its connection matrix A*.

Many systolic algorithms have been proposed for the problem [71, 70, 52, 40, 39, 46, 65]; they require about N² processors synchronized with a clock to perform operations in predefined orders, and their time complexities are usually around 5N to 7N time units. Tamassia [82] proposed an algorithm to solve the problem in O(log N) time using N log N processors in the exclusive-read exclusive-write PRAM model.
Toptsis [84] introduced a parallel algorithm for highly scalable multiprocessors; if a perfect hashing function is available, its speedup over the sequential algorithm is proportional to the number of processors. Wang [88] proposed a constant-time algorithm which solves the problem on N⁶ processors.

Transitive closure is a basic problem in artificial intelligence, and it has many applications in database management systems, redundancy elimination, image labeling, graph theory, etc.

The inference network algorithm for the problem can be executed in both the discrete and continuous time domains. In its discrete-time version, the algorithm is straightforward and its time complexity is bounded by log₂N. Compared with other discrete-time transitive closure algorithms, the inference network one is not superior. However, the power of the algorithm lies in its continuous-time implementation. For the continuous-time version, the network is composed of simple logic gates, and its completion time is in the nanosecond range. The network can be implemented using NAND gates; a simulation of this NAND-gate circuit is given. The network is easy to assemble; a clock and global synchronization are not needed.

5.2 Unit and Site Functions

The inference network for the transitive closure of a directed graph has N(N−1) units, and each unit has N−2 sites. All unit and site values fall into the range [0,1]. u(i,j) = 1 indicates that a path from node i to node j has been found; u(i,j) = 0 otherwise. Similarly, s(i,j,l) = 1 (or s(i,j,l) = 0) shows that a path from node i to node l and then to node j has (or has not) been found. An initial unit value is 1 if the two corresponding nodes are connected by a direct arc in the graph. All link weights are 1. The sites and units can be updated asynchronously, synchronously with a clock, or in continuous time.
In the discrete version, the site and unit functions can be as simple as the following: each site performs a binary AND,

    s^(k)(i,j,l) = u^(k−1)(i,l) ∧ u^(k−1)(l,j)    ∀ 0 ≤ i, j, l < N, i ≠ j, l ≠ i, l ≠ j    (5.42)

and each unit performs an (N−1)-input OR,

    u^(k)(i,j) = u^(k−1)(i,j) ∨ ( ∨_l s^(k)(i,j,l) )    (5.43)

In the continuous-time version, the following site and unit functions may be used (refer to Figure 5.12):

    s_t(i,j,l) = 1                  if u_t(i,l) > v_h, u_t(l,j) > v_h, and t > t_0 + τ_s
    s_t(i,j,l) = (t − t_0)/τ_s      if u_t(i,l) > v_h, u_t(l,j) > v_h, and t ≤ t_0 + τ_s    (5.44)
    s_t(i,j,l) = 0                  if u_t(i,l) ≤ v_h or u_t(l,j) ≤ v_h

where 0 < v_h < 1 and t_0 is the time when both u_t(i,l) and u_t(l,j) just exceed v_h, and

    u_t(i,j) = 1                  if u_{t−δt}(i,j) = 1 or t > t_0 + τ_u
    u_t(i,j) = (t − t_0)/τ_u      if ∃l s_t(i,j,l) > v_h and t ≤ t_0 + τ_u    (5.45)
    u_t(i,j) = 0                  if ∀l s_t(i,j,l) ≤ v_h

where t_0 is the time when the first site value exceeds v_h.

5.3 Convergence Consideration

Since both the unit and site functions are non-decreasing, and their values are upper-bounded, the network must converge eventually. For the discrete version, if we define an N × N matrix U^(k) = [u^(k)(i,j)] with u^(k)(i,i) ≡ 1 for all i and k, Eq.(5.42–5.43) can be effectively expressed by

    U^(k) = U^(k−1) ⊙ U^(k−1) = [ ∨_l ( u^(k−1)(i,l) ∧ u^(k−1)(l,j) ) ]    (5.46)

Figure 5.12: (a) Site and (b) unit functions for the transitive closure problem.

which is equivalent to

    u^(k)(i,j) = u^(k−1)(i,j) ∨ ( ∨_{l≠i, l≠j} ( u^(k−1)(i,l) ∧ u^(k−1)(l,j) ) )

If i → j is an arc in the original graph G, define u^(0)(i,j) = 1. After the first iteration, if u^(1)(i,j) = 1, there is a path from i to j composed of one or two arcs in G. In the second iteration, all directed paths containing at most 4 arcs are found and the corresponding unit values are updated to 1. Repeating this procedure p times, all the paths containing at most 2^p arcs can be found. In other words, a path containing q arcs can be found in at most ⌈log₂ q⌉ iterations.
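The bound just stated — a q-arc path is found within ⌈log₂ q⌉ synchronous iterations — can be checked with a short sketch (Python for illustration only; the helper name is not from the thesis):

```python
def closure_step(u):
    """One synchronous inference-network iteration (Eq. 5.42-5.43):
    each site computes s(i,j,l) = u(i,l) AND u(l,j), and each unit
    ORs its previous value with all of its site values."""
    N = len(u)
    return [[u[i][j] | max(u[i][l] & u[l][j] for l in range(N))
             for j in range(N)] for i in range(N)]

# An 8-node chain 0 -> 1 -> ... -> 7: the 7-arc path from node 0 to node 7
# appears after ceil(log2(7)) = 3 iterations, since the length of the
# discoverable paths doubles with every synchronous step
N = 8
u = [[1 if j == i + 1 else 0 for j in range(N)] for i in range(N)]
steps = 0
while u[0][N - 1] == 0:
    u = closure_step(u)
    steps += 1
# steps == 3
```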
Since the longest open path in an N-node graph contains at most N−1 arcs, all the paths between any two nodes can be found in at most n iterations, where

    n = ⌈log₂(N−1)⌉    (5.47)

and the corresponding unit values in U^(n) contain the solution — U^(n) equals the connection matrix of the transitive closure G*. Therefore, the number of iterations required for a synchronous discrete network to arrive at the solution is upper-bounded by Eq.(5.47).

Eq.(5.44–5.45) for the continuous-time version are similar to Eq.(5.42–5.43), except that the transition periods from value 0 to 1 are taken into account. The convergence rate is now determined by the transition times τ_s and τ_u as well. Since τ_s and τ_u for different sites and units can be different, unit values do not change simultaneously as they do in the discrete version. To measure the 'distance' between a network state and the final stable state, we can define the following network energy E(t):

    E(t) = Σ_{i≠j} ( 1 − max{ u_t(i,j), max_{l≠i,j} u_t(i,l)·u_t(l,j) } )²    (5.48)

The solution corresponds to the global minimum state, where E(t) equals the number of 0's in the connection matrix of the transitive closure G*. Since 0 ≤ u_t(i,j), u_t(i,l)·u_t(l,j) ≤ 1,

    E(t) ≤ Σ_{i≠j} ( 1 − u_t(i,j) )²    (5.49)

Since

    dE(t)/dt = Σ_{i≠j} ( ∂E(t)/∂u_t(i,j) ) ( du_t(i,j)/dt )    (5.50)

and

    ∂E(t)/∂u_t(i,j) ≤ ∂[ Σ_{i≠j} (1 − u_t(i,j))² ]/∂u_t(i,j) = −2(1 − u_t(i,j)) ≤ 0    (5.51)

hence

    dE(t)/dt ≤ Σ_{i≠j} −2(1 − u_t(i,j)) du_t(i,j)/dt    (5.52)

Eq.(5.52) shows that the network energy does not increase as long as du_t(i,j)/dt ≥ 0 — unit values do not decrease — which is guaranteed by Eq.(5.44–5.45).

Now we will see that the network has to arrive at the correct solution after Eq.(5.44–5.45) have been applied. At the minimum state of the network,

    dE(t)/dt = 0    (5.53)

holds.
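The energy of Eq.(5.48) is also easy to evaluate numerically, which makes the two boundary cases concrete (for the all-zero state every off-diagonal pair contributes 1; at the solution E equals the number of 0's in A*). A sketch, Python for illustration only, with the inner maximum taken over l ≠ i, j:

```python
def network_energy(u):
    """E(t) of Eq.(5.48): for every ordered pair i != j, the squared
    shortfall from 1 of the best evidence for a path i -> j (the unit
    value itself, or the product u(i,l)*u(l,j) through any l)."""
    N = len(u)
    E = 0.0
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            best = max([u[i][j]] + [u[i][l] * u[l][j]
                                    for l in range(N) if l != i and l != j])
            E += (1.0 - best) ** 2
    return E
```

For a 4-node network, the all-zero state gives E = 12 (one per off-diagonal pair) and the all-one state gives E = 0, the global minimum when the closure is complete.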
In Eq.(5.50), since ∂E(t)/∂u_t(i,j) ≤ 0 and du_t(i,j)/dt ≥ 0 for all i and j, Eq.(5.53) is equivalent to

    ∀ i,j:  ∂E(t)/∂u_t(i,j) = 0  or  du_t(i,j)/dt = 0

that is, from Eq.(5.51),

    ∀ i,j:  u_t(i,j) = 1  or  du_t(i,j)/dt = 0

At this stable state, a unit either keeps its original 0 value or has become 1; all the site values are also either 0 or 1. Since Eq.(5.44–5.45) were applied before the stable state was achieved, the following statements are true for the stable state (t = ∞):

1. u_0(i,j) = 1 ⟹ u_∞(i,j) = 1;
2. u_∞(i,l) = 1 ∧ u_∞(l,j) = 1 ⟹ u_∞(i,j) = 1;
3. u_0(i,j) = 0 ∧ ∀ l,t ( u_t(i,l) ≤ v_h or u_t(l,j) ≤ v_h ) ⟹ u_∞(i,j) = 0.

From the first two statements, by mathematical induction, we know that u_∞(i,j) = 1 for all i and j with a directed path in between; from the last statement, we can conclude that if there is no directed path from i to j, then u_∞(i,j) = 0 — in other words, for all i and j with u_∞(i,j) = 1, there is a path from i to j. According to the definition, u_∞(i,j) gives the solution to the transitive closure problem. The above analysis also shows that the unit function can be any non-decreasing monotonic function.

A computational framework for solving the transitive closure problem can be formulated as follows:

1. Construct the N(N−1)-unit continuous-time inference network, where the dynamic behavior of each individual unit and site is governed by Eq.(5.44–5.45).

2. The network has a complex but regular interconnection structure, where each unit sends its output to a set of 2(N−2) units, and receives at the same time the outputs from this same set of units at its N−2 sites.

3. Initialize the network units with u_0(i,j) = 1 if there is a direct arc from i to j in graph G; otherwise, set u_0(i,j) = 0.

4. The network will converge to a global minimum at a rate depending on the transition time of each unit.
The converged output u_∞(i,j) of unit (i,j) corresponds to the element a*_ij in the connection matrix A*.

5.4 Implementation Using Logical Circuits

Although the Connection Machine can be used to implement discrete-time inference network algorithms, a simple special-purpose inference network can be built for the transitive closure problem, with NAND gates as its major components. From Eq.(5.42–5.43), we can see that a unit performs

    u(i,j) = u(i,j) ∨ ( ∨_l s(i,j,l) ) = ¬( ū(i,j) ∧ ( ∧_l s̄(i,j,l) ) )

where s(i,j,l) = u^(k−1)(i,l) ∧ u^(k−1)(l,j).

A 2-input NAND gate, for example the kind in a 74LS00, can serve as a site; refer to Figure 5.13(a). It has the simplified mathematical model of Eq.(5.54), with turn-off time τ_s; see Figure 5.14(a). An (N−1)-input NAND gate with the mathematical model of Eq.(5.55) can be used for each unit; see Figures 5.13(b) and 5.14(b). For a small problem with N ≤ 15, a 13-input NAND gate (74S133) can be used as a unit. If N is large, a unit may consist of several m-input NAND gates, based on

    u = s₁ ∨ s₂ ∨ ⋯ ∨ s_m ∨ s_{m+1} ∨ ⋯ ∨ s_{2m} ∨ ⋯
      = ¬( (s̄₁ ∧ ⋯ ∧ s̄_m) ∧ (s̄_{m+1} ∧ ⋯ ∧ s̄_{2m}) ∧ ⋯ )

Since each site has two inputs coming from two distinct units, the fan-in capability of the gates is sufficient. However, the output of a unit has to go to 2(N−2) sites, and a regular NAND gate may not have a large enough fan-out for a unit. If this is the case, additional driving gates may be used at the output end of units.

Figure 5.13: Essential components for a unit.

The following equations describe the state of the inference network. Site gates perform

    s̄_t(i,j,l) = 0                      if u_t(i,l) > v_h, u_t(l,j) > v_h, and t > t_0 + τ_s
    s̄_t(i,j,l) = 1 − (t − t_0)/τ_s      if u_t(i,l) > v_h, u_t(l,j) > v_h, and t ≤ t_0 + τ_s    (5.54)
    s̄_t(i,j,l) = 1                      if u_t(i,l) ≤ v_h or u_t(l,j) ≤ v_h

where t_0 is the time when both u_t(i,l) and u_t(l,j) just exceed v_h — the minimum value to be considered 'high' — and τ_s is the turn-off time of the NAND gate.
Each unit is governed by

    u_t(i,j) = 1                  if u_{t−δt}(i,j) = 1 or t > t_0 + τ_u
    u_t(i,j) = (t − t_0)/τ_u      if ∃l s̄_t(i,j,l) < v_l and t ≤ t_0 + τ_u    (5.55)
    u_t(i,j) = 0                  if ∀l s̄_t(i,j,l) > v_l

where t_0 is the time when the first input value goes below v_l — the maximum value to be considered 'low' — and τ_u is the total effective turn-on time of the entire unit. Eq.(5.54–5.55) are very similar to Eq.(5.44–5.45), except that the sites produce s̄_t(i,j,l) instead of s_t(i,j,l), which enables the use of popular NAND gates. With a procedure similar to that in the last section, it can be proved that the solution given by this implementation is also correct. The transition time for each unit and site may be different; the only requirement to guarantee a valid solution is that during the transition period the output of a gate changes monotonically.

In order to give an idea of the completion time, let us assume the transition times for each site and unit are τ_s and τ_u respectively. A site gate takes (1 − v_l)τ_s time to turn off, and a unit gate takes v_h·τ_u time to turn on; refer to Figure 5.14.

Figure 5.14: Site and unit time delay.

If τ_s and τ_u are identical for all sites and units, the total completion time for the network to reach the solution is about ⌈log₂(N−1)⌉((1 − v_l)τ_s + v_h·τ_u). Using the inference network described above for an N = 10 problem, with 180 74LS00 packages (a total of 720 2-input NAND gates), 90 inverters in 74LS02 packages, and 90 74S133 packages (13-input NAND gates, with only 9 inputs of each gate used), the network is expected to arrive at the solution within 50 ns. Here we use the typical transition times of the 74LS00 and 74S133, 9.5 ns and 3 ns respectively, and v_h = 0.6 and v_l = 0.2.

This implementation overcomes a major limitation of systolic approaches — global synchronization — by updating the units and sites in the continuous-time domain. Compared with the processing elements in a systolic array, each unit in the inference network only consists of a number of NAND gates.
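The grouped-NAND decomposition used above for wide units can be checked exhaustively in software; a sketch, Python for illustration only, with hypothetical helper names (`nand`, `unit_or_via_nands`), where the complemented inputs correspond to the inverters of the hardware design:

```python
from itertools import product

def nand(*xs):
    """An m-input NAND gate: output is 0 only when every input is 1."""
    return 0 if all(xs) else 1

def unit_or_via_nands(sites, m=3):
    """OR of many site values realized with m-input NAND gates only:
    each group NAND takes the complemented sites (yielding that group's
    OR), and a final NAND of the complemented group outputs yields the
    full OR, as in the unit decomposition above."""
    groups = [sites[k:k + m] for k in range(0, len(sites), m)]
    group_or = [nand(*(1 - s for s in g)) for g in groups]
    return nand(*(1 - o for o in group_or))
```

Checking all 2⁶ input patterns for six sites confirms the output equals the plain OR of the site values, so the decomposition preserves the unit function.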
Physical implementation of the continuous-time inference network is much simpler than a systolic array. For comparison, a systolic algorithm running on the Connection Machine takes about 10 seconds of CM elapsed time, including 0.2 seconds of CM busy time, for an N = 10 problem. The potential problem with this implementation — the limited fan-out of a gate — may be overcome by using additional driving gates.

Besides the existence of a path between two nodes, if we are also interested in the path itself, a path indicator can be attached to each site. Figure 5.15 shows the circuit for such a path indicator.

Figure 5.15: A path indicator for the transitive closure problem.

Under the condition that none of the site or unit values decreases (the sufficient condition for convergence), an indicator at a site will turn on only when the site value becomes 'high' while the unit value is still 'low'. Therefore, when the network is stabilized, if the value of unit (i,j) is 'low', there is no path between the two nodes. If unit (i,j) has a 'high' value and none of the path indicators is on, there is a direct path from i to j. If the indicator of site l at unit (i,j) is on and unit (i,j) has a 'high' value, there is a path from node i to node j through node l. The paths from i to l and from l to j can be derived recursively from the path indicators at the corresponding units. Table 5.6 shows how the path indicator works: depending on the site and unit values just before and just after a transition, the indicator turns on, turns off, keeps off, or keeps on.

Table 5.6: States of unit, site and path indicator.
5.5 An Example

Figure 5.16: An example of a 4-node graph (a), the intermediate result (b) and the transitive closure (c).

To illustrate the algorithm and the implementation of the network, let us go through the following N = 4 example. The connection matrix of the 4-node graph, Figure 5.16(a), is

    A = ( 1 1 0 0
          0 1 1 0
          1 0 1 1
          0 1 0 1 )    (5.56)

An N = 4 problem requires 12 units, each with 2 sites. A unit is a 3-input NAND gate and a site is a 2-input NAND gate. Figure 5.17 shows the complete circuit. Initial values are applied to the inputs of the inverters at t = 0. It takes the transition time of an inverter and a 3-input NAND gate for 5 unit outputs to become 'high' — corresponding to the 5 arcs in Figure 5.16(a). These high unit values are then sent to other units. Five other unit outputs become high after approximately the transition time of two NAND gates. The paths found so far are shown in Figure 5.16(b). After another transition period of two NAND gates, the last 2 unit outputs become high, and the transitive closure, Figure 5.16(c), is obtained. Figure 5.18 shows the unit outputs as functions of time. It takes about 10 nanoseconds to obtain the final result.

In Fig. 5.18, the unit outputs go up to 'high' in the sequence predicted by Fig. 5.16. However, due to the capacitance and inductance distributed among circuit elements, the waveforms are not as smooth as ideal outputs from a digital system. No overshooting of unit outputs is observed. Undershooting of less than 0.25 V occurs at some units, which does not affect the overall updating of the network. Smoother waveforms can be achieved by optimizing the layout of the units within the chip and the wiring of elements in each unit.
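The staged appearance of the unit outputs — 5 arcs, then 5 two-arc paths, then the last 2 — can be reproduced from Eq.(5.46) by boolean squaring of the connection matrix. A sketch (Python for illustration only; the matrix is Eq.(5.56) as read here, diagonal included):

```python
def bool_square(U):
    """One synchronous squaring step of Eq.(5.46); with u(i,i) = 1 the
    OR with the previous unit value is subsumed by the l = i term."""
    N = len(U)
    return [[max(U[i][l] & U[l][j] for l in range(N)) for j in range(N)]
            for i in range(N)]

# Connection matrix of the 4-node example, Eq.(5.56)
A = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 1]]

U = A
counts = [sum(U[i][j] for i in range(4) for j in range(4) if i != j)]
for _ in range(2):                  # ceil(log2(4 - 1)) = 2 iterations suffice
    U = bool_square(U)
    counts.append(sum(U[i][j] for i in range(4) for j in range(4) if i != j))
# counts == [5, 10, 12]: the three waves of unit outputs going high
```

The final U is the all-ones matrix, i.e. the connection matrix of the transitive closure in Figure 5.16(c).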
Additional built-in diodes at the output of units can regulate the undershooting of the waveforms to within an acceptable range.

5.6 Discussion

The inference network was shown to provide a platform to solve the transitive closure problem. The discrete version of the algorithm was shown to produce a solution within at most ⌈log₂(N−1)⌉ synchronous iterations. The analog network is composed of logic gates which operate in a non-synchronized fashion; the completion time of the analog implementation was shown to be in the nanosecond range. A potential problem for a large network is the fan-out limitation of the gates; additional driving gates may be used to increase the fan-out ability.

Since a logic gate is formed from an analog circuit, and the logic-circuit solution to the transitive closure problem does not require a clock, analog circuits (or simple analog neurons) can be substituted for the logic gates at the sites and units. The circuit for the transitive closure problem then becomes, indeed, an analog circuit (or analog neural network).

The transitive closure problem can also be solved using systolic arrays or constant-time algorithms. These are synchronous approaches: the calculation at each processing element has to follow predesigned orders, which are derived through tedious analysis of the data dependencies among the processors. The inference network structure is obtained from simple optimization procedures; therefore, the network algorithm for transitive closure can be derived directly from the problem formulation.

Figure 5.17: Circuit for an N = 4 transitive closure problem.
Figure 5.18: Hspice simulation of the inference network for a 4-node transitive closure problem.

Chapter 6

Inference Network for the Assignment Problems

This chapter discusses the behavior of the inference network when it is used in optimization through gradient descent of an energy function — an approach used by many neural network algorithms. The application of the inference network to the assignment problem is discussed. The dynamics of the network is shown to be similar to that of the Hopfield net for the traveling salesman problem (TSP). The inference network has fewer links than the Hopfield net; nevertheless, the eigenvectors of the two networks are proved similar. Due to the nature of the assignment problem and the use of a decreasing weight for the objective function, the network is shown to converge for a much larger range of problem sizes than the Hopfield net does for the TSP. For all the test cases used, with sizes up to 128, the network converged to a near-optimal valid solution.

6.1 The Assignment Problem

6.1.1 The Problem

The assignment problem is a classical combinatorial optimization problem. Consider N objects to be assigned to N persons. Both persons and objects are denoted by the integers 0, ..., N − 1. Given r_ij ≥ 0 (0 ≤ i, j < N) as the benefit of assigning object j to person i, the objective is to assign each person a distinct object so that the resulting one-to-one correspondence maximizes the total benefit.

The Hungarian method, proposed by Kuhn in 1955 [38], is a traditional solution to the problem and a special case of linear programming [66].
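The problem statement can be made concrete with a tiny brute-force reference implementation, feasible only for small N since it enumerates all N! permutations. A Python sketch, using a 3 × 3 benefit matrix that reappears in the small-size examples of Section 6.7.1:

```python
# Exhaustive search over all one-to-one assignments: maximize the
# total benefit sum of r[i][perm[i]] over all permutations perm.
from itertools import permutations

def best_assignment(r):
    """Return the permutation (object for each person) of maximum benefit."""
    n = len(r)
    return max(permutations(range(n)),
               key=lambda p: sum(r[i][p[i]] for i in range(n)))

r = [[0.053, 0.914, 0.827],
     [0.302, 0.631, 0.785],
     [0.230, 0.011, 0.567]]

p = best_assignment(r)
print(p, sum(r[i][p[i]] for i in range(3)))  # -> (1, 2, 0) with benefit 1.929
```

The algorithms discussed in this chapter aim at the same optimum without the factorial cost of enumeration.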
It requires O(N^4) operations. In 1979, Bertsekas [5] proposed the auction algorithm, which guarantees a globally optimal solution with a worst-case sequential complexity of O(NM log(NM)), where M is the largest integer benefit. Wein and Zenios [89] developed a hybrid implementation of the auction algorithm on the Connection Machine to solve large assignment problems. Kosowsky and Yuille [37], motivated by statistical physics, solve the problem using a continuous dynamical system. Kim [34] uses a modified Hopfield model to solve the assignment problem, and claims better stability and accuracy. (The paper is in Korean.)

The assignment problem has many applications. In VLSI layout, it occurs mostly in semi-custom design styles such as gate-array or standard-cell design. In gate assignment, the slots are library cells and the circuit elements are gates. Supposing each cell can accommodate exactly one gate and each gate has a set of legal locations that preserve the function of the circuit, the assignment is to minimize cost. Similarly, optimal assignment is also used for determining pin locations in global and area routing. Refer to [51] for details.

6.1.2 Optimization Formulation

The assignment problem can be formulated as a constrained optimization problem. A valid solution satisfies the constraint of one object for each person. The valid solution which maximizes the total benefit is defined as an optimal solution. Let us define 0 ≤ u_ij ≤ 1 for all 0 ≤ i, j < N. u_ij = 1 means person i is assigned object j; u_ij = 0 means person i does not get object j. Based on these definitions, Σ_j Σ_{l≠j} u_ij u_il = 0 guarantees that person i does not get two distinct objects. Similarly, Σ_j Σ_{l≠j} u_ji u_li = 0 guarantees that object i is not assigned to two distinct persons. Σ_i u_ij = 1 and Σ_j u_ij = 1 ensure that only one person gets object j, and that person i gets only one object. If the u_ij's are put
into an N × N array, then a valid solution is an array with exactly one 1 in each row and column; all the rest of the elements are 0. The objective function to be minimized is:

E = A Σ_i Σ_j Σ_{l≠j} u_ij u_il + B Σ_j Σ_i Σ_{l≠i} u_ij u_lj + C [ Σ_i (Σ_j u_ij − 1)^2 + Σ_j (Σ_i u_ij − 1)^2 ] − D Σ_i Σ_j u_ij r_ij                  (6.57)

where 0 ≤ u_ij ≤ 1, 0 ≤ i, j, l < N, A, B, C, D > 0 and r_ij ≥ 0 for all i, j. In Eq.(6.57), the sum of the A, B and C terms reaches a minimum of 0 when the constraint is satisfied. The D term is the total benefit. Due to the symmetry of the constraint, the first two terms should be equally weighted; therefore A = B. A, B and C must not be zero. If A and B are zero, Eq.(6.57) may reach a minimum state where all unit values are close to 1/N. If C is zero, the minimum state of Eq.(6.57) is u_ij = 0 for all i, j, which is not a valid solution either.

The assignment problem can have further constraints. For example, a certain person must or must not get a certain object. If person i has to get object j, u_ij equals 1 constantly, which reduces the problem dimension by one. If person i must not get object j, u_ij = 0 can be set initially and kept unchanged for the entire calculation.

The neural network approach is also used in other optimization applications. Sriram [74] introduced a neural network solution to an NP-complete two-dimensional assignment problem. Its energy function is the sum of three terms: two of the terms represent the constraint, and the third measures the quantity to be optimized. The constants in the energy function are determined experimentally. Wacholder [86] introduced a neural network-based optimization algorithm for the static weapon-target assignment problem. The common feature of all neural network approaches to constrained optimization is that their energy functions are composed of terms for the constraints and for the quantities to be optimized. However, distinct energy functions correspond to different dynamic behaviors of the network.
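As a concrete check of Eq.(6.57), the sketch below evaluates the objective for a candidate u matrix. For any valid permutation matrix the A, B and C terms vanish and E reduces to −D times the total benefit; for an invalid matrix the constraint terms are strictly positive. The constants and the benefit matrix here are illustrative:

```python
# Objective function of Eq.(6.57): the A and B terms penalize two 1's
# in a row/column, the C term pins each row and column sum to 1, and
# the D term rewards total benefit.

def objective(u, r, A=1.0, B=1.0, C=1.0, D=1.0):
    n = len(u)
    a = A * sum(u[i][j] * u[i][l]
                for i in range(n) for j in range(n)
                for l in range(n) if l != j)
    b = B * sum(u[i][j] * u[l][j]
                for j in range(n) for i in range(n)
                for l in range(n) if l != i)
    c = C * (sum((sum(u[i]) - 1) ** 2 for i in range(n)) +
             sum((sum(u[i][j] for i in range(n)) - 1) ** 2 for j in range(n)))
    d = D * sum(u[i][j] * r[i][j] for i in range(n) for j in range(n))
    return a + b + c - d

r = [[0.4, 0.6], [0.9, 0.1]]      # illustrative benefits
valid = [[0, 1], [1, 0]]          # a permutation matrix
print(objective(valid, r))        # -> -1.5 = -(0.6 + 0.9); constraints vanish
```

An invalid candidate such as `[[1, 1], [0, 0]]` yields a strictly positive constraint penalty even with D = 0.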
6.2 Unit and Site Functions for Optimization

The inference network for the assignment problem consists of N^2 units, with N sites on each unit; unit value u_ij has the same interpretation as in Eq.(6.57). The algorithm is based on gradient descent — decreasing the value of the objective function Eq.(6.57) monotonically. The following site and unit functions guarantee that ΔE = E^(k+1) − E^(k) ≤ 0 is satisfied for small changes. Site l at unit (i, j) has the site function

s_ijl = u_il + u_lj                                          (6.58)

For all units (i, j), the unit function is defined as

u_ij^(k+1) = f( u_ij^(k) + K_d Δu_ij^(k) )                   (6.59)

where

f(x) = 1 if x > 1;  x if 0 ≤ x ≤ 1;  0 if x < 0              (6.60)

Δu_ij^(k) = (A + B) u_ij^(k) − Σ_l ( (A + C) u_il^(k) + (B + C) u_lj^(k) ) + 2C + (D/2) r_ij      (6.61)

         = 2A u_ij^(k) − Σ_l (A + C) s_ijl^(k) + 2C + (D/2) r_ij                                  (6.62)

A, B, C, D are the constants in Eq.(6.57), and K_d > 0 is a constant determining the magnitude of the change. B = A is used in Eq.(6.62). D < 0 is also allowed.

Summarizing Eq.(6.59–6.62), it can be observed that each unit, together with its sites, acts very much like a neuron in the Hopfield net. See Figure 6.19. Each neuron in the Hopfield net is connected with all other neurons, whereas each unit in the inference network is only connected with the units in the same row or column.

Figure 6.19: Unit and sites in the inference network act like a neuron.

Continuous-time unit and site functions for the assignment problem can be defined similarly:

u_ij = f(K_c v_ij)                                           (6.63)

where v_ij satisfies

dv_ij/dt = (A + B) u_ij − Σ_l ( (A + C) u_il + (B + C) u_lj ) + 2C + (D/2) r_ij                   (6.64)

         = 2A u_ij − Σ_l (A + C) s_ijl + 2C + (D/2) r_ij                                          (6.65)

and f(x) and s_ijl are defined as in Eq.(6.60) and Eq.(6.58).

6.3 The Decreasing Objective Function

The value of the objective function decreases monotonically for K_d > 0 and any values of A, B, C, and D. According to the definition of f(x) in Eq.(6.60):

if 0 ≤ u_ij^(k) + K_d Δu_ij^(k) ≤ 1, then u_ij^(k+1) − u_ij^(k) = K_d Δu_ij^(k);
if u_ij^(k) + K_d Δu_ij^(k) > 1, then u_ij^(k+1) − u_ij^(k) = 1 − u_ij^(k) ≥ 0 and K_d Δu_ij^(k)
> 1 − u_ij^(k) ≥ 0;
if u_ij^(k) + K_d Δu_ij^(k) < 0, then u_ij^(k+1) − u_ij^(k) = −u_ij^(k) ≤ 0 and K_d Δu_ij^(k) < −u_ij^(k) ≤ 0.

In any of these cases,

( u_ij^(k+1) − u_ij^(k) ) K_d Δu_ij^(k) ≥ 0                  (6.66)

holds. The derivative of the objective function Eq.(6.57) can be derived:

dE/du_ij = −2 ( (A + B) u_ij − Σ_l ( (A + C) u_il + (B + C) u_lj ) + 2C + (D/2) r_ij )            (6.67)

When u_ij changes slowly (u_ij^(k+1) − u_ij^(k) is sufficiently small),

E^(k+1) − E^(k) ≈ Σ_i Σ_j (dE/du_ij) ( u_ij^(k+1) − u_ij^(k) )

Notice that

(dE/du_ij) K_d Δu_ij^(k) = −2 K_d ( (A + B) u_ij − Σ_l ( (A + C) u_il + (B + C) u_lj ) + 2C + (D/2) r_ij )^2 ≤ 0

Using Eq.(6.66), we can get

(dE/du_ij) ( u_ij^(k+1) − u_ij^(k) ) ≤ 0   for all i, j;

we have thus proved that

E^(k+1) − E^(k) ≤ 0                                          (6.68)

It can also be proved that the network energy decreases under continuous-time updating. According to the definition of f(x) in Eq.(6.60),

when 0 ≤ K_c v_ij ≤ 1,  u_ij = K_c v_ij, therefore du_ij/dt = K_c dv_ij/dt;
when K_c v_ij > 1,  u_ij = 1, therefore du_ij/dt = 0;
when K_c v_ij < 0,  u_ij = 0, therefore du_ij/dt = 0.

Hence

dE/dt = Σ_i Σ_j (dE/du_ij) (du_ij/dt)

and, using Eq.(6.67) and Eq.(6.64), dv_ij/dt = −(1/2) dE/du_ij, so every non-zero term equals −(K_c/2)(dE/du_ij)^2 and

dE/dt ≤ 0

The sigmoid function

f(x) = (1/2) ( 1 + (e^x − e^{−x}) / (e^x + e^{−x}) )         (6.69)

can be used instead of the piecewise-linear function Eq.(6.60), and Eq.(6.68) still holds. However, Eq.(6.60) makes the analysis of the network dynamics easier and gives performance similar to the sigmoid, as long as K_d is kept small.

6.4 TSP, the Hopfield Model and Comparison

6.4.1 The Hopfield Model for the TSP

The traveling salesman problem (TSP) has the following simple description: given a complete digraph of N nodes with an N × N matrix [d_ij], d_ij ≥ 0, defining the length of arc (i, j), find a minimum-length circuit (or tour) which goes through each node exactly once. Distances d_ij are symmetric, that is, d_ij = d_ji. Also assume d_ii = 0. The length d(T) of a tour T is given by d(T) = Σ_{(i,j)∈T} d_ij. The problem has been studied extensively for the past few decades and many algorithms have been proposed for its exact solution.
Since it is an NP-complete problem, one is also interested in approximate algorithms which take less time to get a good, but not guaranteed optimal, solution. Hopfield and Tank [30] gave 0 ≤ v_ij ≤ 1 the following interpretation for all 0 ≤ i, j < N: v_ij = 1 means city i is the j-th stop in the salesman's tour, and v_ij = 0 means city i is not the j-th stop. A valid tour is one with exactly one 1 in each row and column of the matrix [v_ij]. The energy function to be minimized is

E = (A/2) Σ_x Σ_i Σ_{j≠i} v_xi v_xj + (B/2) Σ_i Σ_x Σ_{y≠x} v_xi v_yi + (C/2) ( Σ_x Σ_i v_xi − N )^2 + (D/2) Σ_x Σ_{y≠x} Σ_i d_xy v_xi ( v_{y,i+1} + v_{y,i−1} )                  (6.70)

where A, B, C, D > 0. The first and second terms indicate that only one 1 is allowed in each row and each column. The third term means the total number of 1's must be N. The fourth term is the tour length.

6.4.2 Comparison of the Assignment Problem with the TSP

Although many studies and improvements have been made to the original Hopfield net, we compare the inference network with it because of the similar appearance of the energy functions. From Section 6.1.2 and Section 6.4.1, we have observed some similarities and differences between the assignment problem and the TSP and their optimization formulations; they can be summarized as follows:

Similarities:

• The constraints can be formulated as the same restriction: one 1 in each row and each column;
• The A and B terms in objective function Eq.(6.57) are the same as those in energy function Eq.(6.70);
• Both Eq.(6.57) and Eq.(6.70) are to be minimized;
• Both use gradient descent, which may end up in a local minimum state;
• When the D term is ignored, the eigenvalues of the connection matrices for the two problems are similar, as we will see later and in [2].

Differences:

• Each valid assignment has one and only one distinct set of u_ij values; however, a single tour corresponds to 2 different sets of v_ij values, since a circular tour can be interpreted in two opposite directions;
• The C term in Eq.(6.70) restricts the total number of neurons that are on; the C term in Eq.(6.57) specifies the number of on-neurons in each row and column;
• The D terms for total benefit and total tour length are different due to the interpretation of the neurons.

Because of the similarities of the two problems, many dynamical behaviors of the network for the assignment problem are similar to those of the Hopfield model for the TSP. Using a negative D, the objective function Eq.(6.57) for assignment can also be interpreted for the TSP, in which u_ij = 1 means city j is immediately after city i in the salesman's tour. Since this formulation often yields disconnected sub-circles, additional constraints prohibiting sub-tours must be used. Xu [92] studied this approach.

The major cause of the Hopfield model's invalid tours does not exist in the similar model for assignment. A fundamental problem with the Hopfield TSP model is its degeneracy. Since a tour can be traversed clockwise or anti-clockwise, it has 2 distinct valid states in terms of v_ij. The network aims at one tour but proceeds towards the 2 corresponding states, and it often ends up in an invalid state. In the assignment problem, each valid assignment corresponds to a unique u_ij array, hence the network evolves towards a particular state.

The constraint that there are exactly N 1's in the matrix [u_ij] is represented differently in the two neural models. The Hopfield model has a tendency to get two or zero 1's in a row or column [90] while the total number of 1's is still N. The C term of Eq.(6.57) specifically restricts the number of 1's in each row or column to be one. This reduces the chance of assigning one object to two persons or two objects to one person.

6.5 Dynamics of the Network

6.5.1 Connection Matrix

Let u be an N^2 × 1 vector and u_ij be its (iN + j)-th element.
Eq.(6.59–6.62) can be represented by the following matrix operation:

u^(k+1) = f( u^(k) + K_d P u^(k) + K_d Q )                   (6.71)

where P is an N^2 × N^2 symmetric matrix. Its element at row iN + j and column rN + s is (when B = A)

P_{iN+j, rN+s} = 2A δ(i − r) δ(j − s) − (A + C) δ(i − r) − (A + C) δ(j − s)                      (6.72)

Q is an N^2 × 1 vector; its element at row iN + j is

q_{iN+j} = 2C + (D/2) r_ij                                   (6.73)

and

f(x) = f( (x_1, x_2, ..., x_n)^t ) = ( f(x_1), f(x_2), ..., f(x_n) )^t;   δ(x) = 1 if x = 0, 0 if x ≠ 0      (6.74)

where the scalar function f(x) is defined in Eq.(6.60).

The determinant |sI − P|, though lengthy, can be derived (see Appendix E.2 for details):

|sI − P| = ( s + 2(N − 1)A + 2NC ) ( s − 2A )^{(N−1)^2} ( s + (N − 2)A + NC )^{2(N−1)}

Therefore, P has three distinct eigenvalues:

λ_1 = −2(N − 1)A − 2NC                                       (6.75)
λ_2 = 2A                                                     (6.76)
λ_3 = −(N − 2)A − NC                                         (6.77)

Since A, C > 0, we have λ_1 < 0, λ_2 > 0 and λ_3 < 0. The eigenvector corresponding to λ_1 is

e_1 = (1, 1, ..., 1)^t                                       (6.78)

or, normalized,

ê_1 = (1/N, 1/N, ..., 1/N)^t                                 (6.79)

The eigenvectors corresponding to λ_2 and λ_3 form (N − 1)^2- and 2(N − 1)-dimensional sub-spaces ℜ_2 and ℜ_3 respectively. For any A, C > 0, λ_1 ≠ λ_2 ≠ λ_3; therefore

e_1 ⊥ ℜ_2 ⊥ ℜ_3

The objective function Eq.(6.57) can be written using the matrices P and Q as:

E = −u^t P u − 2 Q^t u + 2NC                                 (6.80)

6.5.2 Eigenvalues Move the Network Towards the Valid Sub-space

Since Eq.(6.71) contains the non-linear function f, it is hard to analyze its dynamic behavior. However, when K_d is small enough to keep f(x) ≈ x, Eq.(6.71) can be simplified to

u^(k+1) = u^(k) + K_d P u^(k) + K_d Q                        (6.81)

Let u be decomposed into three components u^1, u^2 and u^3 corresponding to the direction ê_1 and the sub-spaces ℜ_2 and ℜ_3 respectively:

u = u^1 + u^2 + u^3                                          (6.82)

where u^1 = (u · ê_1) ê_1. Since λ_1, λ_2, and λ_3 are eigenvalues, we have

P u^(k) = λ_1 u^{1(k)} + λ_2 u^{2(k)} + λ_3 u^{3(k)}         (6.83)

Q in Eq.(6.81) has a component involving r_ij; hence Q cannot be pre-decomposed into the three sub-spaces.
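The three eigenvalues can be checked numerically by applying P of Eq.(6.72) to one representative vector from each sub-space: the all-ones vector for λ_1, a pattern with zero row and column sums for λ_2, and a row-constant pattern with zero total for λ_3. A small pure-Python sketch; N, A and C are arbitrary test values:

```python
# Build P of Eq.(6.72) for a small N and verify the spectrum of
# Eq.(6.75)-(6.77) on one representative eigenvector per eigenvalue.

N, A, C = 3, 1.5, 0.7

def d(a, b):                      # Kronecker delta of Eq.(6.74)
    return 1.0 if a == b else 0.0

P = [[2*A*d(i, r)*d(j, s) - (A + C)*d(i, r) - (A + C)*d(j, s)
      for r in range(N) for s in range(N)]
     for i in range(N) for j in range(N)]

def apply(M, x):
    return [sum(M[a][b] * x[b] for b in range(len(x))) for a in range(len(x))]

lam1 = -2*(N - 1)*A - 2*N*C       # eigenvalue along e1 = (1,...,1)
lam2 = 2*A                        # valid-solution sub-space R2
lam3 = -(N - 2)*A - N*C           # invalid sub-space R3

e1 = [1.0] * (N * N)
# zero row and column sums (difference of two permutation matrices): in R2
x2 = [1.0, -1.0, 0.0,  -1.0, 1.0, 0.0,  0.0, 0.0, 0.0]
# row-constant pattern a_i with sum(a_i) = 0: in R3
x3 = [1.0]*N + [-1.0]*N + [0.0]*N

for lam, x in [(lam1, e1), (lam2, x2), (lam3, x3)]:
    Px = apply(P, x)
    assert all(abs(Px[a] - lam * x[a]) < 1e-9 for a in range(N * N))
print(lam1, lam2, lam3)
```

The check confirms that λ_2 is the only positive eigenvalue, so only components in ℜ_2 are amplified by the iteration of Eq.(6.81).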
In order to analyze the network dynamics, let us assume D = 0 for the moment; then Q = 2C e_1 = 2CN ê_1. With these assumptions and the decomposition, we have, from Eq.(6.81),

u^{1(k+1)} = u^{1(k)} + K_d λ_1 u^{1(k)} + 2CN K_d ê_1 = (1 + K_d λ_1) u^{1(k)} + 2CN K_d ê_1     (6.84)
u^{2(k+1)} = u^{2(k)} + K_d λ_2 u^{2(k)}                     (6.85)
u^{3(k+1)} = u^{3(k)} + K_d λ_3 u^{3(k)}                     (6.86)

On the other hand, according to Eq.(6.80), we have

E = −λ_1 |u^1|^2 − λ_2 |u^2|^2 − λ_3 |u^3|^2 − 4CN |u^1| + 2NC
  = −λ_1 ( |u^1| + 2CN/λ_1 )^2 − λ_2 |u^2|^2 − λ_3 |u^3|^2 + 4C^2N^2/λ_1 + 2NC                    (6.87)

Recall that λ_1, λ_3 < 0 and λ_2 > 0; in order to minimize E, what is required is that

1. |u^1| = −2CN/λ_1 = 2CN/|λ_1|,
2. |u^2|^2 increases with increasing k, and
3. |u^3|^2 decreases with increasing k.

From Eq.(6.84), we can get

u^{1(k)} = (1 + K_d λ_1)^k ( u^{1(0)} + (2CN/λ_1) ê_1 ) − (2CN/λ_1) ê_1

For small K_d we have −1 < K_d λ_1 < 0; hence

u^{1(∞)} = −(2CN/λ_1) ê_1

and therefore

|u^{1(∞)}| = 2CN/|λ_1|

which ensures that the first requirement will eventually be satisfied. Since λ_2 > 0, Eq.(6.85) ensures that |u^2| increases with time. Similarly, the negative λ_3 ensures that |u^3| decreases with time.

For a valid solution, Σ_i Σ_j u_ij^(∞) = N holds; hence

u^(∞) · ê_1 = (1/N) Σ_i Σ_j u_ij^(∞) = (1/N) · N = 1

and u^{2(∞)} would grow very large (without the non-linear function f(x)) while u^{3(∞)} goes to zero. The sub-space formed by the eigenvectors corresponding to λ_2 contains the valid solutions, and the sub-space formed by the eigenvectors corresponding to λ_3 contains invalid solutions.

6.5.3 The Role of the Non-Linear Function

When K_d is set sufficiently small, K_d Δu_ij^(k) can be very small. If 0 ≤ u_ij^(k) ≤ 1 also holds, Eq.(6.81) describes the exact behavior of the network. In this case, the vector u moves in the direction of the valid solution space. Only when u is sufficiently close to a corner of the N^2-dimensional hypercube does the non-linear function have an impact. But since K_d Δu is kept small, the result f(u + K_d Δu) can only be slightly different from u + K_d Δu.
Therefore the result vector still points very close to a hypercube corner. In other words, the solution is almost determined before the non-linear function starts to affect the change; the non-linear function only keeps |u| within a certain range.

6.5.4 The Complete Q Term

D = 0 was assumed in the last two subsections. However, the D term is a major part of the vector Q. Since this term is related to r_ij, it may have components in all of the ê_1, ℜ_2 and ℜ_3 sub-spaces. For a positive D, this term moves the vector u in a direction that favors large r_ij, although this movement can be either towards or away from the valid solution space. However, if the speed of moving the vector towards the valid space is properly balanced against the speed towards a large benefit, this term can move the vector approximately within the valid space and arrive at a near-optimal solution. Like all other gradient descent algorithms, the vector u will fall into a local minimum if the constants A, B, C and D are not properly given.

6.5.5 Initialization, Updating Rate and Termination

Ideally, the initial values u_ij^(0) do not affect the final solution, since the solution is determined by |u^1| → 2CN/|λ_1|, |u^2| → ∞, |u^3| → 0, the non-linear function f(x), and the implication of the D term. A simple and unbiased initialization is u_ij^(0) = 1/N for all i, j. This initialization keeps Δu_ij roughly invariant with N, so that K_d can be almost the same for different N's.

A uniform initialization may cause problems for some specific benefit matrices. For example, when the benefit matrix [r_ij] has two identical rows or two identical columns, the problem has multiple symmetric solutions. In order to break the symmetry, small noise may be added to the initial values. For example,

u_ij^(0) = (1/N) · s                                         (6.88)

where s is a random number distributed between 0.9 and 1.1. A large value of K_d can speed up the converging process. However, a too-large K_d may introduce the non-linear effect too early and cause oscillation.
K_d should keep K_d Δu_ij within a small percentage of the initial values u_ij^(0) so that non-linearity does not occur too early.

Three different criteria are used to determine when a solution has been achieved or whether the calculation should be terminated. The first of these is the discovery of a valid solution. Since u_ij represents the assignment, a valid solution is obtained if and only if there is exactly one neuron in each row and column with a high value and all the rest have low values. To determine whether a particular neuron is high or low, a threshold rule can be applied. For example, those neurons with values greater than T_h = 0.65 are considered high, and those below T_l = 0.30 are considered low.

The second criterion is to check whether the network is still changing. If the total change of the u_ij values between two successive iterations is less than T_2 = 10^{-4}, the neurons are considered stabilized, and the network has reached a stable final state.

The final termination criterion is a time-out test. If after T_3 = 1500 iterations neither of the above criteria has been satisfied, the system is considered divergent.

6.6 Determining Constants

From Eq.(6.57–6.62), we know that the constants A, B, C, and D are related. B can be set equal to A due to the symmetry of the constraint. In Eq.(6.62), if we assume B = A and u_ij = 1/N for all i, j, then Δu_ij = −2A(N − 1)/N + (D/2) r_ij can be obtained. In order to get a Δu_ij invariant with N, we choose A = B = N/(N − 1).

From the previous discussion, we see that the constants A and C determine the eigenvalues. Eq.(6.85–6.86) show that the vector u moves towards the valid sub-space with speed |λ_2 K_d| = 2A K_d, and moves away from the invalid sub-space with speed |λ_3 K_d| = ((N − 2)A + NC) K_d. If D = 0 and K_d is sufficiently small, any positive A and C lead to a valid solution. From Eq.(6.57), if C/A is too large, the C term outweighs the A term, and the network may arrive at a stable state where all u_ij's are close to 1/N.
On the other hand, if C/A is too small, the network may converge to the all-zero state, where the A and B terms are minimized and the C term is not. Simulation shows that the ratio C/A can vary within a certain range and still yield a valid solution. However, this range is approximately inversely proportional to N.

The D term moves the vector u in the direction that favors large r_ij's. However, this direction may or may not be the direction of the valid sub-space. If the r_ij are such that the direction of large r_ij's is close to the direction of the invalid sub-space, a large D may force the vector to stay in the invalid space. If the direction of large r_ij is close to the direction of the valid sub-space, but still with some difference, a not-too-large D leads to a solution at a corner of the hypercube, whereas a too-large D bends the vector away from the corner. The larger D is, the farther the vector is from the corner. However, the optimization problem is meaningless if D is too small. From simulation we noticed that the direction of the solution vector u is usually determined at the early stage of the calculation. At the beginning of the calculation, a large D term contributes by moving the vector towards a state of larger total benefit; when the calculation is near the end, it often tends to draw the vector away from a hypercube corner. Based on this observation, we start with a large D_0 and reduce its value by a small percentage in each iteration: D := pD, where 0 < p < 1. With a decreasing D, the network can always arrive at a valid solution, since when the D term is negligible, the minimum state of Eq.(6.57) is always a valid assignment.

The value of D_0 is also related to the distribution of the r_ij. In a valid solution, (N − 1)/N of the elements are expected to settle at 0. Suppose the benefits r_ij have probability distribution g(r) with a threshold r̄ satisfying ∫₀^r̄ g(r) dr = (N − 1)/N; suppose further that Σ_l s_ijl = 2 and u_ij = 1/N. Then the solution D_0 of

Δu_ij = 2A/N − 2(A + C) + 2C + (D_0/2) r̄
= 0 can be a guideline for determining the initial value D_0. (For benefits evenly distributed in [0, 1], r̄ = (N − 1)/N, and this guideline gives D_0 = 4A, the value used in the simulations below.)

6.7 Simulation Results

The overall performance of the algorithm is satisfactory. For all the examples we used, with N varying from 2 to 128 and benefits r_ij evenly distributed, the network always converges to a valid assignment as long as the constants are chosen according to the above guidelines. For a particular set of benefits r_ij, the solutions corresponding to different values of C, D_0 and K_d are usually the same. The value of p sometimes changes the solution for large problems, but the total benefit does not change significantly. For all the examples we tested with N ≤ 12, the inference network algorithm gives the same result as exhaustive search.

6.7.1 Small Size Examples

The following example with N = 10 illustrates some properties of the network. The benefits r_ij are evenly distributed in [0, 1]:

0.053 0.914 0.827 0.302 0.631 0.785 0.230 0.011 0.567 0.350
0.307 0.339 0.929 0.216 0.479 0.703 0.699 0.090 0.440 0.926
0.032 0.329 0.682 0.764 0.615 0.961 0.273 0.275 0.038 0.923
0.540 0.443 0.837 0.368 0.746 0.469 0.505 0.328 0.480 0.424
0.678 0.139 0.763 0.959 0.707 0.242 0.663 0.759 0.332 0.455
0.685 0.716 0.136 0.720 0.832 0.751 0.681 0.106 0.379 0.719
0.381 0.919 0.163 0.219 0.639 0.261 0.040 0.144 0.941 0.872
0.569 0.972 0.364 0.684 0.931 0.423 0.927 0.594 0.182 0.611
0.401 0.868 0.680 0.538 0.940 0.512 0.289 0.621 0.970 0.668
0.693 0.352 0.940 0.208 0.571 0.579 0.821 0.963 0.724 0.762

Using exhaustive search (which takes about one hour on a Sun/3), the optimal assignment, with total benefit of 9.053, was found:

0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0

The inference network algorithm gives the above optimal solution almost instantly. For this example, we used initial values u_ij^(0) ∈ [0.9/N, 1.1/N] as in Eq.(6.88), with D decreasing by 0.6% in each iteration (p = 0.994). The ranges of C, D_0 and K_d that keep the network converging are:

0.05 ≤ C ≤ 9.9 with A = B = N/(N − 1), D_0 = 4A, p = 0.994, K_d = 0.01;
0.5 ≤ D_0 ≤ 11 with A = B = N/(N − 1), C = 1/N, p = 0.994, K_d = 0.01;
0.0005 ≤ K_d ≤ 0.014 with A = B = N/(N − 1), C = 1/N, D_0 = 4A, p = 0.994.

With A = B = N/(N − 1), C = 1/N, D_0 = 4A, and K_d = 0.01, D's decreasing rate p can lie anywhere in the range p ∈ [0.75, 1] and the solution stays optimal. K_d is the main factor determining the converging speed. Figure 6.20 shows the number of iterations required for the network to settle down with respect to different values of the constants.

Additional constraints, such as that somebody has to get a certain object or somebody must not get a certain object, can be introduced into the algorithm without much extra effort. The same non-zero K_d is used for all the neurons which do not involve any additional constraints, and K_d = 0 is used for all the neurons with additional constraints. If person i has to get object j, set u_ij^(0) = 1; if person i must not get object j, set u_ij^(0) = 0. In the above N = 10 example, if we specifically require u_{0,1} = 0, u_{6,8} = 0 and u_{3,4} = 1, u_{8,5} = 1, our algorithm gives the following result, with total benefit of 7.886:
Similarly, if there are two identical columns inthe benefit matrix, there will also be two symmetric assignments. Even two rows orcolumns have only part of them identical, it is still possible to have symmetric optimalassignments. For example, the following matrix (a) has two optimal assignments (b) and(c), both having total benefits of 3.0.0.4 0.6 0.9 0.1 0 0 1 0 0 1 0 00.4 0.6 0.9 0.1 0 1 0 0 0 0 1 00.1 0.2 0.4 0.8 0 0 0 1 0 0 0 10.7 0.4 0.2 0.9 1 0 0 0 1 0 0 0(a) (b) (c)If initial values are all identical, we will get u01 = u 1; for all j and all iterations, therefore,the network will not arrive at a valid solution; instead, it will stay at a state with0 < 71 < uoi = un,u02 = u12 < Th < 1, U23 = 1.0, U30 = 1.0, and all the rest areChapter 6. Inference Network for the Assignment Problems^ 960. However, if small amounts of noise are applied to ut? ) with Eq.(6.88), the network isobserved to arrive at either solution (b) or (c) quickly.The number of iterations required for the neural algorithm to arrive at its solutionis mainly determined by the distribution of benefits ri; 's rather than their range (as forthe auction algorithm). If larger ri; values are scatted in different rows and columns, thenetwork arrives at the solution fast. On the other hand, if large benefits are concentratedin some rows and columns, the convergence speed is slower. For the following randombenefit matrix (a)0.053 0.914 0.827 0 1 00.302 0.631 0.785 0 0 10.230 0.011 0.567 1 0 0(a)^(b)with A = B = N 1 , C = Do = 4A, p = 0.994, and Kd = 0.01, the optimal assignment(b) is obtained in 70 iterations. Using the same constants for another benefit matrix (c),0.1^0.80.9 0.850.16 0.87(c)^(d)^(e)^(.0^(g)it takes 168 iterations to arrive at the optimal assignment (d). Since matrix (c) is a trickyexample with three near-optimal valid assignments (e), (f), and (g). 
If the noise on the initial values is too big, the network may stabilize at a sub-optimal state.

The number of iterations required increases gently with N. Figure 6.21 shows the results. The range of C is also related to the problem size N: the larger N is, the smaller the C range must be to keep proper ratios among the A, B, C terms. Figure 6.22 shows the maximum C values for different N's which yield valid solutions.

6.7.2 Solving Large Problems on the CM

Simulations were conducted for the cases of N = 64 and N = 128 on the 8K and 16K Connection Machine respectively. The algorithm is outlined in Algorithm 4. Table 6.7 and Table 6.8 give the CM times and the total benefits obtained. All benefits r_ij are evenly distributed in [0, 1]. By using the largest K_d that still converges, the CM time can be made shorter. Changing D's decreasing rate p may result in a different assignment with a slightly different total benefit. Statistically, given i evenly distributed numbers in [0, 1], the mathematical expectation of the largest number is i/(i + 1). If we pick the largest number in the first row and then delete the column it is in, pick the largest number in the second row and delete the column it is in, and so on, the mathematical expectation of the total benefit is E_N = Σ_{i=1}^{N} i/(i + 1). For N = 64 and N = 128, the expectations are E_64 = 60.24 and E_128 = 124.56, respectively. Although the inference network algorithm does not guarantee an optimal solution, the average total benefits we obtained are greater than these expectations.
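The expectation formula, as reconstructed here (the largest of i uniform [0, 1] numbers has mean i/(i + 1), summed over the greedy row-by-row picks), can be checked numerically; it reproduces the quoted E_64:

```python
def expected_greedy_benefit(n):
    """E_N = sum_{i=1}^{N} i/(i+1): expected total benefit of greedily
    taking the row maximum among i remaining uniform [0,1] benefits.
    This is an assumed reading of the formula in the text."""
    return sum(i / (i + 1.0) for i in range(1, n + 1))

print(round(expected_greedy_benefit(64), 2))   # -> 60.24
```

This lower bound is what a simple greedy pick would achieve on average, so exceeding it indicates the network is doing genuinely better than row-wise greedy selection.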
For comparison, the optimal total benefits from the Hungarian method are also given.

Algorithm 4: Assignment on the CM

for all PEs, u[i,j] := u0, where u0 ∈ [0.9/N, 1.1/N] as in Eq.(6.88)
while (terminating criteria not met)    (refer to Section 6.5.5)
  BEGIN PARALLEL for all i, j
    h[i, (−i − j) mod N] := u[i,j];  v[(−i − j) mod N, j] := u[i,j]
    FOR k = 0 TO N − 1
    BEGIN
      update u[i,j] using Eq.(6.58–6.62)
      propagate: h[i,j] := h[i, (j − 1) mod N];  v[i,j] := v[(i − 1) mod N, j]
    END
  END PARALLEL

random problem       1      2      3      4      5      6      7
CM busy time        31.5   37.7   37.9   31.4   37.7   33.8   40.6
benefit-Inference   61.832 62.367 62.047 62.232 61.168 61.634 61.461
benefit-Hungarian   62.450 62.367 62.693 62.443 62.666 62.294 62.559

random problem       8      9      10     11     12     13     14
CM busy time        41.7   32.4   38.6   32.0   45.8   43.4   38.0
benefit-Inference   61.717 61.656 61.109 61.598 60.093 62.127 61.492
benefit-Hungarian   62.653 62.551 62.220 62.509 62.594 62.613 62.633

random problem       15     16     17     18     19     20     average
CM busy time        41.5   30.5   39.3   36.5   35.0   31.2   36.8
benefit-Inference   61.417 61.614 60.900 61.065 61.263 62.002 61.5317
benefit-Hungarian   62.294 62.460 62.357 62.443 62.486 62.659 62.4957

Table 6.7: CM busy times (in seconds) and assignment results for 20 random N = 64 problems using the inference network (A = B = N/(N − 1), C = 1/N, D_0 = 4A, p = 0.996, K_d = 0.008), and the corresponding results from the Hungarian method.

random problem       1       2       3       4       5       6       7
CM busy time        118.8   112.4   94.7    121.3   101.6   115.1   99.2
benefit-Inference   124.846 125.232 125.295 124.723 124.165 124.697 124.983
benefit-Hungarian   126.503 126.478 126.318 126.509 126.276 126.463 126.187

random problem       8       9       10      11      12      13      14
CM busy time        109.6   120.6   99.4    106.7   117.3   133.9   114.4
benefit-Inference   124.131 124.851 125.742 125.293 124.835 125.710 124.180
benefit-Hungarian   126.323 126.227 126.368 126.486 126.465 126.478 126.517

random problem       15      16      17      18      19      20      average
CM busy time        120.4   135.3   117.1   111.8   113.5   124.6   114.4
benefit-Inference   124.493 124.825 124.880 124.584 124.104 125.109 125.0321
benefit-Hungarian   126.293
126.464 126.362 126.411 126.386 126.587 126.4051

Table 6.8: CM busy times (in seconds) and assignment results for 20 random N = 128 problems using the inference network (A = B = 1/N, C = R, D0 = 4, p = 0.998, Kd = 0.007), and the corresponding results from the Hungarian method.

6.8 Discussion

The auction algorithm can be parallelized. Unlike the Hungarian method and the inference network algorithm, the complexity of the auction algorithm depends on the range of the benefits: its complexity is O(NM log(NM)), where M is the largest integer benefit. The auction algorithm is efficient at the beginning, when many people bid for assignments. However, it usually has a long 'tail' when only a few people are left unassigned. This tail results in a relatively long completion time. Wein et al. [89] proposed a hybrid auction algorithm for the CM in which tails are truncated and processors are dynamically assigned to the active bidders. The algorithm can shorten the tailing effect when N is large (for example N > 256) and the virtual processor ratio is greater than 1. It can solve an N = 1000 or N = 2000 problem with maximum benefit M = 1000 on the 16k CM in 22.2 and 96.3 seconds respectively. Therefore, the hybrid auction algorithm is more efficient than the inference network algorithm for large assignment problems.

In all our simulations with N up to 128, the inference network algorithm was shown to produce valid assignments, and the results were usually close to optimal; the smaller the problem size, the more likely an optimal solution was reached. Twenty different benefit sets for each problem size N were tested. The results obtained are given in Table 6.9. The optimum solutions in the table were obtained from the Hungarian method.

problem size N                     4     8    16    32    64   128
# of optimal solutions obtained   20    20    13     2     1     0
average difference from optimum 0.0%  0.0%  0.5%  1.1%  1.5%  1.1%

Table 6.9: Solution quality vs. problem size.
20 distinct random examples were used for each N.

The computational complexity of the Hungarian method is O(N⁴). It takes about two minutes for the Hungarian method to produce an N = 128 assignment on a Sparc station. With a much slower clock rate (7 MHz), the inference network algorithm on the CM also takes about two minutes.

Hopfield's model for the traveling salesman problem presented an approach to constrained optimization different from traditional AI methods. Unfortunately, it often yields invalid solutions and does not work well for large problems (for example N > 15) [90]. Despite the similarities, the inference network algorithm for the assignment problem did not yield invalid solutions as the Hopfield model did for the TSP, due to the formulation of the problem and the setting of the constants (simulations were conducted for random problems with sizes up to 128).

Both the assignment problem and the TSP are formulated to find an on-off neuron pattern that optimizes an objective function. In the inference network, each valid assignment corresponds to a single state satisfying the constraint — exactly one neuron on in each row and column. However, the Hopfield model for the TSP is degenerate: in the Hopfield net, each valid tour corresponds to two distinct states, and both satisfy the constraint. Therefore, although the Hopfield net is a great idea for combinatorial problems, the TSP may need to be remodeled with some other constraints on the neurons.

The D term in the objective (energy) function measures the quantity to be optimized. Therefore this term must be significantly large to achieve the goal of optimization. On the other hand, large D terms lead to divergence; a small D is advantageous for network convergence. A compromise between these two aspects is to use a decreasing D term. Our simulations have shown that, using a decreasing D, the network converges for all the test cases with problem sizes up to 128.
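The decreasing-D scheme can be sketched in a few lines. This is a hedged illustration assuming the weight decays geometrically by the ratio p each iteration, consistent with the decay parameter p reported alongside Tables 6.7 and 6.8; the `d_schedule` helper and the exact decay law are assumptions, not taken from the thesis.

```python
# Minimal sketch of a decreasing-D schedule (assumption: geometric decay
# with ratio p per iteration; not the thesis's exact update rule).

def d_schedule(d0, p, iterations):
    """Yield the weight of the objective (D) term at each iteration."""
    d = d0
    for _ in range(iterations):
        yield d
        d *= p

# Early iterations weight the objective heavily (driving optimization);
# as D shrinks, the constraint terms dominate and the network can settle.
weights = list(d_schedule(d0=4.0, p=0.996, iterations=1000))
```

With D0 = 4 and p = 0.996, the objective weight falls below 2% of its initial value within the 1000 iterations sketched here, which matches the intuition above: optimize early, converge late.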
Since the Hopfield net has similar eigen characteristics to the inference network, it can be expected that the Hopfield net would also have better convergence if a decreasing D were used. However, a decreasing D will not solve the degeneracy problem of Hopfield's model for the TSP.

Figure 6.20: (a) A = B = 1/N, C = R, D0 = 4A, p = 0.994: the larger Kd is, the faster the network converges; (b) A = B = 1/N, D0 = 4A, p = 0.994, Kd = 0.01: the network converges slowly when C is small, and the converging speed is almost the same within a certain range of C; (c) A = B = 1/N, C = R, p = 0.994, Kd = 0.01: the larger D0 is, the faster it converges.

Figure 6.21: Number of iterations vs. problem size. All benefits are randomly distributed in [0, 1]. A = B = 1/N, C = R, D0 = 4A, Kd = 0.01, u(0) ∈ […].

Figure 6.22: Maximum C for the algorithm to converge vs. problem size. All benefits are randomly distributed in [0, 1]. A = B = 1/N, D0 = 4A, Kd = 0.01, u(0) ∈ […].

Chapter 7

Conclusions

7.1 Summary

This thesis introduced a novel parallel computing platform — the binary relation inference network, which embodies time-efficient parallel computing structures to perform a class of optimization problems. Candidates competing for the optimum were explicitly represented by the sites at a unit; the selection of the optimum was conducted by the unit functions. The inference network algorithms were derived directly from suitable problem formulations.

Sites in the inference network perform binary deduction and produce candidates. Units collect the candidates and make decisions based on the optimization criteria.
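The site/unit division just described can be sketched as a few lines of Python. The additive site deduction and minimizing unit below are illustrative choices (they match the shortest path formulation); they are not the only possible unit functions, and the function names are assumptions for illustration.

```python
# Sketch of one inference-network component: each site performs a binary
# deduction on its two incoming link values, and the unit selects the
# optimum among the site outputs (here, a minimizing unit).

def site_value(a, b):
    """Binary deduction at a site (path concatenation: add the lengths)."""
    return a + b

def unit_value(current, site_inputs):
    """Decision making at a unit: keep the best candidate seen so far."""
    candidates = [site_value(a, b) for a, b in site_inputs]
    return min([current] + candidates)

# Unit (i, j) whose sites are fed by (u[i][k], u[k][j]) pairs:
best = unit_value(9.0, [(2.0, 3.0), (4.0, 4.0)])  # site values 5.0 and 8.0
```

Because each site is evaluated independently of the others, a unit defined this way can accept a new candidate the moment any single site produces one.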
The complete inference network is composed of N² units; each unit has a fan-in of N and a fan-out of 2N. The network has a logarithmic convergence rate for the graph problems discussed. Each individual unit is a simple computing device. The inference network is a uniform structure with a high degree of redundancy, since the nature of the optimization problem is to make decisions based on multiple choices. The connectivity of the network units grows with the problem size N.

Inference network algorithms in both discrete-time and continuous-time domains for the shortest path problem, the transitive closure problem, and the assignment problem were discussed. The synchronous inference network algorithm for the shortest path problem was mathematically equivalent to the divide-and-conquer version of the sequential matrix algorithm for the problem. Compared with all the other shortest path algorithms discussed in Chapter 4, the inference network was shown to solve the problem with the least number of synchronous iterations. Therefore, it will be the fastest if every algorithm runs on the kind of hardware that optimizes its performance. Implementing the discrete algorithm on the Connection Machine resulted in a shorter completion time than one of the fastest systolic algorithms for N < 512 (the maximum size investigated). It was also demonstrated that the network can unify many existing all-pair shortest path algorithms by defining different unit functions. The inference network algorithm, which was based on simple procedures for finding the shortest path, was proved to produce the optimal solution for synchronous, asynchronous, and continuous-time updating, and its convergence was independent of the problem size.

The inference network algorithm for the transitive closure problem was derived directly from the problem formulation. The solution was obtained efficiently using combinational logic without a clock. No gradient descent was involved.
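As an illustration of this clock-free formulation, the following sketch evaluates the same OR-of-ANDs relation iteratively in software until no unit changes; in hardware the combinational logic settles to the same fixed point without any iteration counter. The function name and the 0/1 matrix representation are illustrative assumptions.

```python
# Software sketch of the transitive closure network: unit (i, j) ORs
# together the ANDs r[i][k] AND r[k][j] computed at its sites, and the
# network is stable when no unit changes (the combinational fixed point).

def transitive_closure(adj):
    n = len(adj)
    r = [row[:] for row in adj]
    while True:
        nxt = [[1 if (r[i][j] or any(r[i][k] and r[k][j] for k in range(n)))
                else 0
                for j in range(n)] for i in range(n)]
        if nxt == r:           # no unit changed: fixed point reached
            return r
        r = nxt

closure = transitive_closure([[1, 1, 0],
                              [0, 1, 1],
                              [0, 0, 1]])   # path 0 -> 1 -> 2 closes (0, 2)
```

Each synchronous sweep doubles the length of the paths accounted for, so the number of sweeps grows logarithmically with the graph diameter, consistent with the logarithmic convergence rate noted above.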
Theoretical analysis proved that the convergence of the algorithm to the optimal solution was independent of the problem size. The structure of the continuous-time network and simulations of its performance were given.

The dynamic behavior of the inference network and the impact of its energy constants were discussed through the example of the assignment problem. The constants in the energy function (corresponding to link weights) were determined for the problem. Compared with results from exact solutions, the assignment results from the inference network were found to be near-optimal for all the test cases with N up to 128. Compared with the Hopfield model for the TSP, the inference network was shown to converge for larger problem sizes, due to the use of a decreasing weight for the objective function as well as the formulation of the problem.

The inference network was also applied to data consistency checking (Appendix A), consistent labeling (Appendix C), and matrix inversion (Appendix B). For data consistency checking or consistent labeling, the inference network finds or eliminates all inconsistency. Among the applications discussed, the assignment problem was solved using a different strategy from the other problems. For the assignment problem, the inference network aims at decreasing the network energy through gradient descent, which is very similar to other neural network optimization approaches. For the other problems (shortest path, transitive closure, data consistency checking, and consistent labeling), the inference network performs simple optimization procedures for the problem, and the decrease of the network energy was just a result of the optimization.

For all the applications discussed except matrix inversion, synchronization of the units was not required. In matrix inversion, the inference network was used as a systolic array.

7.2 Contribution

The major contribution of this thesis is the architecture of the inference network.
The network was designed to implement the optimization procedure in a direct manner. Each decision-making unit in the network always has access to all candidates competing for the optimum, and these decision-making units can operate asynchronously. Once a computation has been started, the decision-making units actively optimize their values until an optimum is reached.

For the class of graph problems considered in this thesis (shortest path, transitive closure, consistency checking), the inference network competes with previously formulated neural network models. The inference network has the advantage of requiring no training or weight determination. Additionally, the inference network units have much smaller fan-in and fan-out than the neurons of a typical neural network. These features are fundamental to cost-effective hardware implementation.

The inference network has shown some of the advantages of a neural network and some of the advantages of a typical parallel processing array. Like many neural networks, the inference network does not require rigorous synchronization, and its units are simple computational engines. When applied to the assignment problem, the inference network behaves like a Hopfield-type neural network. Similar to a parallel processing array, the inference network can achieve optimal solutions for many problems (for example, shortest path, transitive closure, and consistent labeling), and the inter-processor links carry no weights. It can also be used synchronously to perform matrix inversion like a systolic array.

The inference network was shown to work in both synchronous and asynchronous modes, in both discrete-time and continuous-time domains. Traditional AI algorithms, systolic array algorithms, and dynamic programming algorithms concentrate on detailed recurrence formulations. They are usually discrete-time algorithms pursuing exact solutions.
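The contrast with clocked operation can be illustrated by a small sketch in which shortest-path units are visited in an arbitrary order with no synchronization; because every update only improves a unit's value toward the same fixed point, the result is independent of the visiting order. This is an illustrative software analogue under that assumption, not the thesis's hardware realization.

```python
# Asynchronous (clock-free) sketch: units (i, j) are updated in random
# order; each update moves d[i][j] toward min_k d[i][k] + d[k][j], and
# the fixed point is the same as with synchronous, clocked updating.

import random

def async_shortest_paths(dist):
    n = len(dist)
    d = [row[:] for row in dist]
    units = [(i, j) for i in range(n) for j in range(n) if i != j]
    changed = True
    while changed:
        changed = False
        random.shuffle(units)          # arbitrary, unsynchronized order
        for i, j in units:
            best = min(d[i][k] + d[k][j] for k in range(n))
            if best < d[i][j]:         # updates only ever improve a unit
                d[i][j] = best
                changed = True
    return d

INF = float('inf')
paths = async_shortest_paths([[0, 3, INF],
                              [3, 0, 1],
                              [INF, 1, 0]])
```

Whatever order the shuffle produces, the loop terminates at the unique all-pairs shortest path matrix, which is the property that lets the hardware network run without a clock.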
A systolic array must have a clock to synchronize all the processing elements to ensure the correctness of the results. The inference network for most of the problems discussed was shown not to require rigorous synchronization; therefore a clock was not necessary. It was shown that the inference network can be built using simple integrated circuits (for shortest path), logic gates (for transitive closure), and simple neurons (for consistent labeling). The analog inference network operating in the continuous-time domain was shown to be easy to implement, since it needs neither a clock nor one set of circuits for each bit of data.

Many neural network optimization approaches define an energy function in the form of a weighted sum and pursue the solution by decreasing the energy through gradient descent. Optimal solutions are not guaranteed and, if the energy function has more than one weight, determination of the weights in the energy function (and consequently the link weights in the network) is not an easy job. For all the applications discussed in the thesis except the assignment problem, no gradient descent was involved in the inference network.

The inference network architecture was derived from simple optimization procedures. For problems in which the optimization of one binary relation is always advantageous to the optimization of other binary relations (for example shortest path, transitive closure, data consistency checking, and consistent labeling), the network remains active until all the units reach their optimum and the global optimal solution is obtained. In these cases, network links carry no weights.

The detailed structural difference between the inference network and neural networks or other parallel computing models is that the inputs to an inference network unit are grouped into sites, and the evaluation of each site is independent of the evaluation of other sites.
This gives a unit the freedom to update its value whenever a new candidate for the optimum is produced.

7.3 Open Issues and Future Work

One of the open issues about the inference network is the number of units it requires. For example, classical neural network approaches for the N-city traveling salesman problem require N² neurons [30]. The maximum neural network model [49] solves the K-partite subgraph problem with N × K neurons. The neural network approaches for the shortest path problem also use N² neurons [19, 83]. A question that arises is whether it is necessary for the inference network to have N² units for a problem of size N (e.g. a graph with N nodes). Each unit in the inference network represents a binary relation. If the objective is to obtain a value for each of the N² binary relations concerning N nodes, having fewer than N² units means that the final results must be obtained in some order and some kind of synchronization is required. Therefore, it is not very likely that the number of units in the inference network can be reduced while preserving its asynchronous feature.

When the problem size is large, a divide-and-conquer scheme may be used to solve the problem in a number of iterations. The transitive closure problem is an example that can be tackled by divide and conquer. After applying the inference network to a subgraph of the original problem, all the connected nodes can be merged and treated as one single node in the rest of the graph. Therefore, a smaller inference network can be applied to a larger graph, and the transitive closure of the original graph will be obtained after using the inference network iteratively.

Another open issue is what other problems can be solved on the inference network. To shed some light on this issue, the shortest path and longest path problems will be examined. The shortest path and the longest path problems can be similarly represented by a set of binary relations.
The shortest path problem can be solved on the inference network while the longest path problem cannot. The basic reason is not that the former is a polynomial-complexity problem and the latter is an NP-complete [21] problem. In the shortest path problem, the shortest path length from node i to j always equals the minimum, over all k, of the sum of the shortest path lengths from i to k and from k to j. However, the longest path from i to j is often not the concatenation of the longest paths from i to k and from k to j for any k. The question is which problems can be naturally represented by binary relations and solved through binary operations on these relations (as discussed in Section 2.8).

Can the inference network be applied to an NP-complete problem? Although none of the applications discussed in this thesis is NP-complete, this does not exclude the possibility of applying the inference network to an NP problem. However, since the inference network consists of a polynomial number of processors, if a polynomial completion time is expected for an NP problem, the solution cannot be guaranteed optimal. The author has attempted to implement the algorithm proposed by Press [68] for the traveling salesman problem on the inference network. The algorithm combines simulated annealing with route rearrangement. The inference network supports the communication requirements for the kind of route rearrangement used. However, since the algorithm often converges to a local minimum and route rearrangement is sequential in nature, the performance of the inference network for this algorithm is not outstanding. It remains open to find efficient mappings of NP-complete problems onto the inference network.

Is it possible for the inference network to produce an optimal solution to an NP-complete problem if non-polynomial time is allowed? This also remains open for further study.
For the polynomial-complexity shortest path problem, Section 4.3.3 has shown that, with our definition of the unit function and network energy, the energy function is a decreasing concave function with only one minimum. This allows the inference network to produce the optimal solution to the shortest path problem in the asynchronous or continuous-time domains. For an NP-complete problem, the shape of the search space is so complicated that it is very hard to define the unit functions of the inference network so that the network energy has only one minimum.

The inference network can be applied to dynamic programming, often a process of sequential decision making. Computing shortest paths is a typical problem that can be solved by dynamic programming [11]. Since the shortest path problem can be solved efficiently on the inference network, this opens up the possibility of applying the non-sequential inference network to some dynamic programming problems. However, whether and how a specific problem can be solved in a non-sequential way requires further study.

Some future work which may be done on the inference network includes:

• Use the inference network platform for logical analysis that can be represented by an AND/OR graph. Chapter 1 discussed the relation between the inference network topology and an AND/OR graph. In artificial intelligence, many logical reasoning problems can be represented by an AND/OR graph. Therefore, if the reasoning procedures can be parallelized, it is possible to use the inference network to carry out the analysis.

• Derive other optimization applications for the inference network, for example:

  - Weapon-Target Association: the costs of using any of N weapons for any of N targets are given. The objective is to minimize the total cost of associating the weapons to the targets.
The problem is similar to the assignment problem except that an advanced weapon may be able to kill several targets, and a particularly dangerous threat may require targeting by several weapons;

  - Minimum Spanning Tree: a fundamental problem in artificial intelligence, intimately related to the assignment and traveling salesman problems;

• Map the inference network onto a 3-dimensional (instead of the current 2-dimensional) processor array in the Connection Machine, where the third dimension corresponds to the sites at each unit. This will increase the number of virtual processors required; however, site calculation will be parallelized. Refer to Section 3.3;

• Solve the traveling salesman problem by adding additional constraints to the assignment algorithm to eliminate sub-tours; refer to Xu [92];

• Design a general- or special-purpose inference network using VLSI circuits and test its performance;

• Relate the algorithms to practical applications in robotics, image processing, VLSI layout, control, etc.

Bibliography

[1] A.V. Aho, J.E. Hopcroft, and J.D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley Pub. Co., 1974
[2] S.V.B. Aiyer, M. Niranjan, and F. Fallside, 'A Theoretical Investigation Into the Performance of the Hopfield Model', IEEE Trans. on Neural Networks, Vol. 1(2), pp. 204-215, 1990
[3] H.M. Alnuweiri, 'Constant-Time Parallel Algorithms for Image Labeling on a Reconfigurable Network of Processors', Technical Report CICSR-TR92-004, Dept. of Electrical Engineering, Univ. of British Columbia
[4] D.H. Ballard and C.M. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, NJ, 1982
[5] D.P. Bertsekas, 'The Auction Algorithm: A Distributed Relaxation Method for the Assignment Problem', Annals of Operations Research, Vol. 14, pp. 105-123, 1988
[6] G. Broomell and J. Robert Heath, 'Classification Categories and Historical Development of Circuit Switching Topologies', Computing Surveys, 15(2), pp. 95-133, June 1983
[7] W.K.
Chen, Theory of Nets: Flows in Networks, John Wiley & Sons, Inc., 1990
[8] T.H. Cormen, C.E. Leiserson, and R.L. Rivest, Introduction to Algorithms, McGraw-Hill, 1990
[9] G.B. Dantzig, 'All Shortest Routes in a Graph', Operations Research House, Stanford Univ. Tech. Rep. 66-3, 1966
[10] E.W. Dijkstra, 'A Note on Two Problems in Connexion with Graphs', Numer. Math., 1, pp. 269-271, 1959
[11] S.E. Dreyfus and A.M. Law, The Art and Theory of Dynamic Programming, Academic Press, Inc., 1977
[12] A. El-Amawy, 'A Systolic Architecture for Fast Dense Matrix Inversion', IEEE Computer, Vol. 38, No. 3, 1989
[13] B.A. Farley, A.H. Land, and J.D. Murchland, 'The Cascade Algorithm for Finding All Shortest Distances in a Directed Graph', Management Sci., 14, pp. 19-28, 1967
[14] J.A. Feldman and D.H. Ballard, 'Connectionist Models and Their Properties', Cognitive Science, Vol. 6, pp. 205-254, 1982
[15] J.A. Feldman, M.A. Fanty, N.H. Goddard, and K. Lynne, 'Manual of Rochester Connectionist Simulator', 1986
[16] J.A. Feldman, M.A. Fanty, N.H. Goddard, and K. Lynne, 'Computing with Structured Connectionist Networks', IEEE Trans. Computer, pp. 91-102, March 1988
[17] R.W. Floyd, 'Algorithm 97, Shortest Path', Comm. ACM, Vol. 5, p. 345, 1962
[18] S. Fortune and J. Wyllie, 'Parallelism in Random Access Machines', Proc. of 10th Annual ACM Symp. on Theory of Computing, pp. 114-118, Mar. 1978
[19] T. Fritsch and W. Mandel, 'Communication Network Routing Using Neural Nets - Numerical Aspects and Alternative Approaches', IEEE, 1991
[20] Tutorial text book, The 2nd Int'l Conf. on Fuzzy Logic and Neural Networks, July 17-22, 1992, Iizuka, Japan
[21] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman and Company, 1979
[22] N.H. Goddard, M.A. Fanty, and K. Lynne, The Rochester Simulator, Tech. Rep. 233, Computer Science Dept., Univ. of Rochester, 1987
[23] L.J. Guibas, H.T. Kung, and C.D.
Thompson, 'Direct VLSI Implementation of Combinational Algorithms', Proc. Caltech Conference on VLSI, California Institute of Technology, Pasadena, 1979, pp. 509-525
[24] S. Hammarling, 'A Note on Modifications to the Givens Plane Rotation', J. Inst. Math. Appl., Vol. 13, pp. 215-218, 1974
[25] R. Helton, 'A Self-Adaptive Connectionist Shortest-Path Algorithm Utilizing Relaxation Methods', Proc. Int'l Joint Conf. on Neural Networks, Vol. 3, pp. 895-901, 1991
[26] W.D. Hillis, The Connection Machine, The MIT Press, 1985
[27] G.E. Hinton, 'Connectionist Learning Procedures', Artificial Intelligence, Vol. 40, pp. 185-234, 1989
[28] J.J. Hopfield, 'Neural Networks and Physical Systems with Emergent Collective Computational Abilities', Proc. Nat'l Acad. Sci. USA, Vol. 79, p. 2554, 1982
[29] J.J. Hopfield, 'Neurons with Graded Response Have Collective Computational Properties Like Those of Two-State Neurons', Proc. Nat'l Acad. Sci. USA, Vol. 81, p. 3088, 1984
[30] J.J. Hopfield and D.W. Tank, 'Neural Computation of Decisions in Optimization Problems', Biolog. Cybern., Vol. 52, pp. 1-25, 1985
[31] J.J. Hopfield and D.W. Tank, 'Computing with Neural Circuits: A Model', Science, Vol. 233, pp. 625-632, 1986
[32] T.C. Hu, 'Revised Matrix Algorithms for Shortest Paths', SIAM J. Applied Math., 15, pp. 207-218, 1967
[33] Y.N. Huang and J.P. Cheiney, 'Parallel Computation of Direct Transitive Closures', Proc. of 7th Int'l Conf. on Data Engineering, pp. 192-199, 1991
[34] G-H. Kim, H-Y. Hwang, and C-H. Lee, 'A Modified Hopfield Network and Its Application to the Layer Assignment', Trans. of the Korean Institute of Electrical Engineers, Vol. 40, No. 2, pp. 234-237, 1991 (in Korean)
[35] S. Kirkpatrick, C.D. Gelatt, Jr., and M.P. Vecchi, 'Optimization by Simulated Annealing', Science, 220, pp. 671-680, 1983
[36] H. Kita, O. Sekine, and Y.
Nishikawa, 'A Design Principle of the Analog Neural Network for Combinatorial Optimization: Findings from a Study on the Placement Problem', poster presentation at the Int'l Joint Conf. on Neural Networks, July 1991, Seattle
[37] J.J. Kosowsky and A.L. Yuille, 'Solving the Assignment Problem with Statistical Physics', Int'l Joint Conf. on Neural Networks, Vol. 1, pp. 159-164, July 1991, Seattle
[38] H.W. Kuhn, 'The Hungarian Method for the Assignment Problem', Naval Research Logistics Quarterly, 1955
[39] S.Y. Kung and S.C. Lo, 'A Spiral Systolic Architecture/Algorithm for Transitive Closure Problems', IEEE Int'l Conf. on Computer Design, ICCD'85, New York, USA, pp. 622-626
[40] S.Y. Kung, P.S. Lewis, and S.C. Lo, 'On Optimally Mapping Algorithms to Systolic Arrays with Application to the Transitive Closure Problem', IEEE Int'l Symposium on Circuits and Systems, Vol. 3, 1986
[41] K.P. Lam, 'Using Hybrid Techniques for a Continuous-Time Inference Network', Proc. of the IEEE Symposium on Circuits and Systems, pp. 1721-1724, 1991
[42] K.P. Lam, 'A Continuous-Time Inference Network for Minimum-Cost Path Problems', Int'l Joint Conf. on Neural Networks, July 1991, Seattle, pp. 367-372
[43] K.P. Lam and C.J. Su, 'A Binary Relation Inference Network', Proc. of the Fifth Int'l Parallel Processing Symposium, Anaheim, CA, April 30 - May 2, 1991, IEEE Press, pp. 250-255
[44] K.P. Lam, 'Time-Range Approximate Reasoning for Microprocessor Systems Design', CICSR Technical Report TR 90-4, The University of British Columbia, Feb. 1990, 33 pages; accepted for publication, Proc. of IEE, Part E, 1992
[45] K.P. Lam and C.J. Su, 'Inference Network for Optimization Applications', Proc. of the Second Int'l Conf. on Fuzzy Logic and Neural Networks, July 1992, pp. 189-192, Iizuka, Japan
[46] H. Lang, 'Transitive Closure on an Instruction Systolic Array', Systolic Array Processors, Proc.
of Int'l Conference on Systolic Arrays, 1988
[47] E.L. Lawler, Combinatorial Optimization: Networks and Matroids, Holt, Rinehart, and Winston, 1976
[48] K.C. Lee, Y. Takefuji, and N. Funabiki, 'A Maximum Neural Network for the Max Cut Problem', Int'l Joint Conf. on Neural Networks, July 1991, Seattle, pp. 379-384
[49] K.C. Lee, N. Funabiki, and Y. Takefuji, 'A Parallel Improvement Algorithm for the Bipartite Subgraph Problem', IEEE Trans. on Neural Networks, Vol. 3, No. 1, 1992
[50] F.T. Leighton, Introduction to Parallel Algorithms and Architectures, Morgan-Kaufmann, 1992
[51] T. Lengauer, Combinatorial Algorithms for Integrated Circuit Layout, John Wiley & Sons Ltd, 1990
[52] P.S. Lewis and S.Y. Kung, 'Dependence Graph Based Design of Systolic Arrays for the Algebraic Path Problem', Proc. of Ann. Asilomar Conference on Signals, Systems and Computers, pp. 13-18, 1986
[53] V. Li, Knowledge Representation and Problem Solving for an Intelligent Tutor Advisor, M.A.Sc. thesis, Univ. of British Columbia, 1990
[54] W-M. Lin and V.K. Prasanna, 'Parallel Algorithms and Architectures for Discrete Relaxation Technique', IRIS No. 274, Institute for Robotics and Intelligent Systems, Univ. of Southern California, Los Angeles, California
[55] A.K. Mackworth, 'Consistency in Networks of Relations', Artificial Intelligence, 8, pp. 99-118, 1977
[56] A.K. Mackworth, 'Constraint Satisfaction', Encyclopedia of Artificial Intelligence, Ed. by S.C. Shapiro, Wiley Publishers, Vol. 1, pp. 205-211, 1987
[57] B. Mahr, 'Semirings and Transitive Closure', Technische Universität Berlin, Fachbereich 20, Report 82-5, 1982
[58] B. Mahr, 'Iteration and Summability in Semirings', in Algebraic and Combinatorial Methods in Operations Research (Proc. of the Workshop on Algebraic Structures in Operations Research), Ed. by R.E. Burkard, R.A. Cuninghame-Green, and U. Zimmermann, Ann. Discrete Math., Vol. 19, pp. 229-256, 1984
[59] D. Marr, Vision, W.H. Freeman, San Francisco, 1982
[60] J.S.B.
Mitchell and D.M. Keirsey, 'Planning Strategic Paths through Variable Terrain Data', SPIE Applications of Artificial Intelligence, Vol. 485, pp. 172-179, 1984
[61] C. Mead, Analog VLSI and Neural Systems, Addison-Wesley, 1989
[62] J.J. Modi, Parallel Algorithms and Matrix Computation, Clarendon Press, Oxford, 1988
[63] A. Moffat and T. Takaoka, 'An All Pairs Shortest Path Algorithm with Expected Time O(n² log n)', SIAM J. Computing, 6, pp. 1023-1031, 1987
[64] N.J. Nilsson, Principles of Artificial Intelligence, Springer-Verlag, 1980
[65] F.J. Núñez and M. Valero, 'A Block Algorithm for the Algebraic Path Problem and Its Execution on a Systolic Array', Systolic Array Processors, Proc. of Int'l Conference on Systolic Arrays, pp. 265-274, 1988
[66] C.H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity, Prentice Hall, Inc., 1982
[67] J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving, Addison-Wesley Pub. Co., 1984
[68] W.H. Press, B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling, Numerical Recipes, Cambridge Univ. Press, 1986
[69] M.J. Quinn, 'Analysis and Implementation of Branch-and-Bound Algorithms on a Hypercube Multicomputer', IEEE Trans. on Computers, Vol. 39, No. 3, 1990
[70] Y. Robert and D. Trystram, 'Systolic Solution of the Algebraic Path Problem', in Systolic Arrays: Papers Presented at the First Int'l Workshop on Systolic Arrays, Ed. by W. Moore, A. McCabe, and R. Urquhart, 1986
[71] G. Rote, 'A Systolic Array Algorithm for the Algebraic Path Problem (Shortest Paths; Matrix Inversion)', Computing, Vol. 34, pp. 191-219
[72] L. Shastri, Semantic Networks: An Evidential Formalization and Its Connectionist Realization, Morgan-Kaufman, Los Altos / Pitman Pub., London, Feb. 1988
[73] P.M. Spira, 'A New Algorithm for Finding All Shortest Paths in a Graph of Positive Arcs in Average Time O(n² log² n)', SIAM J. Comput., 2, pp. 28-32, 1973
[74] K.B. Sriram and L.M.
Patnaik, 'Neural Network Approach for the Two-Dimensional Assignment Problem', Electronics Letters, Vol. 26, No. 12, p. 809, 1990
[75] H.S. Stone, High-Performance Computer Architecture, Addison-Wesley Pub. Co., 1987
[76] C.J. Su and K.P. Lam, 'Implementation of All-Pair Shortest Path Algorithms with a Binary Relation Inference Network', 10 pages, submitted for publication, February 1991
[77] C.J. Su and K.P. Lam, 'Evaluation of an Inference Network Algorithm for the Shortest Path Problem', 15 pages, submitted for publication, May 1992
[78] C.J. Su and K.P. Lam, 'A Wavefront Array for the Algebraic Path Problem', 16 pages, submitted for publication, June 1991
[79] C.J. Su and K.P. Lam, 'Systolic Mapping of the Inference Network for Shortest Path Problems', Proc. of Int'l Joint Conf. on Neural Networks, Singapore, Nov. 1991, pp. 917-922
[80] C.J. Su and K.P. Lam, 'Dense Matrix Inversion on the Connection Machine', 18 pages, to be submitted
[81] Y. Tabourier, 'All Shortest Distances in a Graph: An Improvement to Dantzig's Inductive Algorithm', Discrete Math., 4, pp. 83-87, 1973
[82] R. Tamassia and J.S. Vitter, 'Optimal Parallel Algorithms for Transitive Closure and Point Location in Planar Structures', Proc. of ACM Symposium on Parallel Algorithms and Architectures, pp. 399-408, 1989
[83] S.C.A. Thomopoulos, L. Zhang, and C.D. Wann, 'Neural Network Implementation of the Shortest Path Algorithm for Traffic Routing in Communication Networks', Proc. of Int'l Joint Conf. on Neural Networks, Vol. 2, p. 591, 1989
[84] A.A. Toptsis, 'Parallel Transitive Closure Computation in Highly Scalable Multiprocessors', Proc. of Int'l Conf. on Advances in Computing and Information, pp. 197-206, 1991
[85] Technical Summary or User's Guide of the Connection Machine, Thinking Machines Corporation, 1991
[86] E. Wacholder, 'A Neural Network-Based Optimization Algorithm for the Static Weapon-Target Assignment Problem', ORSA Journal on Computing, Vol. 1, No. 4, p. 232, 1989
[87] D.L.
Waltz, 'Waltz Filtering', Encyclopedia of Artificial Intelligence, Ed. by S.C. Shapiro, Wiley, Vol. 2, pp. 1162-1165, 1987
[88] B.F. Wang and G.H. Chen, 'Constant Time Algorithms for the Transitive Closure and Some Related Graph Problems on Processor Arrays with Reconfigurable Bus Systems', IEEE Trans. on Parallel and Distributed Systems, TPDS-1, 4, 1990, pp. 500-507
[89] J.M. Wein and S. Zenios, 'Massively Parallel Auction Algorithms for the Assignment Problem', Third Symposium on the Frontiers of Massively Parallel Computation, IEEE Computer Society Press, 1990, pp. 90-99
[90] G.V. Wilson and G.S. Pawley, 'On the Stability of the Travelling Salesman Problem Algorithm of Hopfield and Tank', Biological Cybernetics, Vol. 58, pp. 62-70, 1988
[91] K. Wu, 'Analog Implementation of a Continuous-Time Inference Network for 5-City Shortest Path Problems', Elec 475 project report, University of British Columbia, April 1991
[92] X. Xu and W.T. Tsai, 'Effective Neural Algorithm for the Traveling Salesman Problem', Neural Networks, Vol. 4, pp. 193-205, 1991
[93] L. Yang and W.K. Chen, 'An Extension of the Revised Matrix Algorithm', IEEE Proc. Int. Symp. on Circuits and Systems, pp. 1996-1999, 1989
[94] J.Y. Yen, 'Finding the Lengths of All Shortest Paths in N-Node Nonnegative-Distance Complete Networks Using (1/2)N^3 Additions and N^3 Comparisons', J. Assoc. Comput. Mach., 19 (1972), pp. 423-424
[95] U. Zimmermann, 'Linear and Combinatorial Optimization in Ordered Algebraic Structures' (Chapter 8), Discrete Math., Vol. 10, 1981

Appendix A

Data Consistency Checking Using the Inference Network

The inference network can be used to check the consistency of a set of relations, to derive binary relations from a consistent relation set, and to infer binary relations from an inconsistent relation set based on the reliability of the given relations. We will consider symmetric relations, R(i, j) = R(j, i) for all i, j, since these algorithms are mainly used in time or location referencing applications.
Because of the symmetry of the binary relations, the inference network has N(N − 1)/2 units.

For N nodes, there are up to N(N − 1) binary relations representing the relations between any two of them. The number of independent binary relations required to uniquely define all the binary relations in the network is (N − 1). Given some of the binary relations, the number of iterations required for the inference network to obtain all the remaining ones is determined by the structure and consistency of the relations. A set of binary relations can be kept in the inference network, with unit (i, j) holding relation R(i, j). A set of relations is consistent if the inference network representing the relations is consistent; a network is consistent if, for every unit, the outputs of all its sites are identical to the unit output.

Given inference rules, the consistency of the relations can be checked. Consistency may be declared only when all the units and sites are defined. In checking the consistency, the unit output is evaluated according to:

Unit output =
    previous unit output,    if the unit output is defined and all site outputs are identical to it;
    site output,             if the unit output is undefined and the site outputs are all identical;
    Inconsistent Message,    in all other cases.                                            (A.89)

A site output is defined as the deduction of the two inputs on its incoming links. If no Inconsistent Message is obtained after ⌈log_2(N − 1)⌉ iterations, the relation set is consistent. Here N is the number of nodes involved in the relation set.

When the given relations are inconsistent, sooner or later the outputs of the (N − 2) sites at some unit will differ. The network must have a conflict resolution mechanism in order to proceed with further inference.
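The unit update of Eq. (A.89) can be sketched in a few lines; a minimal Python sketch, assuming relations are stored per unordered pair and that the site deduction is supplied by the caller (the names check_consistency and deduce, and the XOR example below, are illustrative, not from the thesis):

```python
import math

def check_consistency(n, rel, deduce):
    """Sketch of the unit update in Eq. (A.89).  `rel` maps an ordered pair
    (i, j) with i > j to a relation value; missing pairs are undefined.
    `deduce(a, b)` is the application-specific site function.  Returns the
    completed relation set, or None when an Inconsistent Message arises."""
    def get(r, i, j):                     # symmetric lookup: R(i,j) = R(j,i)
        return r.get((i, j) if i > j else (j, i))

    for _ in range(math.ceil(math.log2(max(n - 1, 2)))):
        new = {}
        for i in range(n):
            for j in range(i):
                # site l deduces from u(i,l) and u(l,j) when both are defined
                outs = {deduce(get(rel, i, l), get(rel, l, j))
                        for l in range(n)
                        if l != i and l != j
                        and get(rel, i, l) is not None
                        and get(rel, l, j) is not None}
                unit = get(rel, i, j)
                if unit is not None:
                    if outs - {unit}:
                        return None       # Inconsistent Message
                    new[(i, j)] = unit
                elif len(outs) == 1:
                    new[(i, j)] = outs.pop()
                elif len(outs) > 1:
                    return None           # Inconsistent Message
        rel = new
    if all((i, j) in rel for i in range(n) for j in range(i)):
        return rel
    return None                           # some units remain undefined
```

For instance, with deduce as XOR (which is symmetric and composes through an intermediate node), the partial set R(1,0)=1, R(2,1)=2, R(3,2)=1 is completed to R(2,0)=3, R(3,1)=3, R(3,0)=2 within ⌈log_2 3⌉ = 2 iterations.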
There are various possibilities for conflict resolution, among which some reasonable ones are:

• using an average operator (minimizing high-frequency noise),
• using the max operator (corresponding to ORing the inference results),
• using the min operator (corresponding to ANDing the inference results),
• using a weighted average operator.

Given initial relations and their corresponding certainty factors (reliability, hardness of the constraint), an inference network can

1. arrive at a stable consistent state, with each unit output being identical to all its site outputs;
2. arrive and stay at a metastable inconsistent state, with at least one unit output different from some of its site outputs;
3. become unstable, with its degree of inconsistency growing larger and larger when the conflict resolution schemes are used.

As an example, we define the site and unit functions as:

s^{(k+1)}_{ijl} = u^{(k)}_{il} + u^{(k)}_{lj}                                              (A.90)

and

u^{(k+1)}_{ij} = (1 − r_{ij}) av^{(k+1)}_{ij} + r_{ij} u^{(k)}_{ij}                        (A.91)

in which

av^{(k)}_{ij} = (1 / (N − 2)) Σ_{l=0, l≠i, l≠j}^{N−1} s^{(k)}_{ijl}                        (A.92)

where r_{ij} is the certainty factor of relation R(i, j) (represented by u_{ij}). If R(i, j) is fully constrained, r_{ij} = 1. Otherwise, r_{ij} varies within [0, 1).

A global energy function can be defined to measure the network consistency. For an inference network involving N nodes (N ≥ 3), where unit (i, j) (0 ≤ j < i ≤ N − 1) is one of the C(N, 2) units in the network and site l (0 ≤ l ≤ N − 1, l ≠ i, l ≠ j) is one of the (N − 2) sites attached to unit (i, j), the energy function is defined as:

E^{(k)} = Σ_{i=1}^{N−1} Σ_{j=0}^{i−1} Σ_{l=0, l≠i, l≠j}^{N−1} ( s^{(k+1)}_{ijl} − u^{(k)}_{ij} )^2        (A.93)

where E^{(k)} is the network's global energy after k iterations, s^{(k+1)}_{ijl} is the output of site l at unit (i, j) after (k + 1) iterations, and u^{(k)}_{ij} is the output of unit (i, j). The value of s^{(k+1)}_{ijl} is the deduction from u^{(k)}_{il} and u^{(k)}_{lj}. The value of u^{(k)}_{ij} is the result of conflict resolution operated on its site outputs.

The series {E} shows the state of the network:

1. The network converges, if {E} decreases monotonically;
2. The network diverges, if {E} increases monotonically;
3. The network converges to a metastable inconsistent state, if {E} converges to a non-zero constant;
4. The network converges to a stable consistent state, if {E} converges to zero.

When units are updated asynchronously, the network always converges. Given a single change at any unit (x, y) in a stabilized network, it can be derived from the definitions (A.90)-(A.93) that

ΔE^{(k)} = E^{(k)} − E^{(k−1)} = −3 (N − 2) (1 − r_{xy}^2) ( av^{(k)}_{xy} − u^{(k−1)}_{xy} )^2 ≤ 0        (A.94)

where equality holds only when (1) r_{xy} = 1, i.e. R(x, y) is fully constrained, or (2) av^{(k)}_{xy} = u^{(k−1)}_{xy}, i.e. no change.

For synchronous updating, after the k-th iteration, the change of network energy can be derived as:

ΔE^{(k)} = Σ_{i=1}^{N−1} Σ_{j=0}^{i−1} ΔE^{(k)}_{ij} + 6 Σ_{i=2}^{N−1} Σ_{j=1}^{i−1} Σ_{l=0}^{j−1} ( d^{(k)}_{ij} d^{(k)}_{jl} + d^{(k)}_{ij} d^{(k)}_{il} + d^{(k)}_{il} d^{(k)}_{jl} )
         = Σ_{i=1}^{N−1} Σ_{j=0}^{i−1} [ −3 (N − 2) (1 − r_{ij}^2) ( av^{(k)}_{ij} − u^{(k−1)}_{ij} )^2 ]
           + 6 Σ_{i=2}^{N−1} Σ_{j=1}^{i−1} Σ_{l=0}^{j−1} ( d^{(k)}_{ij} d^{(k)}_{jl} + d^{(k)}_{ij} d^{(k)}_{il} + d^{(k)}_{il} d^{(k)}_{jl} )        (A.95)

ΔE ≤ 0 does not always hold for general synchronous updating. ΔE can be greater than 0, which means that for some combinations of initial relations and certainty factors, the network may diverge.

Appendix B

Inference Network Algorithm for Matrix Inversion

The inference network algorithm for matrix inversion is developed from Crout's algorithm [68]. The processor array is composed of N × N PEs connected in a two-dimensional grid, as shown in Figure 3.11. There are three types of PEs in the array: type A, type B, and type C refer to PEs whose coordinates satisfy i > j, i = j, and i < j respectively. The matrix elements are copied to the processing elements in a pre-designed way, so that when data are propagated in the array the effect is just like the data organization in Figure B.23. It takes 5N − 4 time units to complete the inversion. Each processing element requires a local memory 'c' to calculate a summation and a control counter 'k' to trigger and complete the calculation and propagation.
Initially, the control counter for PE [i, j] is set to k := −i − j. Algorithm 5 gives the details. The result of the inversion is kept in c[i, j] after 5N − 4 time units. For a more in-depth discussion of an N × N systolic array for matrix inversion, see the author's papers [80, 78].

Algorithm 5 : Matrix Inversion on the CM:

BEGIN PARALLEL for all i, j (initialization)
    h[i, (−i − j) (mod N)] := v[(−i − j) (mod N), j] := a_{ij}
    k[i, j] := −i − j;  c[i, j] := 0
END PARALLEL
FOR count = 1 TO count = 5N − 4 DO
    BEGIN PARALLEL for all i, j
        h[i, j] := h[i, (j − 1) (mod N)];  v[i, j] := v[(i − 1) (mod N), j]
        k[i, j] := k[i, j] + 1
        IF (0 ≤ k[i, j] < min{i, j})
            c[i, j] := c[i, j] + h[i, j] * v[i, j]
        ELSE IF (k[i, j] = min{i, j})
            IF (type = A)       h[i, j] := c[i, j] := (h[i, j] − c[i, j]) / v[i, j]
            ELSE IF (type = B)  h[i, j] := v[i, j] := c[i, j] := v[i, j] − c[i, j]
            ELSE                v[i, j] := c[i, j] := v[i, j] − c[i, j]
        ELSE IF (k[i, j] = max{i, j})
            IF (type = A)       v[i, j] := c[i, j]
            ELSE IF (type = C)  h[i, j] := c[i, j]
        ELSE IF (k[i, j] = N + min{i, j})
            IF (type = B)       h[i, j] := v[i, j] := c[i, j] := 1 / v[i, j]
            ELSE IF (type = C)  c[i, j] := h[i, j] * v[i, j]
        ELSE IF (N + min{i, j} < k[i, j] < N + max{i, j})
            c[i, j] := c[i, j] + h[i, j] * v[i, j]
        ELSE IF (k[i, j] = N + max{i, j})
            IF (type = A)       v[i, j] := c[i, j] := −c[i, j]
            ELSE IF (type = C)  h[i, j] := c[i, j] := −c[i, j]
        ELSE IF (k[i, j] = 2N + min{i, j})
            IF (type = A)       h[i, j] := c[i, j]
            ELSE IF (type = C)  v[i, j] := c[i, j]
        ELSE IF (k[i, j] = 2N + max{i, j})
            IF (type = A)       c[i, j] := h[i, j] * v[i, j]
        ELSE IF (2N + max{i, j} < k[i, j] < 3N)
            c[i, j] := c[i, j] + h[i, j] * v[i, j]
        END IF
    END PARALLEL

Figure B.23: Data flow in calculating A^{−1}. Each parallelogram contains data for one phase.
* is an element in the original matrix; * is an element in L or U; o is an element in L^{−1} or U^{−1}.

Appendix C

A Weightless Neural Network for Consistent Labeling Problems

The objective of a consistent labeling problem is to label a set of N objects using a set of M labels subject to some compatibility constraints. Only unary and binary constraints are considered here. A unary constraint prohibits an object from having a certain label. A binary constraint restricts the combination of the labels for two objects. A weightless neural network is proposed here to eliminate all inconsistent labels for each object based on unary and binary constraints.

The inconsistency elimination algorithm is based on the sequential algorithm discussed in [54]. Unary constraints are used to determine candidate labels for each object. Binary constraints are used to set up initial restrictions of labels between object pairs. These restrictions are passed to other related object pairs and subsequently reduce the candidate labels for objects.

The proposed neural network consists of N(N + 1)/2 clusters of neurons, denoted as G(i, j) (1 ≤ j ≤ i ≤ N). Each cluster G(i, j) with i > j detects inconsistent labels between objects i and j. Initial unary and binary relations are used to set up the initial states of the neurons in clusters G(i, i) and G(i, j) (for all i and j) respectively. If an inconsistent label is found for object i (or j) in cluster G(i, j), a signal is sent to cluster G(i, i) (or G(j, j)). When cluster G(i, i) receives a signal indicating an inconsistent label for object i, it activates G(i, j) (j = 1, 2, ..., i − 1) and G(j, i) (j = i + 1, i + 2, ..., N) to eliminate the inconsistent label from the candidate labels for object i.

There are M independent neurons in cluster G(i, i). The initial state of neuron m (1 ≤ m ≤ M) is determined by the unary constraints for object i.
If l_m is a unary-consistent label for object i, neuron m in cluster G(i, i) is initially set 'on'. Neuron m will turn off if label l_m is later found inconsistent for object i. If all neurons in a cluster G(i, i) are off, the labeling process terminates, since there is no consistent label for object i. When all the inconsistent labels have been eliminated, if neuron m in cluster G(i, i) is on, object i can be labeled l_m.

Each neuron cluster G(i, j) (i > j) consists of two layers of neurons. The lower layer has M × M neurons. Neuron (p, q) (1 ≤ p, q ≤ M) has two inputs, from neuron p in cluster G(i, i) and neuron q in cluster G(j, j). Neuron (p, q) is initially off if simultaneous labeling of l_p for object i and l_q for object j is prohibited by a binary constraint. After the network starts to update, neuron (p, q) remains 'on' if and only if both neuron p in cluster G(i, i) and neuron q in cluster G(j, j) are on. The upper layer has 2M neurons, each with M inputs from a row (or a column) of neurons in the lower layer and an output to a neuron in cluster G(i, i) (or G(j, j)). If all neurons in a row (or a column) of the lower layer are off, the corresponding neuron in the upper layer will be turned off, and a signal will be sent to cluster G(i, i) (or G(j, j)) indicating an inconsistent label for object i (or j).

The network settles down when all inconsistent labels have been eliminated. No oscillation can occur, since after initialization neurons can only switch from 'on' to 'off', or remain in an 'on' or an 'off' state. The correctness and completeness of the elimination process are ensured by the corresponding sequential algorithm [54]. Links in the network only transmit an on/off state; therefore, all link weights can be set to one.
Figure C.24: Interconnected clusters form a weightless neural network for consistent labeling problems.

Figure C.25: A diagonal cluster G(i, i) in the weightless neural network.

Figure C.26: A non-diagonal cluster G(i, j) (i > j) in the weightless neural network.

Appendix D

Algorithms Related with the Shortest Path Problem

D.1 Mapping the Shortest Path Algorithm onto the Connection Machine

Chapter 3 has shown that the Connection Machine is an efficient platform onto which to map the inference network algorithm. After initializing the processor array according to Eq. (4.13), the shortest paths can be obtained in at most ⌈log_2(N − 1)⌉ iterations, or N⌈log_2(N − 1)⌉ time units. As discussed in Section 3.3, each iteration consists of updating the data flows, evaluating site values, and calculating unit values. Mapped onto the CM, INFERENCE becomes:

Algorithm 6 : Shortest Path on the CM (to be referred to as INFERENCE-CM)

for all PE, u[i, j] := u^{(0)}(i, j)
count := ⌈log_2(N − 1)⌉
while (count > 0)
BEGIN
    BEGIN PARALLEL for all i, j
        h[i, (−i − j) mod N] := u[i, j];  v[(−i − j) mod N, j] := u[i, j]
        u_old[i, j] := u[i, j]
        FOR k = 0 TO k = N − 1
        BEGIN
            update:    u[i, j] := min {u[i, j], h[i, j] + v[i, j]}
            propagate: h[i, j] := h[i, (j − 1) mod N];  v[i, j] := v[(i − 1) mod N, j]
        END
    END PARALLEL
    if ∀ i, j : u[i, j] = u_old[i, j], break
    count := count − 1
END

It is not difficult to prove the equivalence of INFERENCE-CM and INFERENCE by comparing the corresponding unit values after each iteration.

D.2 Unifying Some Established Algorithms Using an Inference Network

All-pair shortest path algorithms can be implemented by defining proper unit and site functions in an inference network.
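Algorithm 6 above can be simulated sequentially to observe its behaviour; a minimal Python sketch, where numpy's roll stands in for the parallel propagate step and the function name inference_cm is illustrative:

```python
import math
import numpy as np

def inference_cm(d):
    """Sequential simulation of INFERENCE-CM (Algorithm 6): each outer
    iteration circulates the skewed row flow h and column flow v once
    around the torus, which squares the length matrix under (min, +)."""
    u = np.asarray(d, dtype=float)
    n = len(u)
    for _ in range(max(1, math.ceil(math.log2(max(n - 1, 2))))):
        u_old = u.copy()
        h = np.empty_like(u)
        v = np.empty_like(u)
        for i in range(n):
            for j in range(n):
                h[i, (-i - j) % n] = u[i, j]   # initial skewed alignment
                v[(-i - j) % n, j] = u[i, j]
        for _ in range(n):
            u = np.minimum(u, h + v)           # update: candidate u[i,l] + u[l,j]
            h = np.roll(h, 1, axis=1)          # propagate: h[i,j] <- h[i,(j-1) mod N]
            v = np.roll(v, 1, axis=0)          # propagate: v[i,j] <- v[(i-1) mod N, j]
        if np.array_equal(u, u_old):           # early exit on convergence
            break
    return u
```

At inner step t, h[i, j] + v[i, j] equals u[i, l] + u[l, j] with l = (t − i − j) mod N, so the inner loop visits every intermediate node l exactly once per outer iteration.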
In the following, we describe several algorithms and derive their inference network implementations for finding all shortest paths in a general directed graph.

D.2.1 Floyd's Algorithm

Floyd's algorithm [17, 11] (FLOYD) constructs optimal paths by inserting nodes, when appropriate, into more direct paths. After k iterations, the function f_k(i, j) evaluates to the optimal path length between i and j, where the intermediate nodes belong to the set of nodes {1, 2, ..., k}. Its recurrence relation is

f_k(i, j) = min { f_{k−1}(i, j), f_{k−1}(i, k) + f_{k−1}(k, j) }                            (D.96)

with boundary condition f_0(i, j) = d_{ij}. The total number of basic operations required is

Since FLOYD requires f_k(i, i) to detect negative cycles, its inference network implementation needs N^2 units. The unit and site functions are defined as:

u^{(k)}(i, j) = min { u^{(k−1)}(i, j), s^{(k)}(i, j, k) }                                   (D.97)

s^{(k)}(i, j, k) = u^{(k−1)}(i, k) + u^{(k−1)}(k, j)                                        (D.98)

Initial unit values are u^{(0)}(i, j) = d_{ij}. When there is no negative loop, the network settles down after k = N, u^{(N)}(i, j) gives the shortest path length from node i to node j, and the diagonal units (i, i) have zero values. In the presence of negative loops, there will be at least one diagonal unit with a negative value. The iteration has to stop after exactly N steps, in order to obtain shortest paths which do not traverse the same arc more than once. Compared with INFERENCE, this algorithm only needs to update one specific site (indexed by k) at each unit in each iteration, but it takes N instead of n ≤ ⌈log_2(N − 1)⌉ iterations to find all the shortest paths.

D.2.2 Matrix Algorithms

The matrix algorithm (MATRIX) defines a binary matrix operation ⊗ called minaddition:

W = A ⊗ B = [ w_{ij} = min_k (a_{ik} + b_{kj}) ]   where A = [a_{ij}], B = [b_{ij}]         (D.99)

If D^1 = D is the length matrix [d_{ij}], then the elements of D^N give the shortest path lengths, where D^N is the minaddition of N D's:

D^N = D ⊗ D ⊗ D ⊗ ··· ⊗ D ⊗ D                                                              (D.100)

This algorithm is valid when there are no negative loops in the network.
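The minaddition of Eq. (D.99) and the repeated squaring behind Eq. (D.100) can be stated compactly; a minimal Python sketch with numpy (function names are illustrative; an elementwise min with the previous matrix is included so that a zero diagonal is not required):

```python
import numpy as np

def minadd(a, b):
    """Minaddition (Eq. D.99): w[i,j] = min_k (a[i,k] + b[k,j])."""
    # a[:, :, None] + b[None, :, :] has element [i, k, j] = a[i,k] + b[k,j]
    return np.min(a[:, :, None] + b[None, :, :], axis=1)

def all_pairs(d):
    """Square D under minaddition (D, D^2, D^4, ...) until it stops changing;
    with no negative loops this yields D^N, the matrix of all shortest path
    lengths, after O(log N) squarings."""
    d = np.asarray(d, dtype=float)
    while True:
        d2 = np.minimum(d, minadd(d, d))   # keep existing entries as well
        if np.array_equal(d2, d):
            return d
        d = d2
```

Absent arcs can be marked with float('inf'), which propagates correctly through the (min, +) arithmetic.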
Minaddition is associative; for example,

D^4 = D ⊗ D ⊗ D ⊗ D = D^2 ⊗ D^2                                                            (D.101)

Eq. (4.14-4.15) uses this property to make the inference network converge in no more than ⌈log_2(N − 1)⌉ iterations. In fact, the INFERENCE algorithm is intimately related to the MATRIX algorithm, although the former is derived directly upon an inference network platform. It is not difficult to observe that, if we define another binary matrix operation ⊓ as

W = A ⊓ B = [ w_{ij} = min {a_{ij}, b_{ij}} ]   where A = [a_{ij}], B = [b_{ij}]            (D.102)

then Eq. (4.14-4.15) can effectively be represented as

D^{(k)} = D^{(k−1)} ⊓ ( D^{(k−1)} ⊗ D^{(k−1)} )                                            (D.103)

The revised matrix algorithm (RMATRIX) updates matrix elements asynchronously to avoid unnecessary operations. Two distinct processes are defined: (i) a forward process

d^{(f)}_{ij} = min_{1 ≤ l ≤ N} { d^{(p)}_{il} + d^{(q)}_{lj} }   where p = { f if l < j; 1 if l > j },  q = { f if l < i; 1 if l > i }    (D.104)

and (ii) a backward process

d^{(b)}_{ij} = min_{1 ≤ l ≤ N} { d^{(p)}_{il} + d^{(q)}_{lj} }   where p = { f if l < j; b if l > j },  q = { f if l < i; b if l > i }    (D.105)

[93] shows that the shortest paths can be obtained by one forward process followed by one backward process, one backward process followed by one forward process, three forward processes, or three backward processes. The total number of basic operations required is thus 4N(N − 1)(N − 2) or 6N(N − 1)(N − 2).

The inference network for RMATRIX contains N(N − 1) units (diagonal units are not necessary). The matrix elements are updated in the order shown in Figure D.27. The site function is the same for both the forward and the backward process:

s^{(k)}(i, j, l) = u^{(k−1)}(i, l) + u^{(k−1)}(l, j)                                        (D.106)
Figure D.27: The order of updating the length matrix in a forward (left) and a backward (right) process (shown for N = 5; diagonal entries are unused):

·  1  2  4  7        ·  10  9  8  7
1  ·  3  5  8       10   ·  6  5  4
2  3  ·  6  9        9   6  ·  3  2
4  5  6  ·  10       8   5  3  ·  1
7  8  9  10 ·        7   4  2  1  ·

The unit function for the forward process is:

u^{(k)}(i, j) = min { u^{(k−1)}(i, j), min_{1 ≤ l ≤ N, l ≠ i, l ≠ j} s^{(k)}(i, j, l) }   when k = f(i, j)
u^{(k)}(i, j) = u^{(k−1)}(i, j)                                                             otherwise        (D.107)

where

f(i, j) = (max{i, j} − 1)(max{i, j} − 2)/2 + min{i, j}                                      (D.108)

The unit function for the backward process is:

u^{(k)}(i, j) = min { u^{(k−1)}(i, j), min_{1 ≤ l ≤ N, l ≠ i, l ≠ j} s^{(k)}(i, j, l) }   when k = b(i, j)
u^{(k)}(i, j) = u^{(k−1)}(i, j)                                                             otherwise        (D.109)

where

b(i, j) = max{N − i, N − j}(max{N − i, N − j} − 1)/2 + min{N − i, N − j} + 1                (D.110)

The network updates one or two units at a time. To obtain all the shortest paths, the units of the inference network have to be updated in a specific order for N(N − 1) iterations (one forward and one backward process) or (3/2)N(N − 1) iterations (three forward or three backward processes).

D.2.3 Dantzig's Algorithm

Dantzig's algorithm [9] (DANTZIG) constructs shortest paths using arcs whose nodes are numbered between 1 and k in the k-th iteration. g_k(i, j) (i ≤ k, j ≤ k), the shortest path length from i to j in the k-th iteration, is obtained from g_{k−1}(i, j), d_{ik} and d_{kj}. [11] gives the algorithm in the recurrent form:

g_k(i, k) = min_{1 ≤ l ≤ k−1} { g_{k−1}(i, l) + g_0(l, k) }   (i = 1, 2, ..., k − 1)        (D.111)
g_k(k, j) = min_{1 ≤ l ≤ k−1} { g_0(k, l) + g_{k−1}(l, j) }   (j = 1, 2, ..., k − 1)        (D.112)
g_k(k, k) = min { 0, min_{1 ≤ l ≤ k−1} { g_k(k, l) + g_k(l, k) } }                          (D.113)
g_k(i, j) = min { g_{k−1}(i, j), g_k(i, k) + g_k(k, j) }   (i, j = 1, 2, ..., k − 1)        (D.114)

where g_0(i, j) = d_{ij} for all i and j. The total number of basic operations required is 2N^2(N − 1).

The structure of the inference network for DANTZIG is the same as that for FLOYD, which has N^2 units.
The unit function for DANTZIG is

u^{(2k+1)}(i, j) = min { u^{(2k)}(i, j), min_{1 ≤ l ≤ k−1} s^{(2k+1)}(i, j, l) }   if j < i = k or i < j = k
u^{(2k+1)}(i, j) = u^{(2k)}(i, j)                                                   otherwise        (D.115)

u^{(2k+2)}(i, j) = min { u^{(2k+1)}(i, j), s^{(2k+2)}(i, j, k) }                    if i < k and j < k
u^{(2k+2)}(i, j) = min { 0, min_{1 ≤ l ≤ k−1} s^{(2k+2)}(i, j, l) }                 if i = j = k
u^{(2k+2)}(i, j) = u^{(2k+1)}(i, j)                                                 otherwise        (D.116)

where 2 ≤ k ≤ N, u^{(4)}(i, j) = d_{ij}, and s^{(r)}(i, j, l) = u^{(r−1)}(i, l) + u^{(r−1)}(l, j) for 5 ≤ r ≤ 2N + 2. In the (2k+1)-th iteration, 2(k − 1) units are updated and k − 1 sites at each unit are involved in the minimization; in the (2k+2)-th iteration, a total of (k − 1)^2 + 1 units with a total of k(k − 1) sites are minimized. The total number of synchronous iterations required is 2(N − 1).

The modified Dantzig's algorithm [81] (MDANTZIG) avoids unnecessary operations by further decomposing the first two recurrence relations into k − 1 recurrent steps. Define g_k^0(i, k) = g_k^0(k, j) = ∞; then g_k(i, k) = g_k^{k−1}(i, k) and g_k(k, j) = g_k^{k−1}(k, j) can be obtained from:

g_k^l(i, k) = g_k^{l−1}(i, k)                                       ∀ 1 ≤ i ≤ k − 1, if d_{lk} ≥ g_k^{l−1}(l, k)
g_k^l(i, k) = min { g_k^{l−1}(i, k), g_{k−1}(i, l) + d_{lk} }       ∀ 1 ≤ i ≤ k − 1, if d_{lk} < g_k^{l−1}(l, k)        (D.117)

g_k^l(k, j) = g_k^{l−1}(k, j)                                       ∀ 1 ≤ j ≤ k − 1, if d_{kl} ≥ g_k^{l−1}(k, l)
g_k^l(k, j) = min { g_k^{l−1}(k, j), d_{kl} + g_{k−1}(l, j) }       ∀ 1 ≤ j ≤ k − 1, if d_{kl} < g_k^{l−1}(k, l)        (D.118)

A significant amount of operations can be saved when d_{lk} ≥ g_k^{l−1}(l, k) or d_{kl} ≥ g_k^{l−1}(k, l) is satisfied.

Corresponding to the (2k+1)-th iteration in DANTZIG, the inference network implementation of MDANTZIG requires k − 1 iterations. In each of these iterations, 2(k − 1) units are updated; and at each of these units, only one site is activated.
So MDANTZIG requires a total of (N + 2)(N − 1)/2 iterations.

Appendix E

Determinants and Eigenvalues

E.1 The Determinants of Some Matrices

The following matrices are all n × n. In each case, subtracting the first column from the remaining columns (and then collapsing the resulting rows) reduces the matrix to a form whose determinant can be read off directly.

| a  b  b  ···  b |
| b  a  b  ···  b |
| b  b  a  ···  b |  =  (a + (n − 1)b)(a − b)^{n−1}
| ···             |
| b  b  b  ···  a |

| s_1  a_1  a_1  ···  a_1 |
| a_2  s_2  a_2  ···  a_2 |
| a_3  a_3  s_3  ···  a_3 |  =  Π_{i=1}^{n} (s_i − a_i) + Σ_{i=1}^{n} a_i Π_{j=1, j≠i}^{n} (s_j − a_j)
| ···                     |
| a_n  a_n  a_n  ···  s_n |

E.2 Eigenvalues of the Inference Network Connection Matrix

Let P be an n^2 × n^2 symmetric matrix. Its element at row i·n + j and column r·n + s is

p_{ij,rs} = −u δ(i − r) δ(j − s) − v δ(i − r) − v δ(j − s),   u, v ≥ 0

where

δ(x) = 1 if x = 0, and δ(x) = 0 if x ≠ 0.

Matrix P can be divided into n^2 submatrices of size n × n:

P = | P_{11}  P_{12}  ···  P_{1n} |
    | P_{21}  P_{22}  ···  P_{2n} |
    | ···                         |
    | P_{n1}  P_{n2}  ···  P_{nn} |

where each diagonal block P_{ii} (1 ≤ i ≤ n) has −u − 2v on its diagonal and −v in every off-diagonal position, and each off-diagonal block P_{ij} (1 ≤ i ≠ j ≤ n) is the diagonal matrix with −v on its diagonal and 0 elsewhere. The characteristic determinant |sI − P| can then be evaluated by block row and column operations.
• • 0 v si • • • v • • • 0 v • • 0=• • • • • • • • • • • •0 0 v v v • • • si • • • 0 0 - • v• • • • • • • • •v 0 0 v 0 0 • • • si v • • • v0 v 0 0 v 0 • • • v si • • • v• • • • • • • • • • • •0 0 • • • v 0 0 v • • • v v • • • si0 0^0• • •Appendix E. Determinants and Eigenvalues^ 14182 V^ 32 • • •• • •^ 0 0^0 0 0^0^ ^0 0 0^ v .52^0^0^0^ ^0^0 0^ 0^•^0^33 V^•^V^• • •^0^0 • • • 00^v^0^v 33 v^. ^0^0 0• • •0^0 v^v^v • 33 • • • 0^0 0• • •^ 0 0^0 0^00 v 0^0 0^0... 0^0 v^0^0 0• • •• • •• • •33 V • • • V^ 33 • • • V^ V • • • 33n-182 V • • • V^Si V •^V^ 32 • • • V^V Si • • • V(E.119)V^V • • • 32 V V • • • Siwhere .si = s u 2v, s2 = 8 + U + (n 1)v, and s3 = s u v, using the determinantsderived in Appendix E.1,— PI =^u 2nu)(8 u) (n-1)2 (8 u nv)2(')Appendix E. Determinants and Eigenvalues^ 142P has three distinct eigenvalues:Al = —(u 2nv), its corresponding eigenvector ise1 = (1 , 1, . . 1)t = (1 , 1^1 xtN N" N )A2 = —u, its (n — 1)2 eigenvectors form a (n — 1) 2-dimensional subspace;A3 = —(u nv), its 2(n — 1) eigenvectors form a 2(n — 1)-dimensional subspace. SinceAl 0 A2 0 A3, el _L at2 _L R3.