i V L S I I M P L E M E N T A B L E A S S O C I A T I V E M E M O R Y B A S E D O N N E U R A L N E T W O R K A R C H I T E C T U R E Winston Ki-Cheong Mok B. A. Sc. (Hons.) University of British Columbia A THESIS SUBMITTED IN PARTIAL F U L F I L L M E N T OF T H E REQUIREMENTS FOR T H E D E G R E E OF M A S T E R OF A P P L I E D SCIENCE in T H E F A C U L T Y OF G R A D U A T E STUDIES D E P A R T M E N T OF E L E C T R I C A L ENGINEERING We accept this thesis as conforming to the required standard T H E UNIVERSITY OF BRITISH COLUMBIA April 1989 © Winston Ki-Cheong Mok, 1989 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of g ^ l ^ O T / ? ! C Art. t=M6, fN&&£//J£? The University of British Columbia Vancouver, Canada DE-6 (2/88) A b s t r a c t Neural network associative memories based on Hopfield's design have two failure modes. 1) Certain memories are not recallable as they do not lie in a local minimum of the system's Liapunov (energy) function, and 2) the associative memory converges to a non-memory location due to trapping by spurious local minima in the energy function surface. The properties of the first failure mechanism in the Hopfield and five related models were investigated. A new architecture eliminating such failures is proposed. The architecture is fully digital and modular. Furthermore, it is more silicon-area efficient than any of the analyzed models. VLSI circuits built using this architecture will be able to contain a large number of neurons and several circuits may be connected together to further increase capacity. ii Table of Contents Abstract ii List of Tables vi List of Figures vii Acknowledgement l x 1 Introduction 1 1.1 Hopfield Networks 1 1.2 Overview of Thesis 3 2 Associative Memory Networks 4 2.1 Hopfield Network • • 4 2.2 Simplifying Models 7 2.2.1 Truncated Hopfield Model 7 2.2.2 Denker Model 7 2.2.3 Truncated Denker Model 8 2.2.4 Palm Models 8 2.2.5 JPL Model 8 2.3 Model Summary 9 2.4 Simulation Results 9 3 Unstable Memory Analysis 12 3.1 Hopfield Model 12 3.2 Truncated Hopfield Model 15 3.3 Denker Model 18 3.4 Truncated Denker Model 20 3.5 Analogue Palm Model 20 iii 3.6 Truncated Palm Model 21 3.7 Correlation between Neuron Signals 22 3.8 Distribution of Neuron Signal • 24 3.9 Software Tools 29 3.10 Results 29 3.11 Limitations Of Current Implementations 37 4 Proposed Associative Memory Architecture 40 4.1 Model Description 40 4.1.1 Threshold Determination 41 4.1.2 Memory Stability 44 4.1.3 frequency of Convergence 44 4.2 System Architecture 45 4.3 Neural Network Module Architecture 52 4.4 Simulations 55 4.5 Performance of Proposed Network 56 4.5.1 Effects of Threshold 59 4.5.2 Effects of Dilution 62 4.5.3 Effects of Memory Loading 64 4.6 Convergence Delay 64 4.7 Comparison with Existing Models 65 4.7.1 Convergence Speed 65 4.7.2 Memory Capacity 66 4.7.3 Network Size 67 5 Conclusions 68 5.1 Summary 68 5.2 Recommendations 69 References 70 A 72 iv B Notation 75 v List o f Tables 2.1 Variations on Hopfield Model 9 2.2 Frequency of Convergence to Closest Memory 11 3.1 Attainable Values of Signal S t 24 vi List of Figures 2.1 Schematic of a Hopfield Neural Network 5 2.2 Network energy as a function of Ut 6 3.1 Conditional pdf of Sjt and optimum decision boundary 14 3.2 Distribution of 5 t ; Analogue Hopfield model, JV = 128, m = 10, p = 0.5 27 3.3 Distribution of St] Analogue Denker model, N = 128,m = 10, p = 0.5 27 3.4 Distribution of St; Analogue Palm model, N = 128, m = 10,p = 0.1 28 3.5 Distribution of St in the Region of Integration 28 3.6 Predicted and Simulated Optimum Threshold; bipolar models, m = 10, p = 0.3 30 3.7 Predicted and Simulated Optimum Threshold; unipolar models, m = 10, p = 0.1 31 3.8 Predicted and Simulated Bit Error Rate; bipolar models, m = 10, p = 0.5 32 3.9 Predicted and Simulated Bit Error Rate; unipolar models, TO = 10,p = 0.1 32 3.10 Bit Error Rates; bipolar models, N = 128, m = 10, p = 0.3,0.4, and 0.5 33 3.11 Bit Error Rates; Palm models, JV = 128, TO = 10, p = 0.05, and 0.1 34 3.12 Predicted and Simulated Memory Error Rate; bipolar models, m = 10, p = 0.5 35 3.13 Predicted and Simulated Memory Error Rate; Palm models, TO = 10, p = 0.1 35 8.14 Predicted Capacity of Hopfield Networks; p = 0.5 36 3.15 Comparison with Published Results of Capacity of Hopfield model 37 4.1 Probability density function of signal Si for VJ = 0 and VJ = 1 43 4.2 Block diagram of associative memory co-processor system . . . 46 4.3 Frequency of convergence of proposed network (N = 1024 neurons, ThoFF = /* + 3<r) . . 50 4.4 Functional block diagram of neural network module 53 4.5 Performance of Typical Networks as a Function of Normalized Hamming Distance . . . . 57 4.6 Performance of Typical Networks as a Function of Normalized Knowledge of Target . . . 58 4.7 Predicted and Simulated Frequency of Convergence, h = 1, N = 1024, g = 64 58 4.8 Frequency of Convergence Vs. Hamming Distance; (N = 1024, g = 64, m = 80) 60 4.9 Maximum Frequency of Convergence Chi Vs. Threshold; (N = 1024, g = 64, TO = 80) . . . 61 vii 4.10 Normalized Radius of Convergence Vs. Threshold; (N = 1024, g = 64, m = 80) . . . 61 4.11 Maximum Frequency of Convergence Chi Vs. Dilution Factor; (J = 60) 63 4.12 Normalized Radius of Convergence hent Vs. Dilution Factor; (I = 60) 63 4.13 Normalized Radius of Convergence H„it Vs. Memory Loading; (I = 60) 64 4.14 Performance of the Truncated Hopfield and the Proposed Network (I = 60, D = 4.0) . . . 66 viii Acknowledgement I would like to thank my supervisor Dr. Dan Camporese for his support and input during this project. Also, the efforts of Dr. David L. Pulfrey in interfacing with the Natural Sciences and Engineering Research Council were deeply appreciated. I am indebted to Joe Poon for the use of his graphical software, and David Gagne for the programming help he has generously given. Finally, this thesis would not be possible without the support of my parents. This research was funded in part by the Centre for Integrated Systems Research and Microtel Pacific Research Limited. ix C h a p t e r 1 Introduct ion Recently, there has been considerable interest in the area of artificial neural network models. These models mimic biological nervous systems by having dense interconnection of simple non-linear computa-tional elements. It is hoped that by copying the architecture of biological systems, it will be possible to build systems approaching human performance in areas of speech and pattern recognition. Humans can perform these tasks easily, while the best current implementations using Von Neumann machines have had limited success. Investigation of neural networks began in the 1950's with the perceptron. Rosenblatt [19] showed that the perceptron was able to perform classification if the data set is separable by a single hyperplane. It was, however, unable to construct more complex decision boundaries. As a result, the exclusive-or problem was beyond its capabilities. This realization dampened neural network research for some years. Renewed interest in neural networks was sparked by the discovery of learning algorithms for back-propagation networks [20] and of a recursive architecture suitable for implementing associative memories. Back-propagation networks are multi-layer perceptrons. Current research on them lies mainly in the search of improved learning algorithms and suitable applications. Since they do not contain feed-back paths, the dynamic behaviour of back-propagation networks is straight-forward. On the other hand, each computational element (neuron) in a recursive neural network is connected to all other neurons. These networks are often referred to as Hopfield networks after John Hopfield who formulated a Liapunov function that describes their dynamic behaviour [7, 8]. 1.1 Hopf ie ld Networks The Hopfield network has been suggested as a plausible model for biological memory. It formally resem-bles the human cerebellum, where one finds dense interconnections of neurons of limited diversity. In both systems, artificial and biological, the large number of computational elements (neurons), individu-ally, appear to act randomly; but, collectively, they are able to perform complex computations. 1 Chapter 1. Introduction 2 In contrast to conventional memory models, where memory bits are stored explicitly and are physi-cally segregated, information in a Hopfield neural network associative memory [7, 8] is stored nonlocally in a common synaptic matrix using a distributed algorithm. The information is encodedin the strength of the connections. The dispersion of information over a wide array of elements affords the network a high degree of fault tolerance. However, due to aliasing, it is possible for some memories to be corrupted. Hopfield networks are fundamentally analogue in nature. In particular, they require high impedance resistors of various values. However, given current VLSI (Very Large Scale Integration) technology, such resistors are expensive in silicon area, and/or process complexity. Observing the robustness of the model, researchers have proposed various modifications in order to reduce the number of resistors required or replace them completely with switches [4, 5, 6, 14, 15, 21]. Prime applications of Hopfield networks are error correction and associative memory. An operation is said to have failed if the network converges to a point that is not one of the stored memories. This can happen in one of two ways; a) a spurious stable point lies on the trajectory between the initial probe and the target memory, or b) the target memory itself is not a stable point and its thus irretrievable. It is felt that the presence of unstable memories is of particular concern as the system will corrupt an otherwise error-free probe vector. The frequency of occurrence of both failure models is proportional to the number of stored memories, and inversely proportional to the size of network. Other researchers [1, 12] have concentrated on determining the asymptotic limit of the number of storable memories as the network approaches infinite size. In [3], Bottini studies the capacity of moderately large networks programmed using circulant matrices [2]. In this work, the author addresses the practical problems of determining the optimum neuron threshold and estimating the probability of unstable memories in finite sized Hopfield and related networks. Special attention was paid to networks sufficiently small to be implemented on a monolithic VLSI circuit. User programmable circuits with several dozen neurons and ROM based circuits with several hundred neurons have been reported [5, 6, 14, 21]. In addition, this work takes into account uneven distribution of TRUE and FALSE bits in stored memory vectors. The methods used are based on Bayes detectors [24] in classical communications theory. This thesis presents an associative memory model that does not suffer from unstable memories and is more silicon efficient than models mentioned above. Modules of the proposed model can be cascaded to form larger networks than are possible with a single monolithic circuit. Chapter 1. Introduction 3 1.2 Overview of Thesis This thesis has been arranged in the following manner. Chapter One provides the reader with a brief introduction to the field of neural networks and the failure mechanisms encountered in Hopfield networks. The Hopfield network and six related networks are presented in Chapter Two. The properties of unstable memories are analysed in Chapter Three. A proposed model with low VLSI complexity and immune from unstable memories is presented and evaluated in Chapter Four. Chapter Five contains the conclusions and recommendations from this work. Chapter 2 Associative Memory Networks 2.1 Hopfield Network Of the many biologically inspired models, the Hopfield network is one of the simplest. It has been demonstrated empirically that such a network can behave as an associative memory [7, 8]. The collective properties arise when there are large numbers of neurons. Hopfield networks offer a probable model to the actual mechanisms of biological memory. The schematic of Figure 2.1 illustrates an example of a Hopfield network. The network consists of N interconnected neurons. Each neuron, can have two states; ON (Ut = 1), and OFF (Uk = —1). The state of the system is given by the N-tuple (U\, U2,. •. Un) = U. The connection (synaptic) strength from neuron k to neuron t is denoted by T^. In figure 2.1, the strengths are controlled by resistors located at the crosspoints of the synaptic matrix. In operation, a randomly selected neuron, examines is inputs and decides on its next state. Random selection was shown in [18] to mimic the asynchronicity of operations in biological neurons. The decision rule and the associated energy function are chosen such that memories lie in local energy minima and every state change results in a non-positive energy change. If the energy function is bounded from above and below, the network is guaranteed to converge. Hopfield proposed that the chosen neuron updates its output by comparing the weighted sum of its inputs with its own threshold, fjt. If the sum is greater than the threshold, the neuron turns ON. If the sum is less than the threshold, the neuron turns OFF. Otherwise, the neuron remains unchanged. In other word, the output of the neuron is given by: The strength of the synaptic connections, T«, are set by the sum-of-outer-products rule. Given the ON Uh(t + 1)= Uk(t) OFF (2.1) 4 Chapter 2. Associative Memory Networks T1N T2N T — r iik l2k i—r l12 lll l21 J J u, TkN TNN T T 1 Nk 1 lk2 TN2 1 1— J • • • TN1 I N Figure 2.1: Schematic of a Hopfield Neural Network set of memories is [V*1, V*2,... t^m] , Tu is given by: r=l (2.2) An energy (Liapunov) function appropriate for (2.1) and (2.2) is shown in (2.3) below. It is bounded, and all state transitions generated by (2.1) result in a non-positive energy change. It can be shown that, on average, the energies at memory points are lower than at non-memory points. j N N N £(U) = + J>0i (2.3) »=i j=i »=i Chapter 2. Associative Memory Networks 6 Rewriting (2.3) explicitly in terms of Uk, and differentiating with respect to Uk, one gets: - N N N AT £0) = -\T*kUl - UkJ^TtM+Wi* -z2l2UJTJiUi + T,tiUi i*k i=ii=i i*k i*k and •^- = -TkkUk-^2TkiUi + tk if 8Uk i=l ijtk t=l (2.4) (2.5) Comparing (2.1) and (2.5), it is clear that a state transition is generated if the directional derivative predicts an energy reduction. However, the energy function given by (2.4) is parabolic and concave down. The derivative is, therefore, not always reliable in predicting the energy difference between the two states. This is illustrated in Figure 2.2 below. If (7* = — 1 (point A) currently, and the energy is as shown in the figure, neuron Jfe will be prevented by the positive ^from changing state to Uk = 1 (point B) even though the change would lead to a lower total energy. This problem is caused by a non-zero T i t term. If Tkk is set to zero arbitrarily, the energy function is linear with respect to each neuron, and the entire energy surface will be shifted up. However, the difference in energy between states remains unchanged. The small energy barrier surrounding certain states will be flattened. It is therefore customary to set Tkk to zero. In the following analysis, Tkk is assumed to be zeroed. Figure 2.2: Network energy as a function of Uk Chapter 2. Associative Memory Networks 7 2.2 Simplifying Models In the Hopfield model, the strength of synaptic connections range from —m to +m, where mis the number of memories stored. In an analogue implementation, resistances are used to effect the connections. Having programmable resistors with a wide range of values is prohibitively expensive with current VLSI technology. Consequently, various modifications to the basic Hopfield model have been proposed. Some reduce the number of resistors required, while others replace them with fixed resistances and switches. We will describe six such modifications in this section. 2.2.1 Truncated Hopfield Mode l The truncated Hopfield model was proposed by Sivilotti et al. [21], and implemented in a VLSI circuit. In this model, the synaptic strengths are set to +1 if the sum of outer product is positive, and —1 if the sum is negative. where +1 ; x> 1 T(x) = 0 ; x = 0 - 1 ; x < - 1 The synaptic matrix consists of a storage element and a polarity switch at each node. The resistance is provided by the channel resistance of the pass transistors. Positive and negative resistance is achieved by gating the true or complement output of the neurons, respectively. The associative memory integrated circuit (ASSOCMEM) built used standard VLSI technology, and contained 22 neurons. The neuron update rule, and energy function remain unchanged from the basic Hopfield model. 2.2.2 Denker Mode l In an effort to reduce the number of variable resistances required, and to demonstrate the fault-tolerance of the Hopfield model, Denker [4] simulated a circuit in which all positive elements of the synaptic matrix were set to zero. It was shown empirically that the network still performed as an associative memory, albeit with reduced efficacy. The neuron update rule and energy function of the basic Hopfield model remain applicable here. Chapter 2. Associative Memory Networks 8 The connection strengths are related to the memory vectors by Tki=V (2.7) where 0 x > 0 V(x)= 0 SB = 0 X x < 0 2.2.3 T r u n c a t e d Denker M o d e l Combining the ideas of [4] and [21], one arrives at the truncated Denker model. The synaptic strengths are generated using (2.6) and the positive elements are then set to zero. The neuron states are still represented by [—1, +1]. Fault-tolerance of the neural network architecture is amply demonstrated, as, in spite of the radical mutilation of the synaptic matrix, the resultant network still performs its intended function. 2.2.4 P a l m M o d e l s The next two models, analogue Palm and truncated Palm [16], use unipolar representations for neuron states. The ON state is denoted by Uk = 1» and the OFF state by Uk — 0. (Similarly for memory elements V^.) These models ease VLSI implementation, as only single-ended signals are required. However, these models require that memory vectors be diluted before storage, in order to reduce aliasing. Palm showed that for maximum storage capacity, the optimum weight (number of T R U E bits) of diluted vectors is given by Wgpt = log2N. However, he did not take into account the effect of dilution rate on the size of the region of convergence around each memory. After dilution, the analogue Palm model uses (2.2) to generate synaptic strengths, while the Palm model uses (2.6). 2.2.5 J P L M o d e l The JPL model was developed by Moopenn et al. at the Jet Propulsion Laboratory [14] and is a refinement of the Palm model. In this model, the memory vectors are first divided into, say, g groups of b bits. Each group is then expanded by a 6 => 2 l decoder. The expanded representation will contain (2.8) Chapter 2. Associative Memory Networks 9 exactly one T R U E bit pei group. The output of the decoder is concatenated to form a diluted memory vector V r . The vectors will contain g T R U E bits; that is, a weight of g. The synaptic matrix is programmed from the diluted vectors using: Instead of having a fixed threshold, Moopenn utilizes a variable analogue threshold that seeks to enforce the one T R U E bit per group syntax as given by tk = B. £ Ui (2.10) <€{<?„} {Gk} represents the set of bits in the same group as bit Jb. In [14], B was set to g/2. The threshold is a function solely of the state of neurons in the local group. Both J7< and tt are continuously variable analogue entities. 2.3 Model Summary The various models described are summarized in Table 2.1. Although the range of Tu for the analogue models are listed as ± o o their range for any specific implementation is only ± m , where m is the number of stored memory vectors. The reference column lists publications where the models were first introduced. Model Type Tki Range or Possible Values Possible Values of Uk and VtT Reference Analogue Hopfield — C O - " C O -1.+1 [7,8] Truncated Hopfield -1,0,+1 [21] Analogue Denker - O O - - - 0 -1.+1 [4] Truncated Denker -i,o -l.+l Analogue Palm 0 - - - O O 0,-1-1 Truncated Palm o,+i 0.+1 [16] JPL 0,-1-1 0,+l [14] Table 2.1: Variations on Hopfield Model 2.4 Simulation Results To verify that the Hopfield, Denker, and Palm models do indeed perform as associative memories, simulations were performed. The simulation programmes were written in the C programming language Chapter 2. Associative Memory Networks 10 [9] and executed on a Sun Microsystems model 3/260 computer. The programmes generated random memories and stored them in the trial networks using the appropriate sum-of-outer-products rule. In order to minimize correlation between memory vectors and elements within each vector, elements are generated individually via calls to the built-in UNIX function random. The network is then probed with vectors of fixed Hamming distance h to the nearest memory. The probe vectors are formed by randomly choosing a memory vector as base, and then randomly negating h of its bits. The Hamming distances of the probe to the set of memory vectors are computed. If the minimum Hamming distance is smaller than the desired value, another bit is selected randomly for negation. The process of distance computation, and bit negation is repeated until the desired Hamming distance is achieved. After an appropriate probe vector is formed, the Hamming distances between the probe and all the memory vectors are computed. The trial network state is then set to that of the probe. Then, neurons are selected in a round-robin fashion, and the update rule is applied to determine the output of the selected neuron. When no more state changes occur, the network has settled. The final state of the network is compared to the set of memories. Convergence is said to be optimum if the network settles to one of the memories with the minimum Hamming distance to the probe. Round-robin selection is used to minimize CPU load and to simplify detection of convergence. Empirical evidence suggests that round-robin selection has similar convergence properties to random selection. This result is expected as memory bits are generated randomly and are not a function of their location in the vector. Flexibility was a prime consideration in the simulation programmes. A large variety of parameters such as number of neurons, memories stored, full range/truncated T*{, etc. can be set via command line switches. A total of three simulation programmes were written; one for the Hopfield models, the Denker models, and the Palm models. Due to the continuously variable threshold in the JPL model, simulating it requires an analogue simulator. The discrete time algorithm used for simulating the other models will not suffice. Due to its high complexity, development of an analogue simulator is beyond the scope of this thesis. Therefore, the results in [14] will not be independently confirmed here. Trial networks with N = 128 neurons, and m = 10 stored memories were chosen to verify the functionality of the first six models. Each bit of a memory vector has a probability p of being ON, and q of being OFF (q = 1 — p). p was set to 0.5 for bipolar models and to 0.1 for unipolar models. The network was then probed with 40 random initial vectors that are chosen to have a Hamming distance h of 5 or 10 from the closest memory. For each value of Hamming distance used, 500 trial networks were Chapter 2. Associative Memory Networks simulated. The frequency of convergence to the closest memory is tabulated in Table 2.2. 11 Model Type Frequency of Frequency of Convergence (h = 5) Convergence (h = f0) Analogue Hopfield (p = 0.5) 0.99 0.99 Truncated Hopfield (p = 0.5) 0.95 0.94 Analogue Denker (p = 0.5) 0.78 0.77 Truncated Denker (p = 0.5) 0.60 0.61 Analogue Palm (p = 0.1) 0.64 0.53 Truncated Palm (p = 0.1) 0.71 0.60 Table 2.2: Frequency of Convergence to Closest Memory C h a p t e r 3 Uns tab le M e m o r y Analys i s As discussed in Chapter Two, decision rule (2.1) ensures a monotonic decrease in energy as given by (2.3). Thus, a system can never return to a state it has left previously. If the system is currently at one of the stored memories, and (2.1) generates a state change, that memory is unstable and irretrievable. We will derive the expression for the probability of unstable memories based on this observation. 3.1 Hopf ie ld M o d e l Rule (2.1) consists of two parts; Si = YA=I TkiVi and the threshold it- With the system currently at a memory Vr, the memory is unstable if VjJ" = 1 and St < <i, or if VJT = —1 and Si > it, for any bit k. If one assumes that the memories are random and independent, and the bits in each memory are also random and independent, the distribution of Si is approximately Gaussian (3.1) because of the large number of terms contributing to it. The probability density function of St in the truncated Hopfield model and the two Denker models can be similarly approximated. The characteristics of Si is then completely specified by the mean (/*) and variance (<r2). 1 fsk(*\vn = , - i K — M ) / » ] * TV (3.1) With the system at memory VT, and if the neuron in question has value V t r = 1, the expectations of St and Si are given by = E[Si|.T7 = l] = E N £ V T ( V 7 ) 2 I V T = 1 t=i Li** + E N i = i * = i = {N-1) + { N - l ) { m - l ) { 2 a 3 - l ) (3.2) and 12 Chapter 3. Unstable Memory Analysis 13 E[sl\v; = i] = E N \ineqk VT = l « = i (iV - 1){N - 2)(m - l)(m - 2)(2a6 - 1) + 2(N - 1)(/V - 2)(m - l)(2a 3 - 1) + (AT - 1){N - 2)(m - l)(2a 4 -1) + (N- 1)(N - 2) + (JV - l)(m - 1) + (N - l)(m - l)(m - 2)(2a4 - 1) + 2(JV - l)(m - l)(2a 3 - 1) + (JV - 1) (3.3) where « 6 = P 6 + ( 5 ) P V + ( ^ ) p V + g 6 , a 5 = p 6 + ( § ) P V + ( n P V , C«3 2 • 2 <*2 = p + g I P 9 The equations (3.2) and (3.3) result directly from expanding Tki by (2.2) and taking the expectation of the resulting terms individually, cn represents the probability that the product of t disparate memory bits has value 1. The bits takes value 1 with probability p, and —1 with probability q — 1 — p. On the other hand, if V£ = — 1, we have M _ = E [S t | VI = -1] = (JV - l)(m - l)(2a 3 -l)-(N-l) (3.4) and E [ S 4 2 | V 7 = - 1 ] = The variance is, of course, given by (JV - 1)(JV - 2)(m - l)(m - 2)(2a6 - 1) + 2(JV - 1)(JV - 2)(m - 1)(1 - 2a3) + (JV - 1)(JV - 2)(m - l)(2a 4 - 1) + (JV - 1)(JV - 2) + (JV - l)(m - 1) + (JV ~ l)(m - l)(m - 2)(2a4 - 1 ) + 2{N - l)(m - 1)(1 - 2a3) + (JV - 1) <r2 = E [ S t 2 ] - E 2 [ S t ] (3.5) (3.6) Chapter 3. Unstable Memory Analysis 14 With the system at a memory, decision rule (2.1) uses threshold <t to distinguish between two noisy signals. This is the classic case of Bayes detector in communications theory [24]. The optimum tk is given by the root of P• fsk(tt \v£ = i)-q-fsu(th\vi = -1) = 0 (3.7) The probability of a decision error can be found using K = P f" fsk(x | Vi = 1) dx + q f°° fSk(x | ^ = -1) dx (3.8) 0.2 —1 1 1 0 Figure 3.1: Conditional pdf of St and optimum decision boundary A memory is stable only if all its bits are stable. If one assumes that decision errors are uncorre-cted between bits, the probability that all N bits are error-free is therefore, [1 — be]N . Conversely, the probability of a memory error is given by m. = l - [ l - M * (3.9) The bit errors are in fact not uncorrelated. It will be shown in a later section that the signal received by neurons in like states are positively correlated. For example, if a neuron, currently has state FALSE, receives a larger than expected signal, other FALSE neurons will tend to receive large signals also. This trend also applies to T R U E neurons receiving lower than normal signals. Since all neurons have the Chapter 3. Unstable Memory Analysis 15 same threshold, a bit error occurring in one neuron makes bit errors in other neurons more probable. Consequently, (3.9) gives the upper bound on the memory error rate. To see the reason (3.9) yields over-estimates, consider the case where bit errors are strongly positively correlated such that bit errors always occurs in pairs. Since the bit error rate measures the number of bit errors in all the memory vectors taken in aggregate, fewer bit errors are available to contaminate other memories if bit errors are used in pairs. The level of correlation between signals received by arbitrary pairs of neurons is discussed in section 3.7. The five related models are analysed similarly in subsequent sections. For each model, relations for the first and second order expectations are derived. Expressions (3.8) and (3.9) are used to calculate the probabilities of decision error and memory instability respectively. 3.2 Truncated Hopfield Model In the truncated Hopfield model, the elements Tti of the transition matrix are limited to (—1,0,1). Again with the system at memory Vr, the signal Sk received by neuron k is given by (3.10) The expected values of Sk for Vkr = 1 and VjJ" = — 1 are shown below: = E[S t |V? = l] = (N-l)-E[V?-T(Vk1Vl + --- + VkrVT + --- + VkmV?i) | VZ" = 1] m - l (p-q) £ ( B K ( j t m - l ) - B x ( j , m - l ) ) + = ( N - l ) x i=r»**i p B K ( [ ^ \ , m - l ) + q B x { [ f \ , m - l ) (3.11) where K = a 2 = P2 + q 2 , 1 — a 2 = 2pq Chapter 3. Unstable Memory Analysis 16 and = E[S t |V? = -l] = {N-i)-E[v?T(viv> + '-- + Vrkv? + --- + vkmv?n) |V? = -1] = (JV - 1) x The expectation of Sf is of the form: m - l (p-g) £ ( B K ( j , m - l ) - J 3 A ( ; , m - l ) ) -p B ^ L f J . m - l J - g B ^ L f J . m - l ) E[St2] = E JV N i = i j = i <o=l \b=l (3.12) (3.13) Noting that the bits of each memory are identically distributed and mutually independent, we can write: E[5 t 2 |VT] = (JV - 1) • E (JV - 1)(JV - 2) • E (3.14) U = i v t = i Chapter 3. Unstable Memory Analysis 17 To avoid the discontinuity in 7~(), the expectation was performed in pieces over regions where 7~() equals 1 or —1. The resulting expressions are and E f o 2 | n = l] = ( N - l ) m - l vi=0 m - l + ( ^ _ l ) ( ^ - 2 ) ^ i ? , K m - l ) x ui=0 + 2pg P_i,i(w)P_i,i(w) + P M HPi, i («; ) -2P_i,i(u>)Pi,iH Pi . iHP_i ._ iH + p_ 1,1 H P i . - i H - P i , i H P i , - i H - P - L I H P . I , . ! ^ ) P i , - i H P i , - i H + P - I . - I H P . L . I H - 2 ? l r l ( w ) P . 1 , . 1 ( « ; ) + E[S t 2 |V7 = - l ] = ( N - l ) m - l 1 - 53 B f K m - * ) • bPo,-iH-i-gPo.iH] ui-0 m - l + ( J V - ! ) ( # - 2 ) £ _ » , ( u . , m - l ) x ui_0 2pg P_i , iHP_i , iH + Pi,iHPi,i(ti;) -2P_i .xHPi . iH Pi,i(t»)p_i,_i(w) + P _ I , I H P I , _ I H - P i , i H P i , - i H - P _ I . I H P . L _ I H P i , - i H P i , _ i H + P _ I , _ I H P _ I , _ I H -2Pi,_i(u;)P_i,_iH The expressions for Pa,t(w) and the derivation of (3.15) and (3.16) are shown in Appendix A. (3.15) (3.16) Chapter 3. Unstable Memory Analysis 18 3.3 Denker Model The Denker model is analysed along similar lines to that used for the truncated Hopfield model. Regions where V(x) = x are found, and the expectation sums are performed over each region. "With A = 2pq, the expressions for the Denker model are: E[S„ | VJ = 1] = ( N - l ) m-l 53 [{p-q)(m-2j) + 2q]Bx(j,m-l)+ [1 + Even (m)]gflA([fJ, m-l) (3.17) E[S2|V? = l]=-m-l (J\T-1) £ _ ? „ ( _ . , m - l ) x m i n ( « , | = f i j - l ) m_w_x p 53 53 (^*iw)Bj)(y,n»-ti;-l)(2a!-2y + m-2ti;) "=° y - m a x ( o , « + [ = ^ i ] - _ ) min(tK,[2^iJ) m _ w _ i q £ ^2 Bp(x,w)Bp(y,m-w-l)(2x-2y + m-2-2w) «=° v = m a x ( 0 , x + [ = S f i ] - W ) m-l (JV-l)(JV-2)53_^(ti»,m-l)x min(ti/,|=f±J) m-w-1 min^, ) m _ w _ j «2 53 53 53 53 ^( • .«w».m-»-i)x «=° y - m a x ( 0 , < r + p S f ± " | - w ) u = 0 «=max(0,u+ \ s f 1 " \ - w ) _? p(u, w)Bp(v, m - w - l)(2(x - y - w - l ) + m)(2(u - v - w - l ) + m) m i n ( « / , L = = _ J - l ) m-w-1 m i n ^ . L ^ J ) m - t » - l 2p? 53 53 53 53 B p ( » t w ) B , { y , m - w - l ) x "=° y-max(o,»+["=fI"|-iii) u = 0 ti-max(o,u+[=f±"|-_) _>p(tt, tu)i?p(t», m - ui - l)(2(sr - y - w) + m)(2(u - v - w - l ) + m) m i n ( i K , L ^ i J - l ) „ , _ „ , _ ! _ u n ( _ , L * ^ i J - l ) > + + 53 ^(»it")B,(y,TO-«;-l)x »=° v=max(0,B+[=fi]-tU) u = 0 v=max(u,u+ [s£±'] -vi) Bp(u, w)Bp(v, m - w - l)(2(x — y — w) + m)(2(u -v — w) + m) (3.18) Chapter 3. Unstable Memory Analysis 19 E[Si | Vfc = -1] = m-l E [ ( p - g ) ( m - 2 j ) - 2 p ] 5 A ( i , m - l ) -[1 + Even (m)] pB A ( [ f J , m - l) (3.19) and E [ S 2 | V ? = -1] = m-l ( N - l ) ^ 2 B p ( w , m - l ) x ui=0 min(«;, |=fij-l) m_„_i « E E Bp(x,w)Bp(y,m-w- 1) (2* - 2y + m - 2u>)2 + •=° V=max(0,*+[=fi]-tU) min(«/,|=rij) m-«/-l p E E Bp (a, w)Bp(y, m - w - 1) (2x - 2y + m - 2 - 2w) •=° v=max(0,x+[=Sji i"]_ w ) m-l ( N - l ) ( N - 2 ) ' £ i B p ( w , m - l ) x v>=0 min(u/,[=f±J) m_„,_i min(tU,L=T^J) m-tu-l P2 E X) £ E B p ( x , w ) B p ( y t m - w - l ) x *=° v=max(0,«+[=ri]-t») u = 0 v=max(0,u+ p T ^ ] - « " ) J5p(u, tu)5p(v, m - w - l)(2(x - y - w - l ) + ro)(2(u - v - w - 1) + m) min(i/,|=S=AJ-l) m - w - l min^ , J) m - w - l 2pq E E E E ^ ( * , i » W y , " » - « - l ) x «=° v=max(0,a!+p=£i"]-ti;) «=» «=max(o,u+f=fi"|-«) B p(u,n;)5p(t;,m- to - l)(2(se - y - to) + m)(2(« -1> — w - 1) + m) q2 E E E E * , ( « , « ) * F ( » , m - W - l ) * •=° y=max(o,e+["=±i"|-«i) «=° i<=max(o,u+p=£l"| _«,) Bp(u, w)Bp(v, m - w - l)(2(x - y - w ) + m)(2(« -v-w) + m) > + + (3.20) Chapter 3. Unstable Memory Analysis 20 3.4 Truncated Denker Model To analyse the truncated Denker model, the regions where T(VQ) yields -1 are located, and the expec-tations computed over each region separately. The expectations of St and Si (again with A — 2pq) are given by: m-l BP* I VZ = l] = {N-l) E[sk\v; = -i] = ( N - i ) (q-p) E Bx(j,m-l) + qBx([j\,m-l) m-l (q-p) E Bx{j,m-l)-pBx([j\,m-l) E [ S 2 | V T = I] = m-l (N - 1) £ £?„(_.,m - 1) [pP_ M M + !P-i,-iW] + m-l (N-l)(N-2)Y^Bp(w,m-l) w=0 P 2 P 2 I , I W + 9 2P 2 I,-I(™)-2pgP_1|1(_>)P_i1_i(u>) and E[S t 2 |V? = -1] = m-l (*-i)53 ^ («.« -!) [pP-i,-i W + gP-i,iH] + tu=0 m-l (tf-l)(i\T-2) £_»,(_•, m-l) t»=0 P ^ i . - l M + ^ i . i W " 2pgP_i,1(u.)P_i,_1(«;) 3.5 Analogue Palm Model (3.21) (3.22) (3.23) (3.24) The analysis of the analogue Palm model is very similar to that of the Hopfield model. To calculate the expectation of St, Tu is expanded according to (2.2) and the expectation of each term is summed. The expectation relations are: fi+ = E [St | V? = 1] = (N - l)p+ (N - l)(m - l)p3 = E[S t | VI = 0] = (N-l)(m-l)p3 (3.25) (3.26) Chapter 3. Unstable Memory Analysis 21 E[5 t 2|VT=l] = (JV - 1)(JV - 2)(m - l)(m - 2)p6 + 2(JV - 1)(JV - 2)(m - l)p4 + (JV - 1)(JV - 2)(m - l)p5 + (JV - 1)(JV - 2)p2 + (JV - l)(m - l)p3 +. (JV - l)(m - l)(m - 2)p5 + (JV - l)(m - l)p3 + (JV - l)p (3.27) and E[St2|VI = 0] = (JV - 1)(JV - 2)(m - l)(m - 2)p6 + (JV - 1)(JV - 2)(m - l)p5 + (JV - l)(m - l)(m - 2)p5 + (JV - l)(m - l)p3 (3.28) The expectation of Si may be very small (< 5). In those cases, the probability density function of St is approximated by the Poisson distribution (3.29). For larger E[Si], the probability density function is approximately Gaussian (3.1). For the Poisson distribution, the parameter A is equal to the variance. /sk(*|Vt') = ^ p (3.29) 3.6 Truncated Palm Model In the truncated Palm model, the elements in the synaptic matrix are given by (2.6). As with previ-ous analysis of models having discontinuous functions denning Tti, we calculate expectations here by summing over each region where T() has value 1. The expectation of St and 5 | are given by: /J+ = E [St | V2* = 1] = (JV - l)p (3.30) m-l M _ = E [St | VT = 0] = (JV - l)p £ Bp(w, m - 1) [1 - (1 - (3.31) w=l E [S2k | V? = 1] = (JV - l)p + (JV - 1)(JV - 2)p2 (3.32) and E[S||VT=0] = (JV-l ) p X;B PKm-l)[l-(l - p n + w=i m-l (JV - 1)(JV - 2)p2 ^ Bp(w,m - 1) [1 - (1 - p)«f (3.33) w-l Chapter 3. Unstable Memory Analysis 22 3.7 Correlation between Neuron Signals As discussed previously, our estimation of the memory error rate m, is based on the assumption that bit errors are uncorrelated. In this, section, the correlation of the signals received by two disparate neurons is investigated. The analogue Hopfield and Plam models are examined in detail. The truncated Hopfield and Denker models are expected to be similar to the Hopfield model and the truncated Palm model to its analogue counterpart. Consider first the analogue Hopfield network. With the network state at memory Vr, the covariance of the signals received by any two neurons, say neurons k and /, can be calculated. The correlation coefficient r, which measures the level of linear correlation between the two signals, is given by ratio of the covariance to the variance of the signals themselves. The mean and variance of St and Sj are identical because the memory bits are generated randomly and independently. ,2 _ <r|tJ| _ E[StS,]-E[StjE[S,j (3.34) The expectation of the product of the signals received by neurons k and / is given in (3.35) below. The terms Oj represents the probability that the product of * independent memory bits has value 1, and is defined in (3.3). The expectations of St are derived in section 3.1. B[5»S,] [(JV - 2)(JV - 3)(m - 1) + 3(JV - 2)(m - 1)] (2a3 - 1)+ (JV - 2)(m - l)(m - 2)(2a5 - 1) (JV - 2)(JV - 3) + 3(JV - 2) + m + (m - l)(m - 2)(2a4 - 1) (JV - 2)(JV - 3)(m - l)(m - 2) + (JV - 2)(JV - 3)(m - 1) (Vkr + V{)+ V{Vl+ (3.35) (2a6 -1)+ (JV - 2)(m - l)(m - 2) (2a4 -1) + 2(m - 1) + 3(JV - 2)(m - 1) (2a2 -1) It can be shown that if the two neurons have the same state (VkT = V7), the correlation coefficient is small and positive. The coefficient is small and negative if the two neurons are at opposite states. For example, consider two neurons k and I, both FALSE, in a network where memory bits are TRUE with Chapter 3. Unstable Memory Analysis 23 probability p = 0.5. The elements of (3.34) are given by: E[StSj] = JV(JV-2) + m E[Sk] = N - l E[S2k] = (JV-l)(m-l) + (JV-l) a The correlation coefficient is given by: JV(JV -2) + m-(JV - l ) 2 from (3.35) from (3.4) from (3.5) r = (3.36) (JV - l)(m - 1) + (JV - l) 2 - (JV - l) 2 JV - 1 Similarly, r = - 3 ^ if W *V?). It is true, of course, that a small correlation coefficient does not necessarily ensure independence of the two signals. It merely suggests the lack of linear dependence between the signals. For example, two variables related by a parabolic function would have zero correlation and yet be completely dependent. However, independent variables will always have zero correlation. Given the guaranteed independence of memory bits, the small correlation coefficient serves to confirm the pseudo-independence of the signals and thus bit errors between neurons. For the analogue Palm model, the expectation of the signal products is given by: E[SkS,] = (JV - 2)(m - l)(tn - 2)p5 + 2(JV - 2)(m - l)p3 ((JV - 2)(JV — 3)(m — 1) + (JV — 2)(m - l))p4 (m - l)(m - 2)p4 + 2(JV - 2)(m - l)p3+ (JV - 2)(JV - 3)p2 + 3(m - l)p2 + 3(JV - 2)p + 1 (JV - 2)(JV - 3)(m - l)(m - 2) + (JV - 2)(JV - 3)(m - 1) (VkT + Vf)+ (3.37) P6+ (JV-2)(m-l)(m-2) PB + (JV-2)(m-l) Consider, for example, a network with JV = 128, m = 10, and p = 0.1, and neurons k and I are both FALSE. Evaluating (3.35), (3.26), and (3.43) with the above parameters, the expectations contributing to the correlation coefficient r have values: E[SkSi) = 2.954 E[Sk] = 1.143 E[S2] = 3.827 from (3.37) from (3.26) from (3.28) Chapter 3. Unstable Memory Analysis 24 Therefore, the correlation coefficient r = 0.65. This sample calculation shows that there is significant linear correlation between signals received by neurons in the Palm model. Thus, if neuron k erroneously changes from state FALSE to TRUE due to an excessive signal Sk, the large correlation coefficient sug-gests that other neurons in the FALSE state will tend to receive large signals also. Memories containing bit errors tend to have more errors than the bit error rate would predict, (ie. Bits In Error > 6, • JV.) Since, bit errors tend to be concentrated in certain memories, the remaining memories must have below average bit errors if the aggregate bit error rates were to remain unchanged. Therefore, the memory error rates are less than predictions based on independent occurrence of bit errors. As will be shown in a later section, we are able to predict the bit error rate with reasonable accuracy, but grossly over-estimate the memory error rate. 3.8 Distribution of Neuron Signal During simulations, it was discovered that, under certain conditions, the signal St for the Hopfield and Denker models will only take on odd or even integer values. Put another way, every alternate value will have a frequency of occurrence of zero, even if the value is close to the mean. Superficially, this appears to prevent the use of continuous probability density functions to approximate the distribution of Sk. The cause of this behaviour must be investigated. Table 3.1 shows the values that Sk can attain. Model JV m sk Model JV TO sh Analogue Hopfield Odd Odd Even Truncated Hopfield Odd Odd Even Odd Even Even Odd Even Either Even Odd Odd Even Odd Odd Even Even Even Even Even Either Analogue Denker Odd Odd Either Truncated Denker Odd Odd Either Odd Even Even Odd Even Either Even Odd Either Even Odd Either Even Even Even Even Even Either Analogue Palm Odd Odd Either Truncated Palm Odd Odd Either Odd Even Either Odd Even Either Even Odd Either Even Odd Either Even Even Either Even Even Either Table 3.1: Attainable Values of Signal Sh Chapter 3. Unstable Memory Analysis 25 The constraints of Sk result from the construction of the synaptic matrix using the sum-of-outer-products rule. St is, naturally, given by m Sk = J2TkiVT • (3.38) <=i Expanding T« by its sum-of-outer-products equivalent, one gets sk = vtvM + vtvfvf + '-' + vtvzv; + --- + v1rvkmv1m WW + v2rvk2v? + ••• + VJVJVJ + • • • + vrv 2 m vi-iVkVU+vi-ivtv^+...+vsr_1vsr^,_i+• • •+v^v rv -^ Vk+iVivi+l + v;+1vk2vk2+1 + • • • + v; + 1 v; v? + 1 + • • • + v r ^ WW + V*NV2V2 + • • • + V^TYV^ + • • • + nVkmVJ? (3-39) Consider the case where N is even, m is odd, and the network state is currently at memory VT. Each row in (3.39) contains m terms. The rth term of each row is of the form VJT (V?)2. Since neurons states are constrained to be —1 or +1, (V7) 2 always evaluates to one, and thus, the rth term has value VkT. The remaining m — 1 terms are (—1, +1) random variables. As stated earlier, m is assumed to be odd. TO — 1, is therefore, even. Each row of (3.39) consists of the sum of TO — 1 odd terms and Vkr which is odd. Thus, the sum of each row is odd. There are N — 1 rows in (3.39). Row k is missing because Tkk is set to zero to smooth the energy surface, as discussed earlier. With N even, Sk is the sum of an odd number of odd terms. Therefore, Sk must be odd. Conversely, if TV is odd, Sk would be the sum of an even number of terms. Or, if m is even , each row would contribute an even intermediate sum. In those cases, Sk would be even. Analogous analysis can be applied to the other models to predict the parity of the neuron signals. As shown in Appendix A, the intermediate sum of each row is given by R,um = 2SB - 2u> - 2y + TO - 1 + Vk (3.40) w, x, and y are integer random variables and are defined in Appendix A. Since all random variables are multiplied by a factor of 2, R,Um can only change in steps of two. Therefore, Sk only takes on alternate values. However, the simple transform of dividing Sk by two would ensure that it will change in unity Chapter 3. Unstable Memory Analysis 26 steps. The approximation function must be parameterized with the statistics of the transformed Sk, and the arguments to the function must also be divided by two. The effect of the transform on the expectations are E | | ] = 5 S • .(Ml, and If the rows in (3.39) were independent and no row dominates, the distribution of 4?- would be approximated by the Gaussian distribution. This is a direct result of the central limit theorem in statistics. In the present case, the rows are identically distributed but not truly independent. It was felt that the level of independence was sufficiently high to satisfy the relatively lax requirements of the central limit theorem to warrant the use of the Gaussian distribution. To verify the assumption, simulations were performed to gather the frequency at which various values of Sjt occurs. In the simulations, 640,000 samples of Sk were taken. To generate the theoretical frequencies, a Gaussian distribution biased with mean and variance of 4^ - was used. Freq(Sk = x) = 640,000 • Gaussian (jis±, <r%^, |) (3.43) The results of the simulations and the theoretical predictions for the analogue Hopfield and the analogue Denker models are plotted in Figure 3.2 and Figure 3.3, respectively. It is evident that the Gaussian distribution offers a good fit with the experimental data. In fact, over much of the range, the predicted and simulated curves are indistinguishable. Equation (3.39) is also applicable to the Palm models. The neurons states are represented by values 0 and 1. Normally, the probability p of a memory bit being set to TRUE is low. For Vtr = 0, terras in (3.39) will have value 1 with an extremely small probability of p3. If the terms were independent, the probability density function of Sk is expected to be Poisson. However, as discussed earlier, the terms are not independent. Figure 3.4 shows the actual frequency of occurrence of various values of Sk and the Poisson predictions. It is clear that the Poisson distribution is not a good approximation over much of the range of values. The shape of the experimental data suggests that the Gamma distribution may be applicable. The Gamma distribution with parameters derived from the theoretical mean and variance is also plotted in Figure 3.4. Although the Gamma distribution offers a better fit to the experimental data over the full range of values, the Poisson distribution is superior over the region of the threshold. The first integral to calculate Chapter 3. Unstable Memory Analysis 27 - 200 0 2 0 0 Figure 3.2: Distribution of St; Analogue Hopfield model, JV = 128, m = 10, p = 0.5 Figure 3.3: Distribution of St; Analogue Denker model, JV = 128, m = 10, p = 0.5 the bit error rate in (3.8) is performed over this region. Figure 3.5 plots the Poisson distribution, the Gamma distribution and the experimental data. One can see that the Poisson curve lies closer to the data points. The Poisson distribute will be used to calculate the bit and memory rates. er 3. Unstable Memory Analysis in o CD 8< cz CD i _ =3 O " o CZ CD cr Poisson Gamma Data 2 1 0 Figure 3.4: Distribution of St; Analogue Palm model, N = 128, TO = 10, p = 0.1 uuuu 8 c 2 =3 6 0 0 0 - H O 4 0 0 0 CD 2 0 0 0 - I threshold Poisson Gamma Data 1 0 -+ *-1 5 Figure 3.5: Distribution of St in the Region of Integration Chapter 3. Unstable Memory Analysis 29 3.9 Software Tools The results of the previous sections were translated into a computer program analyse that calculates the theoretical threshold (tk), the bit error rate (6,), and the memory error rate (me): First, analyse computes the expectations for the model requested. The theoretical E[St] and E[5|] are used to pa-rameterize continuous probability density functions that approximate the distribution of St. For models with Sk constrained to be an odd or even integer, E[^] and E [(^)2] are used instead. Table 3.1 lists the constraints for models considered. Analyse finds the threshold tk by locating the root of (3.7) using the bisection method. It then rounds fj. to the nearest integer attainable by the signal Sk- This threshold is used in our simulations. Analyse calculates the bit error rate by evaluating (3.8) numerically using the trapezoidal rule. The limits of integration are extended by 0.5 to take into account the integer nature of St. This is commonly referred to as the "continuity correction" in statistics [17]. Due to the smoothness of the probability density function of St, only simple numerical algoritms are required. The models were simulated under a variety of conditions. Each trial network had N = 32, 64, 128, or 256 neurons and m = 5, or 10 memory vectors. The memories were generated randomly, one bit at a time. For bipolar models, the probability p that each bit was set to TRUE was 0.30, 0.40, or 0.50. For unipolar models, p was either 0.05 or 0.1. For each set of (m, N,p), 1000 trial networks were simulated. Memories were used one at a time to probe each network. The distribution of St, the bit error rate (fc«), and memory error rate (m«) were measured. The measured optimum threshold (tk) was extracted from the distribution of St. It is denned as the threshold which would yield the minimum number of bit decision errors. 3.10 Results The first objective was to verify that the expectation equations were correct. Sample averages of St and Si were obtained from simulations. These were found to be within 1.0% of theoretical values in all cases. Confidence in the correctness of the equations is, therefore, high. Having ascertained the quality of the relations describing the statistics of St, analyse can be used to calculate the theoretical threshold tk, bit error rate be, and the memory error rate me. The simulation results and theoretical predictions of the optimum threshold tk for bipolar models are plotted in Figure 3.6. The curves represent predicted values, while the marks represent simulation results, probability of a TRUE memory bit of p = 0.3 were plotted. Unbalanced probabilities of TRUE Chapter 3. Unstable Memory Analysis 30 and FALSE bits were chosen to demonstrate the extended range of applicability of our method of analysis. Previous researchers concentrated on balanced TRUE and FALSE bit distributions. The threshold for the p = q — 0.5 case is independent of the network size and memory load. Simulations with p — 0.5 and p = 0.4 were also performed and yielded equally good if not better agreement between predictions and simulation results as the p = 0.3 case. 50 100 200 Number of Neurons Figure 3.6: Predicted and Simulated Optimum Threshold; bipolar models, m = 10, p = 0.3 The results for the Palm models are plotted in Figure 3.7. The difference in thresholds between the analogue and truncated models is minimal. This is because p was small (0.1). The probability that an element of the synaptic matrix is increased by an amount 6 due to aliasing is p7S. Therefore, few elements are affected by the truncation operator. Despite the weakness in the approximation to the probability density function of St discussed in the previous section (3.8), the predicted thresholds fit well with the simulation data. Figure 3.8 shows the relationship between the bit error rate and the size of the network for the bipolar models. The memory load was held at ten vectors. There is, clearly, good agreement between the predicted rates and the simulation results. Due to the extremely small probability of error for the Hopfield network with 256 neurons, it was beyond the computational resources available to obtain a Chapter 3. Unstable Memory Analysis 31 20 100 200 Number of Neurons Figure 3.7: Predicted and Simulated Optimum Threshold; unipolar models, m = 10,p = 0.1 statistically significant measure of the error rate via simulations. Therefore, no simulation data point is available. In 5,120,000 simulation trials, consuming several hours of CPU time, only two bit errors were observed. Obtaining a large enough sample of bit errors would have required hundred of hours of CPU time. Furthermore, developing a random number generator capable of supplying the enormous number of pseudo-random numbers is beyond the scope of this thesis. Chapter 3. Unstable Memory Analysis 32 g 1e-05 -UJ 1o-07 -j 1e-08 -i 1e-09 Simulation Results: + Hopfield * Truncated Hopfield x Denker o Truncated Denker 100 200 Number of Neurons Figure 3.8: Predicted and Simulated Bit Error Rate; bipolar models, m = 10, p = 0.5 0.1 0.001 Simulation Results: * + Palm * Trunc. Palm •T-—~T——T— I 100 Trunc. Palm 200 Number of Neurons Figure 3.9: Predicted and Simulated Bit Error Rate; unipolar models, m = 10, p = 0.1 Chapter 3. Unstable Memory Analysis 33 The predicted and simulated bit error rates for the Palm models are shown in Figure 3.9. Agreement between prediction and simulation results is excellent for smaller networks but degrades for larger ones. The discrepancy is due to the mismatch between the approximation function and the actual distribution of St, as described in section 3.8. It is interesting to note that the performance of the truncated model is superior to that of the analogue model. This result, though counter-intuitive, is predicted by our analysis algorithm. It is speculated that the truncation operation effectively removes the noise contribution to elements of Tu, and thus, aides performance. Figure 3.10 shows the effect of varying the relative probability of TRUE and FALSE bits in the memory vectors. As one can clearly see, the performance of the network degrades rapidly with increasing imbalance in the two probabilities. Balanced p and q also has the benefit of maximizing the information content of each memory vector [24]. Therefore, memory vectors should be encoded such that there is, on average, an equal number of TRUE and FALSE bits. 0.1 g 0.001 UJ m 0.0001 0.01 x Denker o Truncated Denker 1e-05 0.3 0.4 0.5 Probability (p) of TRUE Bits Figure 3.10: Bit Error Rates; bipolar models, JV = 128, m = 10,p= 0.3,0.4, and 0.5 Following the suggestion of Palm [16], the probability p of TRUE bits in the Palm models was kept small to minimize aliasing. Therefore, the desirable range of p is rather narrow. The bit error rate for p = 0.05 and p = 0.1 is plotted in Figure 3.11. The simulation results agree well with the predictions. Chapter 3. Unstable Memory Analysis 34 0.01 o UJ CD CO DC t Simulation Results: + Palm * Trunc. Palm 0.001 0.06 0.08 0.1 Probability (p) of TRUE Bits Figure 3.11: Bit Error Rates; Palm models, N = 128, m = 10, p = 0.05, and 0.1 To determine the memory error rate, the trial networks are initialized to a stored memory, and then allowed to evolve. If any of the neurons changes state, the memory is unstable. Memories in each trial network are tested sequentially. The portion of unstable memories compared to the total number of memories tested gives the experimental memory error rate (m«). The predicted and simulation rates for the bipolar models are plotted in Figure 3.12. There is excellent agreement for the Hopfield models and good agreement for the Denker models. The predictions tend to over-estimate the memory error rate because the analysis algorithm assumes that bit errors are independent. It has been shown that the signal received by the neurons are positively correlated; therefore, over-estimation is to be expected. The error rates for the Palm models are plotted in Figure 3.13. The performance of the analytic model, in this case, is poor. The model drastically over-estimates the memory error rate. There are two causes to this problem. As discussed previously, the approximations of the probability density functions of Sk were relatively imprecise and tend to moderately over-estimate the bit error rate. Furthermore, the correlation between signals Sk received by various neurons is large and positive. Thus, the independence assumptions of (3.9) is invalid. In view of the simulation results, our model should be restricted to prediction of bit error rates of Palm models and not used for memory error rates. Chapter 3. Unstable Memory Analysis o 0.1 -CD •S 0 0 1 ^ to c ID ? 0.001 •8 . O 2 16-04 -C L 1O-05 X Xrunc. Denker Denker'^ ""-.^ . Hopfield , Trunc. Hopfield Simulation Results: + Hopfield * Truncated Hopfield x Denker o Truncated Denker ^ * I 100 -r~ 200 Number of Neurons Figure 3.12: Predicted and Simulated Memory Error Rate; bipolar models, m = 10, p o E a> to i °'1 o IP !5 CO X ) o 0.01 + * Simulation Results: + Palm * Trunc. Palm 100 Palm Trunc. Palm 2 0 0 Number of Neurons Figure 3.13: Predicted and Simulated Memory Error Rate; Palm models, m = 10, p: Chapter 3. Unstable Memory Analysis 36 Having established that the error rates for the bipolar models can be predicted with good accuracy, it is possible to use our analytic model to guide network development. For a particular application where the maximum tolerable memory error rate is known, the model can be used to calculate the level of memory loading that will meet the constraint. Conversely, given the target memory error rate, and memory loading, the model can be used to determine the size of network needed to satisfy these constraints. Figure 3.14 shows the number of memories that can be stored in Hopfield networks of various sizes at memory error rates of 10-1,10-2, and 10~3. 80 60 -O CO ca-cti o E CD 20 -me=10-3 500 Number of Neurons Figure 3.14: Predicted Capacity of Hopfield Networks; p = 0.5 1000 The curves in Figure 3.14 are virtually linear with a slight downward curvature. Other researchers [1, 12] investigating the capacity of Hopfield networks as the number of neurons approach infinity, have shown that the capacity is of the order: (3.44) The capacity relation is plotted with the constant error curve of me = 10"1 in Figure 3.15. To convert (3.44) from a proportionality relation to an equation, a scaling factor is found by least squares fitting Chapter 3. Unstable Memory Analysis 37 of the points defining the mt = 10 1 curve. This factor was found to be 0.736. Since both curves are arrived at using completely different methods, their close fit supports the validity of our analytic model. 8 0 0 —( • > • • 1 ' > ' ' 1— 0 500 1000 Number of Neurons Figure 3.15: Comparison with Published Results of Capacity of Hopfield model 3.11 Limitations Of Current Implementations Current VLSI implementations of neural network associative memories with modifiable synapses have all been of limited size. For example, the ASSOCMEM [21] had only 22 neurons while the integrated circuit (IC) produced by Moopenn et al. [14] had 32 neurons. The limiting factor is the size of the synaptic matrix. In the ASSOCMEM, the matrix occupied over 90% of the die core area. The matrix is large because conventional CMOS VLSI fabrication processes are inefficient for implementing random access memories (RAMs). They lack many features found in dedicated RAM processes such has high impedance pull-up structures, polysilicide gate material, and trench capacitors [23]. The designers of ASSOCMEM estimated that a non-programmable circuit could contain as many as 220 neurons. Using a simpler architecture and non-modifiable synapses, Graf et al. [5] produced an IC Chapter 3. Unstable Memory Analysis 38 with 512 neurons. However, using a fixed matrix is not a real solution as the resultant network would be restricted to a predetermined set of functions. The lack of expandability is a major weakness in current implementations. It is impossible to increase the power of the network by combining several smaller circuits. This is because adding more neurons requires augmenting the synaptic matrix with more rows. With the matrix embedded in the circuit and each row dedicated to a particular neuron, there is no convenient way to remap the rows to different neurons. In an analogue implementation of the Hopfield network, high valued "synaptic" resistances are re-quired. These resistances, when combined with the small parasitic capacitances form sufficiently long time constants so that the state differential equation (3.45) is formally similar to (2.5). A second reason for having large resistances is to reduce power dissipation. Unlike conventional digital circuits, where only a small portion of the IC is active at any one time, the entire matrix in a Hopfield network is always active. The network dissipates P = N • -ffi Watts. A modest network with 100 neurons and VDD = BVolts would , for example, require resistances of at least 125fcft, if power dissipation is to be limited to 1 Watt. However, standard fabrication services available to universities, through MOSIS in the United States, or CMC in Canada [25], do not inherently provide compact constructs for realizing such large resistors. N (3.45) where Gu = Tti X synaptic strength to conductance conversion factor vt = input voltage of neuron Jb Uk = output voltage of neuron k Ck — capacitance attributed to the input of neuron k As discussed in previous sections, memories in the Hopfield network and its variants can become unstable. When this occurs, an otherwise error-free probe vector will be contaminated. Unstable memory cannot be eliminated, only their frequency minimized, albeit at a cost of reduced capacity. The models Chapter 3. Unstable Memory Analysis 39 discussed also present a mixed interface to the rest of the system. The programming and probe vectors are digital while the threshold is analog. This complicates system design when the main function of the network is digital and yet an analogue input that is a non-trivial function of the network parameters (m,N,p) is required. Chapter 4 Proposed Associative Memory Architecture 4.1 Model Description The proposed model is based on the JPL model and is designed to address some of the limitations discussed in Section 3.11. The main features of the model are : • guaranteed stable memories, • fully digital implementation, • cascadable to increase capacity, • amenable to discrete time simulation algorithms, • adjustable threshold to take advantage of memory syntax, • use of external RAM permits more neurons per integrated circuit. In the proposed model, neurons are divided into groups. The "one-hot" memory dilution scheme of the JPL models is retained. The synaptic strengths are computed from the diluted memory vectors using sum-of-outer-products rule described by (2.9). The resultant synaptic matrix is stored in an external memory. Networks of different sizes can be accommodated simply by varying the depth and width of the memory. Instead of a continuum of values used in the JPL model, the threshold it here can take on two values; ThoFF if all the neighbours in all the same group as neuron k are all OFF, and ThoN if any of the neighbours is ON. The threshold depends only on the state of the neighbours and is independent of the state of neuron Jfe itself. If ThoN is chosen to be an unattainably large number, groups would be restricted to containing no more than one TRUE bit. The choice of ThoFF is a compromise between reducing spurious TRUE bits, and facilitating desired TRUE bits to turn ON when the current state is moderately far from a memory. Neurons are updated, cyclically, one at a time. The row of the synaptic matrix associated with the selected neuron is read in from the external memory. The signal St = YA=I TkiUi is computed digitally 40 Chapter 4. Proposed Associative Memory Architecture 41 and presented to an external comparator, which compares the signal with the threshold and determines the next state of the neuron. An external comparator can effectively service multiple network modules cascaded to form a large network. Digital techniques are chosen over analogue ones to enhance speed, and compatibility with available CMOS VLSI process technologies, and to ease system integration. The use of digital algorithms allows the construction of simple and fast simulation programs. Until further analytic tools describing the topology of the regions of convergence are developed, simulation is the only avenue to evaluate the performance of a network. Therefore, fast simulation software is vitally important. Due to the high level of fault tolerance of neural network models, noise immunity offered by digital implementations is of little benefit here. It is, definitely, of no hindrance however. The energy function of the proposed model is given by - JV N N k=lt=l t=l and rewriting it explicitly in terms of Uk, one gets j JV JV JV JV ^Uk) = --TkkU^-UkYT^i + ^ U k - Y Y U i T i i U i + X ) T I U I + X T I U I (4-2) »'=1 i=l »=1 »=1 t € {Gk> i*k i * {ahy t?k In (4.2) above, {Gk} denotes the set of neurons in the same group as neuron k. The energy function is parabolic in Uk and concave down. From the discussion in Section 2.1, Tkk can be set to zero with no lost of generality. Then, the energy function is linear in Uk. An update rule that ensures monotonic decrease in energy for (4.2) when neuron k changes state is listed below. i ; T2LiTuUi>tk Uh(t + 1)= Uk(t) ; Y^LiThiUi=tk 0 J Z^iTkiUi<tk (4.3) 4.1.1 Threshold Determination Keeping in mind the constraint that groups may only have one TRUE bit each, the maximum input to a neuron is equal to the number of groups g in the network. The minimum value oiThoN necessary to ensure having "one-hot" groups is, therefore, g. To determine a suitable value for threshold ThoFFi consider the case where the network is near a memory. In particular, neurons in g — 1 groups have states Chapter 4. Proposed Associative Memory Architecture 42 corresponding to the diluted memory vector Vr exactly. In the one remaining group, the TRUE bit is displaced. The network is at Hamming distance of two from Vr. Formally, then, we have Ui = V; ; V« : 1 VI = 0, Uk = 1 VJ, = 1, UV = 0 Ut would ideally evolve to 0, followed by Uf moving to 1. The high ThoN prevents Uk1 from becoming TRUE until Uk has reset. The expectations of the signal impinging on neuron k are given by = (9 "I) • [ ! - ( ! - P T ' 1 ] (4-4) and m-l E[Sl] = - l ) £ j ^ ( tD , m - l ) [ l - ( l - ! > ) • ] + w=l (ff - l)(ff - 2) £ Bp(w, m - 1) [1 - (1 - p)«]2 (4.5) 10 = 1 After neuron k has taken on state 0, the group containing neuron k would consists solely of FALSE neurons, and the remaining g—1 groups would be unchanged from the original assumption of correspond-ing to diluted memory V*r exactly. As previously discussed, bit k' of memory Vr is TRUE (VI/ = 1.) The signal received by the neuron is Sf = <=i ^ Vi^t- Consider one of the g — 1 neurons that have state 1, say neuron /. It contributes TfiUi to the signal St1- Expanding Ttn by its sum-of-outer-products definition and recalling that Ui = V{ = 1, one gets TViUi = r(vi,v,1 + vt2,v,2 + ... + vi,vr + -" + vmv/n)vf = T(V^ V,1-|-Vi2,V,2 + -..-|-l-l-|-..--l-VmVm)l = 1 (4.6) Since each of the g—1 TRUE neurons contributes 1 to the signal, St1 has value g — 1 deterrninistically. ThoFF must be set sufficiently far above E[Si] such that neuron k will not remain at value 1 due to noise, but below g — 1 so that neuron k' can attain state 1. A more stringent constraint on the upper bound of ThoFF is related to controlling the extent of the basin of attraction around memories. A large ThoFF implies a small basin, and vice versa. This is because, the signal Sk is proportional to the number of groups in the current state vector matching the target memory. Chapter 4. Proposed Associative Memory Architecture 43 E[Sy + * < ThoFF < 9-1 (4.7) Consider the case where all groups except one correspond to diluted memory Vr, and the remaining group, G, contains all FALSE bits. This is the penultimate configuration of the network prior to converging to any memory, regardless of the initial probe vector. The convergence frequency from any probe vector is limited by the convergence frequency from this configuration. The ability of the network to converge from this state is, therefore, of critical importance. For VJ = 0, the signal Sk impinging on neuron k is formed by summing many pseudo-independent Bernoulli random variables. Its probability density function fsh (x | VJ = 0) can, therefore, be approximated by the Gaussian or Poisson distribution, depending on the mean. On the other hand, given VJT, = 1, neuron k' receives a deterministic signal of 9 — 1. The probability density function is, of course, the Dirac delta function. Figure 4.1 shows the probability density function of 5t for VJ = 0 and VJ = 1. The network will fail to converge to a memory, 0.3 - | 1 1 Figure 4.1: Probability density function of signal Sk for VJ = 0 and VJ = 1 if any of the neurons in group G that should remain at 0 receives a signal greater than ThoFF, and is Chapter 4. Proposed Associative Memory Architecture 44 selected for update befoie the neuron that should change to state 1. The spurious neuron will become TRUE, and prevent the desired neuron from doing so. The probability that a neuron will spuriously receive a signal Si above ThoFF is given by *.= /" fs„(x\Vrk=l)dx (4.8) With randomly generated memories, the desired neuron can be at any position in the group with equal probability p = -fr. In order for a neuron to evaluate to TRUE, all neurons in G selected before it must evaluate to FALSE. Thus, a network will fail to converge to a memory from this penultimate state with probability J V / , As a compromise between convergence at the penultimate state, and extent of the region of attraction, ThoFF is chosen to be a few standard deviations of Si above E[Si]. That is, ThoFF = *(E[Si 2]-E 2[S t]) + E[Si] = xa + ii; 2<x<4 (4.10) 4.1.2 M e m o r y Stabi l i ty Memories in the proposed model are always stable. Absolute stability can be shown by considering the case where the network is at a memory. The neurons that are in the TRUE state will receive signal of g — 1, and have threshold ThoFF- Since ThoFF is always below g — 1 (See expression (4.7)), these neurons will, therefore, remain in the TRUE state. The signal received by the neurons in the FALSE state is immaterial. As discussed previously, Thou is set to an unattainably high value. Thus, the FALSE neurons will also remain in their original state. Since no neurons can change state, memories are unconditionally stable. 4.1.3 Frequency o f Convergence When a network is probed by an undiluted vector that is 1 Hamming distance from a memory vector, the diluted vector will have g — 1 groups corresponding to the diluted memory, and the remaining group differing. The single mismatched bit displaces the TRUE bit in one of the groups in the diluted form of the probe. To converge to the memory, the spurious TRUE bit must first become FALSE, followed by the correct bit becoming TRUE. Due to the high ThoNt all neurons in the group selected prior to the Chapter 4. Proposed Associative Memory Architecture 45 spurious TRUE bit will remain FALSE. In order to converge error-free, all the intervening bits selected between the spurious and correct TRUE bits must evaluate to FALSE. These bits will be FALSE with probability 1 — 6,. The correct TRUE bit can be in any of remaining y — 1 positions in the group with equal probability. Therefore, the network will converge to the memory with probability The 1 — be terms represents the probability of the spurious TRUE bit will evaluate to FALSE, and the t — 1 exponent represents the number of intervening bits that must remain FALSE if the correct bit is the i 1 * bit to be updated. This line of reasoning can be extended to probes that are further from the target memory vectors. However, the mathematical complexity increases rapidly with distance. For relatively close probes, one must take into account the sharing of groups by the mismatched bits. If the mismatched bits share common groups, the number of groups affected in the diluted vector is reduced. For example, if all the mismatches between the probe and the target memory reside in the same group, all except one group in the diluted vector will be unaffected. For probes with large Hamming distances to the target memory, one must take into account additionally the similarity of the probe to all the other memories. 4.2 Sys tem A r c h i t e c t u r e A block diagram of an associative memory co-processor system incorporating the proposed neural net-work modules (NNM) is shown in Figure 4.2. The co-processor system appears to the main processor as a memory mapped slave device. The co-processor occupies three contiguous blocks of memory in the main processor's address space. The first block is assigned to the local memory modules which store the synaptic matrix Tki, and can be write-only from the main processor. The second block is assigned to the network state registers in the NNMs. The main processor loads the probe vector and reads the final network states through these registers. The remaining block addresses the co-processor control register. Contiguous addresses are used to simplify DMA transfers from the main processor. At the discretion of the implementor, the co-processor can be purely passive, or interrupt the main processor when the network converges. The Motorola 68000 asynchronous memory interface protocol [26] is used to communicate between the two processor systems. At the heart of the co-processor system is a set of neural network modules. A co-processor imple-mentation can incorporate multiple NNMs which contain 128 neurons each. Each NNM sequentially (4.11) Chapter 4. Proposed Associative Memory Architecture 46 ActiveCopy NextCopy Copy Controller Memory Module(s) Local Memory Bus Neural Network Module(s) ISum ThSel CurrState Comparator Module Memory Interface 68000 Bus Control Bus Control Register NextState Ready Convergence Detector Figure 4.2: Block diagram of associative memory co-processor system selects a neuron as update candidate. At any one time, there are as many candidates as there are NNMs in the system. The NNM places the state of its candidate neuron on the CurrState signal. If any of the neighbours in the candidate's group is ON, the NNM sets its ThSel signal to UseThON, otherwise to UseThOFF. The size of groups is selectable among 32, 16, and 8 neurons. The neuron Chapter 4. Proposed Associative Memory Architecture 47 that will eventually be updated is the one residing in the NNM selected by the copy controller module. If neuron k is to be updated, it would reside in NNM copy c = (k -f 128) and have intra-NNM index I = (k modulus 128). Each NNM reads the rows of the synaptic matrix from the associated local memory module and performs the summation J2i T^Ui over neurons * within the module. This intermediate sum is made available to the comparator module via the I Sum bus. Since NNMs operate in synchrony, all candidate neurons share a common index /. Associated with each network module is a local high speed static memory. It contains the vertical slice of the synaptic matrix Tti associated with the neurons in the attached NNM. Ideally, the memory would contain as many words as there are neurons in the system and each word would be 128 bits wide, to match the number of neurons in each NNM. However, such wide word widths would require too many pins on the NNM to be practical. Memories are, therefore, 32 bits wide and low-order interleaved. In each neuron update cycle, four blocks of memory are read simultaneously and the data words time-multiplexed onto the local data bus. The data is re-assembled in a pipeline register in the NNM. This allows access at speeds approaching that of the 128 bit width configuration with lower pin requirements. The interface between the local memory and the NNM will be kept simple to maximize speed. The synaptic matrix is stored in ascending order of rows. The address supplied to the local memory modules consists of three parts; the active copy index, the intra-NNM index of the candidate neuron, and the low-order interleaf index. By choosing the number of neurons per NNM and the level of interleaving to be powers of two, the memory address can be formed by simply concatenating its three constituent parts. Within each update cycle, the interleaf index is stepped from 0 to 3 to retrieve the 128 slice of the synaptic matrix in 32 bit blocks. The other components of the address are altered once per cycle. The local memory is relatively small in size but has very stringent speed requirements. Within each neuron update cycle, the memory must be able to perform one read access and deliver the data in four parts to the data bus. The minimum memory cycle time is given by ieyelt = iaddr + <oeeeij + 4t,e(ect + 4t,ttup (4-12) where taddr is the time taken by the address issued by the NNM to settle, taeee„ is the address read access time of the RAM chips, t,elect is word selection delay, and ttetup is the set up time of the pipeline register inside the NNM. With current technology, memory integrated circuits with extremely short access time is readily available off-the-shelf. The Fujitsu MB81C68A-45-W 4096X4 bit CMOS RAM [27], for example, has both address access time and read cycle time of 45n«. Conservatively estimating Chapter 4. Proposed Associative Memory Architecture 48 taddr *° be 10ns, t,euet to be 5ns, and t,ttup *° be 10n«, a minimum neuron update cycle time of 115ns is achievable. Static memory devices are used to implement the local memory modules. The main reason is that the large capacity offered by dynamic memories is not necessary, as the local memory is no more than several thousand words deep. Furthermore, static memories tend to be faster, in terms of read access delay and in particular, cycle time. Due to internal precharge requirements, dynamic memories need a quiescent period between accesses, thus prolonging the cycle time. Simplicity of implementation also favours static memories. They do not require complex address strobes (eg. RAS/CAS) found on dynamic memories. The stringent timing requirements of these signals virtually dictate the use of delay lines in the implementation. The need for periodic refresh in dynamic memories is not a drawback here, however. In their normal operation, the NNMs will periodically read the entire memory space. The main processor programmes the co-processor by loading synaptic strengths through the memory interface module. The interface performs protocol translation between the Motorola 68000 bus protocol and local memory bus protocol, and distributes downloaded data to the appropriate copy of the memory. This interface is of low complexity as the local memory has a simple organization and is much faster than required by the main processor. The NNMs will be forced inactive during main processor accesses to the local memory to obviate the need for bus arbitration. Minimally, the memory interface module need only provide a unidirectional data path as the main processor does not read from the local memory. However, a bi-directional path is desirable to facilitate testing of the memory module by reading back the down-loaded information. The main processor controls the operation and checks the status of the associative memory co-processor by accessing the command register. The register contains fields for RESET, RUN/STOP, READY, ERROR, WEIGHT and THRESHOLD. Setting the RESET bit causes a reset to the co-processor system. The RUN/STOP field enables and disables the operations of the NNMs. When set to RUN, the NNM's are enabled and assume mastership over the local memory bus. The main processor sets the field to STOP when it wishes to load a copy of the synaptic matrix to the local memory or load a new probe vector to the NNMs. When stopped, the NNMs stop neuron updates and releases bus mastership of the local memory bus. The READY bit is set by the convergence detector when it senses that the network has converged to its final state. The ERROR field is reserved for reporting of error conditions in the co-processor system. The number of groups in each NNM is controlled by the WEIGHT field, and is selectable among 4, 8 or 16 groups, which correspond to group sizes of 32,16, and Chapter 4. Proposed Associative Memory Architecture 49 8 neurons, respectively. Higher weights are intended for smaller networks and lower weights for larger networks. The THRESHOLD field stores the threshold ThoFF used by the comparator module. The copy controller is a binary counter that determines the active NNM. That is, the copy of NNM containing the neuron being updated. Its modulus is the number of copies of NNMs present in the system. The counter is incremented by an NNM when its candidate selection counter cycles back to 0 from 127. Any NNM in the system will suffice as they all act in synchrony. The output of the controller forms the most significant bits of the address when the NNMs access the local memory modules. The comparator module uses the active copy information to select the appropriate CurrState and ThSel signals. Of course, only the NNM selected by the copy controller will update its candidate neuron according to the next neuron state signal Next State generated by the comparator module. The comparator module implements update rule (4.3), and places the result on signal NextState. Each NNM supplies the module with an intermediate sum ISum, a threshold selection signal ThSel, and a current neuron state signal CurrState. The comparator selects the active ThSel and CurrState using the active copy information generated by the copy controller module. The signal Sk is the sum of the intermediate sums, and tjt is set to either Thotr or ThoFF, depending on whether the active ThSel has value UseThON or UaeThOFF, respectively. The active current neuron state is needed only if Sk equals tk, in which case, it is transferred onto the signal NextState. As its name suggests, the convergence detector module monitors the network for convergence. It compares the output of the comparator module with the CurrState signal of the active NNM. It declares that the network has converged and sets the READY field in the control register, if all the neurons have had a chance to update their state, and no change occurred. Minimal hardware is required to implement this function. A loadable counter with modulus JV will suffice. At the end of each update cycle, the counter is either set to JV — 1 if the active neuron changes state,' or decremented by one if the neuron retains its previous state. When the counter reaches zero, the network has converged. The intermediate sum generated by each NNM is presented to the comparator module as a four bit binary number. Thus, sums ranging from 0 to 15 are representable. This representation is chosen to take advantage of the wide availability of fast 4-bit parallel adders [28], and is exact for NNMs with four or eight groups. On the other hand, if the number of groups (weight) was set to sixteen, truncation will occur. However, for networks with 128 or fewer neurons, the maximum weight is less than 15. Consequently, truncation cannot affect the result of the update rule for these networks. Simulations were performed on large networks of 1024 neurons using full and truncated intermediate sums. Figure Chapter 4. Proposed Associative Memory Architecture 50 4.3 shows the frequency networks converge to the closest memory vector under various weights and memory loadings. Under light memory loads (m = 16), simulations show that network performance is unaffected by truncation. The rate of convergence for full range and 4-bit representation of the sum are identical. For memory loading at or above m = 48, using 4-bit representation, zero convergence was observed if g = 128 (equivalent to 16 groups per NNM.) This is to be expected as truncation reduces the intermediate sums such that the maximum attainable signal is always below ThoFF- Consequently, all neurons will evolve to the FALSE state. Simulations were performed with full range representation to examine the effect of high weight and high memory load. The results for the low memory load cases are plotted in solid lines, and that for high load cases in dotted lines. It is clear that at memory loadings (m = 48) where truncation presents difficulties, low weight networks outperform high weight networks even if full range intermediate sums were used. Thus, weights of sixteen groups per NNM would be avoided with or without truncation. Therefore, using a four bit representation of the intermediate sums has minimal disadvantage, and significant advantage in being able to use off-the-shelf adders. 0 20 4 0 6 0 8 0 100 Hamming distance (h) Figure 4.3: Frequency of convergence of proposed network (N = 1024 neurons, ThoFF — V- + 3<r) The comparator module computes Sk by summing all the intermediate sums generated by the NNMs Chapter 4. Proposed Associative Memory Architecture 51 in the co-processor system. Using 4-bit parallel adders, the sum can be computed by a binary-tree structure of adders. The tree has number of levels L = log2(j^g)- The height of structure and, therefore, delay varies only as the logarithm of the number of neurons. Taking worst case delay times of the TTL 4-bit parallel adder SJV74S283, time needed to compute St is given by teompar. = 18ns + (L - l)30ns (4.13) For example, a network of 1024 neurons requires a three level adder tree, and St can be calculated in 78ns. Comparison of St with it can be performed with a SJV74AS885 8-bit magnitude comparator and takes 18ns, worst case. The co-processor system incorporates a two stage pipeline. The first is a prefetch stage. It retrieves the row of the synaptic matrix associated with the neuron to be updated in the next cycle. As discussed previously, the rows are accessed in four blocks and re-assembled in the first pipeline stage. The second stage can be viewed as an execution stage. It evaluates the update rule (4.3). Included in this stage are the circuitry in the NNMs for calculating the intermediate sums, and the comparator module. For optimum speed-up, stages in a pipelined system must have similar cycle times. The cycle time of the prefetch stage as given by (4.12) is 115ns, and is virtually independent of network size. The times for the second stage is the combination of delays in computing the intermediate sums in the NNMs and the delay in the comparator module. To balance the two cycle times, the delay in computing the intermediate sum is constrained by tlSum — tmemcyclt ^compart < 115n« - (18n« + (L — l)30ns) (4.14) The allusion to off-the-shelf components in implementing the various modules, is for reference only. It is intended to provide a basis for evaluating the complexity and timing performance of the co-processor system. The analysis showed that only simple functions are required. Ideally, the ancillary modules would be intergrated in a support VLSI circuit. A co-processor system, so implemented, would consists of a set of neural network module IC's, some memory components, and the support IC incorporating the ancillary modules. Chapter 4. Proposed Associative Memory Architecture 52 4.3 Neural Network Module Architecture The architecture of the neural network module is optimized for VLSI implementation by using only simple regular structures. A block diagram of the module is shown in Figure 4.4. er 4. Proposed Associative Memory Architecture 53 LocMemControl LocMemAddr ActiveCopy Controller Block LoodNextState 68000 Bus Interface Block Controls A NextState Neuron States NextCopy Neuron Block ThSel NeuronSel CurrState 'ki A Accumulator Block JSum Figure 4.4: Functional block diagram of neural network module Chapter 4. Proposed Associative Memory Architecture 54 The controller block incorporates a small state machine that controls the operations of the NNM. It monitors the control bus for instructions from the main processor and takes the appropriate actions. Another function performed by this block is to sequentially select candidate neurons. When the NNM accesses the local memory, the controller will effect the loading of the pipeline register and provide the necessary control signals and the intra-NNM index and interleaf index parts of the address. In terms of hardware, the controller contains a small state machine, a candidate index counter, a candidate selector, and an interleaf index counter. The candidate selector enables the selected neuron to assume the state indicated by the comparator module. At the discretion of the implementor, the selector can be a decoder keyed off the candidate index counter or a 128-bit shift-register containing a "walking-one" pattern. The choice of implementation is too dependent on the actual topology of the VLSI circuit to be finalized at this stage. The accumulator block performs two functions. It computes the products TH • Ui for all the neurons in the module and reports the sum of the products on bus I Sum. Given that the value of 7]^ and Ui are either 0 or 1, multiplication is equivalent to a logical AND operation. The summation circuitry must count the number of ones in the products. This is not as formidable a task as it would appear on the surface. As previously discussed, there can only be a maximum of one TRUE bit per group and there is a maximum of sixteen groups per NNM. The circuit, therefore, has only sixteen inputs, not 128. Counting the number of TRUE bits belong to the symmetric class of functions [11]. These functions are known to be inefficient to implement with conventional multi-input gates (eg. AND, OR, etc.) and particularly suited to construction with switch-type logic. Thus, building the summation circuitry with full-adders will likely be a poor choice. On the other hand, the tally circuit described in [13] is optimized for MOS implementation. Another alternative to is to perform the summation with a set of shifters, one per group. In this scheme, a "walking one" pattern is shifted up by one bit if the product for the group is 1 and the pattern unchanged if the product is 0. The position of the TRUE bit in the pattern encodes the sum. The tally circuit requires fewer transistors, while the shifter circuit may better match the topology of the neuron block. The final choice of the accumulator circuit is left with the implementor. The neuron block contains storage for 128 neurons. Neurons can be selected individually and its state set to the Next State signal generated by the comparator module. The ThSel is set to UseThOFF if all the neighbours in the same groups as the selected neuron is FALSE, and to UseThON otherwise. Additionally, neurons can be set in groups of 32 from the main processor via the interface block. In this way, the main processor loads a probe vector into the co-processor system. The state of the neurons is Chapter 4. Proposed Associative Memory Architecture 55 readable in groups of 32 by the main processor. The handshaking functions associated with the main processor bus is performed by the interface block. 4.4 Simulations The performance of the proposed model was studied using Monte-Carlo techniques. A computer program pathsim was written to simulate the behaviour of the network. A large variety of parameters can be set via command line switches. For example, the following invocation pathsim -n 1024 -m 32 -P 50 -g 128 -T 88 -s 30 -p 30 -h 8 simulates networks with 1024 neurons, 32 stored memories whose bits have a 50% probability of being TRUE, 128 groups, and a threshold (THOFF) of 88. The remaining switches specifies that 30 random networks are simulated and each network is probed with 30 vectors that are 8 Hamming distance from the closest memory vector. Pathsim reports the number of trials that successfully converge to any memory, and the number that converges to the closest or one of the closest memories. Due to the dilution process, the number of information bits J in a memory vector is not the same as the number of neurons N. In fact, I = plog2 ("7)- e a c n trial network, pathsim generates m random memories V = \ y 1 , V 2 , . . . V m ] . In order to minimize the statistical dependence of bits within each memory and bits between memories, the memories are generated via individual calls to the UNIX built-in function random. The bits in each memory are divided into g groups and expanded by a b 2h decoder. The diluted vectors form the set V = [V1, V2,... Vm]. The synaptic strengths are calculated from the diluted vectors using (2.9). Next, pathsim forms the probe vector P. It randomly selects one of the undiluted memory vectors as base and randomly inverts h of the bits. The Hamming distances of P to the vectors in V are computed. If the minimum Hamming distance is smaller than the desired value h, another bit is selected randomly and inverted. The process of distance computation, and bit inversion is repeated until the desired Hamming distance is achieved. The probe vector is diluted using the same scheme applied to the memory vectors. After an appropriate probe vector is formed, the Hamming distances between the probe and all the memory vectors are computed. The trial network is then initialized to correspond to the diluted probe vector V, and is allowed to evolve. When no more state changes occur, the network has settled. The final state of the network is compared to the set of diluted memories. Convergence is successful if the Chapter 4. Proposed Associative Memory Architecture 56 final state is in V\ and is optimum if the network terminates at one of the memories whose undiluted form has the smallest Hamming distance from P. 4.5 Performance of Proposed Network Simulation results of the proposed network under a large variety of parameters are presented in this section. The performance of a network is its ability to converge to the closest memory when probed with an initial vector that is some distance from the memory. In the normal operation of a content-addressable memory, one often starts with partial information about a memory vector and sets the remaining unknown bits is such a way as to minimize the Hamming distance based on knowledge of the distribution of memory bits. The network is then allowed to evolve and ideally converge upon a memory vector that is closest to the probe vector. If p, the probability of a memory bit being TRUE is 0.5, the unknown bits can be set randomly. If p favours a particular bit value, all the unknown bits should be set to the favoured value. It is expected that for larger networks, the probe can be further from the target memory than for smaller networks and still achieve the same level of performance. Therefore, simply plotting the frequency of convergence verses Hamming distance would not be very illustrative. On the other hand, normalizing the Hamming distance by the information content (I) would yield a metric that is applicable to networks of all sizes, and given by h = j. It has the additional advantage of letting one estimate the fraction (k) of the target memory that must be known, a priori, to obtain a particular level of performance. The quantities h and k are related by: Tt = l — —-—; as = max (p, 1 — p) (4.15) 1 — SB Figure 4.5 shows the performance of typical networks with threshold ThoFF is set to (/i + 3<r). Networks ranging from 32 to 1024 neurons are plotted. The behaviour is consistent over a wide range of network sizes. As the Hamming distance between the probe vector and the closest memory grows large the frequency of convergence decrease. The frequency of convergence remains steady and high for relatively small Hamming distances and falls off rapidly as the distance approaches a critical value hcrit- The point at which performance falters appears to be rather low (/ier»t & 0.25). However, if one considers the amount of knowledge about the target memory required to achieve good performance, one finds that the level necessary is very reasonable. Figure 4.6 replots the same network results as figure 4.5 with normalized knowledge replacing normalized Hamming distance, in the abscissa. For Chapter 4. Proposed Associative Memory Architecture 57 larger networks, the degree of knowledge about the target memory vector must be approximately 70% to obtain good frequency of convergence. This appears to be reasonable requirement, if one considers that at 50% knowledge, the network is supplied with equal amounts of signal and noise. At 70%, signal bits out-number noise bit by a 2.3 : 1 margin. Normalized Hamming Distance Figure 4.5: Performance of Typical Networks as a Function of Normalized Hamming Distance Figure 4.5 shows that all networks having a frequency of convergence of unity at zero Hamming distance. This is a direct result of memories being unconditionally stable in the proposed model. The network will not change state when probed by a memory vector, and will, therefore, always converge to the optimal (starting) memory. When probed by vectors with normalized Hamming distance less than herit the networks converge to the optimal memory at a frequency close to that of Hamming distance one (h = j.) Therefore, the performance at h = 1 gives a good indication of the performance in the useful region of the network. The frequency of convergence at such a distance was discussed in Section 4.1.3, and (4.11) was developed to predict the convergence frequency. Figure 4.7 shows the predicted and simulated frequency of convergence from Hamming distance of 1 for the proposed network under various loading conditions. Chapter 4. Proposed Associative Memory Architecture 58 0.4 0.6 0.8 1 Normalized Knowledge of Target Figure 4.6: Performance of Typical Networks as a Function of Normalized Knowledge of Target 1 -ThOFF = \L+3Q CD U c CD O ) CD > c 0.8 -O c CD 0.6 ~ ThOFF = \L+2a Simulation Results: X ThOFF = \L+3o + ThOFF = \i+2o 20 40 60 I 80 100 Memories Stored Figure 4.7: Predicted and Simulated Frequency of Convergence, h = 1, JV = 1024, g = 64 Chapter 4. Proposed Associative Memory Architecture 59 In predicting the frequency of convergence, it is necessary to estimate the probability that a neuron will spuriously receive a larger than expected signal. This probability is given by (4.8). The statistics of the signal received by each neuron is formally similar to that of the truncated Palm model. Both models use (0,1) to represent TRUE and FALSE values, and the synaptic matrix is generated identically. Due to these similarities, the difficulties in fitting a standard function to the probability density function of Sk in the Palm models are also encountered here. The accuracy of the theoretical bit error rate given by (4.8) is not expected to be very high, and is intended to guide the user to a threshold suitable for the application at hand. For large networks, the probability density function of the signal is assumed to be Gaussian. By setting the threshold ThoFF to be a function of the standard deviation of Sk, the close-field frequency of convergence can be made independent of the memory loading. The small deviation from linearity of the predicted frequency is caused by rounding of the threshold to integer values. As can be seen in Figure 4.7, the prediction is better at higher memory loads. This is because the number of terms contributing to the signal Sk is proportional to the number of memories. With a larger number of terms, the distribution of Sk becomes more closely approximated by the Gaussian distribution function. The performance of a network can be characterized by a pair of figures of merit. The first, Chi, measures the frequency of convergence of a network probed by undiluted vectors of Hamming distance one from a memory. As shown in Figure 4.5, the networks converges to the optimal memory with probability close to Chi if the probe is within the region of convergence. The normalized radius of that region, denoted h~crit, is defined, rather arbitrarily, as the normalized Hamming distance at which the frequency of convergence falls to 50%. The dependencies of Chi a n ( i h~crit on the threshold, the size of the network, the dilution factor, and the memory loading are presented below. 4.5.1 Effects of Threshold The character of the proposed network can be adjusted by varying the threshold THOFF- When ThoFF is set to a high value, Chi tends to unity, and the frequency of convergence remains constant around a tight, well defined region around each memory. However, as the Hamming distance of the probe ventures beyond the region of convergence, the performance of the networks falls precipitously. On the other hand, HThoFP is set to a low value, Chi is also lower but the rate at which performance declines as Hamming distance increases is much gentler. The frequency of convergence of a network with ThoFF set at 2, 3, and 4 standard deviations above the mean is plotted in Figure 4.8. The threshold should be Chapter 4. Proposed Associative Memory Architecture 60 set according to the application at hand. If the probes are expected to be relatively error-free, a high threshold is desirable in order to maximize Chi- The smaller region of convergence is of little concern if the probes are very unlikely to fall outside the region. On the other hand, if the network must operate on highly corrupted vectors, a lower threshold is desirable, in order to obtain useful results for high Hamming distance probes, even though the ability to converge at low Harnming distance is somewhat reduced. Figure 4.9 and 4.10 shows simulation results of varying the threshold on Chi and herit for a sample network. In general, setting the threshold using: ThoFF = /i + x(r; 2.0 < x < 4.0 (4.16) offers good compromise between convergence at close proximity and radius of region of convergence. 0 20 4 0 60 8 0 Hamming Distance Figure 4.8: Frequency of Convergence Vs. Hamming Distance; (N = 1024, g = 64, m = 80) Chapter 4. Proposed Associative Memory Architecture 61 u+2o u+3o u+4o Threshold Figure 4.9: Maximum Frequency of Convergence Chi Vs. Threshold; (N = 1024, g = 64, m = 80) K+2o H+3o ji+4o Threshold Figure 4.10: Normalized Radius of Convergence h^n Vs. Threshold; (N = 1024, g = 64, m = 80) Chapter 4. Proposed Associative Memory Architecture 62 The scheme of setting the threshold using (4.16) is intended mainly for large networks with high memory loading. In those cases, the probability density function of the signal Sk is smooth, and conse-quently, the performance of the networks is relatively insensitive to small changes in threshold. However, the mean and standard deviation can both be small if the network is small, memory loading is low, or if the dilution factor is high. The probability density function of Sk is highly discontinuous. The optimal threshold can only be discovered via simulations. For example, a network with N = 384, m = 10, and dilution factor D = 6.4, the mean and standard deviation are 0.096, and 0.35, respectively. Equation (4.16) suggested a threshold of one, while simulations indicate that a threshold of two is optimal. 4.5.2 Effects of Dilution As discussed in Section 4.2, neural network modules can have 4, 8, or 16 groups. The group counts correspond to decoding the original memory vectors by factors of 5 =>• 32,4 =5- 16, and 3 => 8, respectively. The dilution factor of a b 2* decoder is given by D = ^ . For a fixed sized network, the dilution factor affects the length of memory vectors that can be stored. But in practical situations, one often knows the length and number of the memory vectors, and wishes to find the appropriate dilution factor and as a by-product the size of network. For example, one may wish to store 60-bit vectors in the proposed network. Depending on the dilution factor chosen, the network would have either 384, 240, or 160 neurons. These networks are simulated with memory loads of 5, 10, and 30 vectors, and the performance plotted in Figures 4.11 and 4.12. The thresholds are chosen to maximize h„it while maintaining Chi to be at least 90%. In general, the optimal level of dilution is a function of the memory loading. A high memory load requires a larger dilution factor, and a low memory load, a smaller dilution factor. This is the case because, with a large number of memories, a high dilution factor reduces the frequency of TRUE bits, and, thus the amount of noise in the synaptic matrix. Otherwise, with m large, and D small, the synaptic matrix would be populated mainly be ones, and therefore, contain little information. For low memory loads, aliasing is occasionally helpful in extending the region of convergence, thus a small dilution factor is desirable. Chapter 4. Proposed Associative Memory Architecture 63 - T 0.9 - | c CD _ J CT CD fZ 0.85 ~ E "a 0.8 ~ Dilution Factor Figure 4.11: Maximum Frequency of Convergence Chi Vs. Dilution Factor; (I = 60) CD u c CO </) b cn s o • o CD . N "S E L _ o 4 5 Dilution Factor Figure 4.12: Normalized Radius of Convergence h^a Vs. Dilution Factor; (7 = 60) Chapter 4. Proposed Associative Memory Architecture 64 4.5.3 Effects of Memory Loading The radius of convergence of the proposed network is inversely proportional to the number of vectors stored. Figure 4.13 shows the normalized radius as a function of the memory load at various levels of dilution. In all three cases, the radius shrinks as the memory load is increased. As shown in Figure 4.11, the maximum frequency of convergence is only minimally affected by the load. This is because Chi is a strong function of the threshold ThoFF, and ThoFF w a s chosen to be the lowest value for which simulation runs yield Chi of at least 0.9. The value of 0.9 is chosen to ensure good convergence ability at close range while maintaining a large radius of convergence. Fixing threshold while changing the memory load would be meaningless. 03 Memories Stored Figure 4.13: Normalized Radius of Convergence h„i% Vs. Memory Loading; (7 = 60) 4.6 Convergence Delay In order for the proposed model to be practical, it must be able to converge to the final state in a short time. As discussed previously, the proposed model will always converge to a final state, and will never oscillate. Simulations were performed on networks ranging from N — 160 to 1024 neurons. It was found that when probed by vectors near h^n, the convergence time increases linearly with network size. The Chapter 4. Proposed Associative Memory Architecture 65 number of cycles (neuron updates)taken to reach the final state is given by: Cjinti = zN; 1.55 < x < 1.87 (4.17) After the network has reached its final state, convergence is detected by noticing that all other neurons have been selected for update and none changes state. Therefore, the average number of cycles required for the network to be declared as having converged is Converged = C]inal + TV (4-18) Using the neuron update cycle time (<cycle) of 115ns derived in Section 4.2, the proposed network will converge in tcanverge = (* + 1) Woyele < 3 X TV X 115n« (4.19) 4.7 Comparison with Existing Models A set of neural network models reported in the literature is described in Chapter 2. Table 2.2 shows that the analogue Hopfield model has the best performance. However, the need for connection strengths programmable over a wide range limits its practicality. The truncated Hopfield model offers much lower hardware complexity with only a small decrease in performance. Therefore, the truncated Hopfield model will be used as the basis of comparison for the proposed model. 4.7.1 Convergence Speed The dynamic behaviour of analogue based networks is described by the set of differential state equations (3.45). The speed of convergence is governed by the R C time constants of the equations. The capacitance C is proportional to the linear dimensions of the synaptic matrix, and thus increases linearly with the size of the network. On the other hand, there is one conduction term G*, contributing current to the capacitance for each neuron. Ideally then, the R C time constants and consequently network speed is independent of network size. In practice, however, the resistances in the synaptic matrix must be scaled with the network size in order to maintain reasonable power dissipation. Therefore, the convergence speed is expected to scale linearly with network size. Neural network CAMs have been studied mainly through computer simulations, with very few hard-ware implementations. Therefore, little information is available on the speed of these devices. In [21], the ASSOCMEM with 22 neurons was reported to converge in "a few microseconds". A 32 neuron Chapter 4. Proposed Associative Memory Architecture 66 implementation of the JPL model has convergence times of 15(is [10]. As shown in (4.19), the conver-gence time of the proposed model is linearly proportional to the size of network. A 22 neuron circuit is expected converge in 7.6/xs, and a 32 neuron circuit in llfis. The speed of the proposed network, there-fore, compares favourably to the reported results. The linear rise in convergence delay of the proposed network (4.19) is formally similar to analogue networks as the networks become large. 4.7.2 M e m o r y Capac i ty Figure 4.14: Performance of the Truncated Hopfield and the Proposed Network (I = 60, D — 4.0) The size of the truncated Hopfield network is equal to the length of the memory vectors. Although it is known that more vectors can be stored in larger networks than smaller ones, no useful dilution schemes have been reported to date. It is beyond the scope of this thesis to develop or disprove the existence of such schemes. Therefore, no dilution is used in this comparison (N = I). Figure 4.14 plots the frequency of convergence of a truncated Hopfield network loaded with m = 5 and ro = 10 memories of length I = 60 bits. Also plotted are the frequencies for the proposed model with D = 4.0, and ro = 5,10 and 30. The curve for the truncated Hopfield network at ro = 30 is omitted as the network was never observed to 1 0 10 Hamming Distance 20 Chapter 4. Proposed Associative Memory Architecture 67 converge to a memory in 10000 simulations trials. The performance of the truncated Hopfield networks is superior to that of the proposed network only at m = 5 with large distance probes. In all other situations, the proposed network offers higher performance. 4.7.3 Network Size Prior to storage in the proposed network, memory vectors are expanded by a b => 2l decode operation. The network then takes advantage of the unique syntax of the diluted vectors in its convergence operation. Consequently, fox a given memory vector length, the neuron count of the proposed network is greater than that of its truncated Hopfield counterpart by a factor of D — \ . At the cost of increased size, the proposed network offers greater storage capacity, and guaranteed stability of memory vectors. By using only unipolar signals, the proposed network is simpler to construct than the truncated Hopfield network which requires bipolar signals. The simplicity gained with unipolar representation can partially offset the effects of dilution. Chapter 5 Conclusions 5.1 Summary Neural network content-addressable memories based on the Hopfield model are susceptible to two failure modes. In the first, the network converges to a spurious stable point that is not one of the stored memories. The second failure mode refers to the situation where the stored memories are not stable points of the network. It is felt that the presence of unstable memories are of particular concern as the network would contaminate an otherwise error-free probe vector. The probability of occurrence of unstable memories was investigated for the analogue and truncated versions of the Hopfield, Denker, and Palm models. The method of analysis was based on Bayes detectors in classical communications theory. As a result of the investigation, an analytic model that predicts the frequency of bit errors and memory errors was developed. The model considers the size of the network, the number of memory vectors stored and the relative frequencies of TRUE and FALSE bits in the memory vectors. Previous researchers have concentrated on the analogue Hopfield model and assumed equal frequencies of TRUE and FALSE bits. Simulations were performed to evaluate the accuracy of the models. It was found that the model yielded excellent agreement for both Hopfield models. The level of agreement is not as good for the Denker and Palm models. However, the analytic model was able to predict the counter-intuitive result of the truncated Palm model having lower error rates than the analogue Palm model. A novel neural network model that guarantees stability of memories has been proposed. The proposed model is based on the JPL model, but is fully digital and optimized for VLSI implementation. The superior convergence ability of the network results from taking advantage of the unique one-hot syntax of the diluted memory vectors. By placing the storage of the synaptic matrix externally, a large number of neurons can be incorporated in a single VLSI circuit, and the total network can be enlarged easily by cascading multiple circuits. Simulations were performed to characterize the proposed network. It was found that at low memory 68 Chapter 5. Conclusions 69 loads, the late of convergence of the proposed network was close to that of the truncated Hopfield model. At high memory loads, the proposed model was clearly superior. However, due to the dilution requirements, the proposed models requires more neurons than the Hopfield models. The dilution factor can be 2.7, 4.0, or 6.4. The optimum amount of dilution is a factor of memory loading, with low dilution for low memory loads and high dilution for high loads. The convergence delay of the proposed model was found to be linear in network size, and is, therefore, formally similar to that of the Hopfield models. 5.2 Recommendations In this work, the memory stability of the Hopfield, Denker and Palm models was studied. It may be possible to extend the analysis to the region around memories. Little analytic information is currently available about this region. Consider the case where a network is at Hamming distance h from the closest memory. The first and second order expectations of the signal impinging on any neuron can be found. With minor modifications to (3.8), the probability that a neuron will change state can be calculated. Any change will either increase the Hamming distance to the nearest memory to h + 1 or decrease it to h — 1. If one maps the Hamming distance to a state in a Markov chain, the result is similar in form to a birth-death process in stochastic theory. One can then determine the characteristics of the region around memories. Methods of increasing the storage or reducing the hardware complexity of the neural network CAM models should be explored. The method of analysis developed in this thesis can be used to characterize these new algorithms. It would be instructive to show analytically whether syntax independent dilution schemes exist. While simulations suggest that the proposed model offers good performance as a content addressable memory, it should be confirmed by implementation in a VLSI circuit. An implementation would verify the speed predictions and hardware costs estimates. The circuit would be the first fully digital, expandable neural network circuit to be constructed. References [1] Y. Abu-Mostafa and J-M. St. Jacques, "Information capacity of the hopfield model," IEEE Trans. Inform. Theory, vol. IT-31, No. 4, pp. 461-464, 1985. [2] S. Bottini, "An algebraic model of an assciative noise-like coding memory," Biol. Cybern. 36, pp. 221-229, 1980. [3] S. Bottini, "An after-Shannon measure of the storage capacity of an associative noise-like coding memory," Biol. Cybern. 59, pp. 151-159, 1988. [4] J. Denker, "Neural network models of learning and adaptation," Physica 22D, pp. 216-232, 1986. [5] H. Graf et al., "VLSI implementation of a neural network memory with several hundreds of neurons," in J. S. Denker (Ed.) AIP Conference Proceedings 151, Neural Networks for Computing, Snowbird Utah, AIP, 1986. [6] H. Graf et al., "VLSI implementation of a neural network model," IEEE Computer, pp. 41-49,1988. [7] J. Hopfield, "Neural networks and physical systems with emergent collective computational abili-ties," Proc. Nail. Acad. Set. USA, vol 79, pp. 2554-2558, 1982. [8] J. Hopfield, "Neurons with graded response have collective computational properties like those of two-state neurons," Proc. Natl. Acad. Sci. USA, vol 81, pp. 3088-3092, 1984. [9] B. Kernighan and D. Ritchie, The C Programming Language, Prentice-Hall, New Jersey, 1978. [10] J. Lambe et al., "Electronic device aspects of neural network memories," Proc. AIAA/ACM/-NASA/IEEE Computers In Aerospace Conference, 1985. [11] EJ. McCluskey, Logic Design Principles, Prentice-Hall, New Jersey, 1986. [12] R. McEliece, "The capacity of the Hopfield associative memory," IEEE Trans. Inform. Theory, vol. IT-33, pp. 461-482, 1987. [13] C. Mead and L. Conway, Introduction to VLSI Systems, Addison-Wesley, Massachusetts, 1980. 70 References 71 [14] A. Moopenn et al, "Electronic implementation of associative memory based on neural network models," IEEE Trans. Syst. Man Cybern., vol. SMC-17, No. 2, pp. 325-331, 1987. [15] A. Murray and A. Smith, "Asynchronous VLSI neural networks using pulse-strean arithmetic," IEEE J. Solid-State Circuits, vol. 23, No. 3, pp. 688-698, 1988. [16] G. Palm, "On associative memory," Biol. Cybern. 36, pp. 19-31, 1980. [17] E. Parzen, Modern Probability Theory and Its Applications, John Wiley & Sons, New York, 1960. [18] P. Peretto and J. Niez, "Stochastic dynamics of neural networks," IEEE Trans. Syst. Man Cybern., vol. SMC-16, No. 1, pp. 73-83, 1986. [19] R. Rosenblatt, Principles of Neurodynamics, Spartan Books, New York, 1959. [20] D. Rumelhart, and J. McClelland, Parallel Distributed Processing : Explorations in the Microstruc-ture of Cognition, MIT Press, 1986. [21] M. Sivilotti et al., "A novel associative memory implemented using collective computation," 1985 Chapel Hill Conference On Very Large Scale Integration, pp. 329-342, 1985 [22] D. Tank and J. Hopfield, "Simple 'neural' Optimization networks: An A/D Converter, Signal Decision Circuit, and a linear programming circuit," IEEE Trans. Circuits Systems, vol. CAS-33, pp. 533-541, 1986. [23] B. Wooley, ed., IEEE J. Solid-State Circuits - Special Issue on Logic and Memory, vol. 23, No. 5, 1988. [24] R Zeimer and W. Tranter, Principles of Communications, Houghton Mifflin, Boston, 1976. [25] Guide to the Intergated Circuit Implementation Services of the Canadian Microelectronics Corpo-ration, Queen's University, Kingston, 1987. [26] MC68010/MC68012 16-/32-BU Virtual Memory Microprocessors, Motorola Inc., Austin, 1985. [27] Memories Data Book, 1986-87 Edition, Fujitsu Microelectronics Inc., Santa Clara, 1986. [28] The TTL Data Book, Volumes 1, 2, and 3, Texas Instruments Inc., Dallas, 1985. A p p e n d i x A The central idea in evaluating the expectations in Eqn (3.15) and (3.16) is to locate the regions where 7~() is constant. Using the following vector definitions: i = [vi1,vj2,...,v7-1)v7+1,...,vr]T, J = W,V?R..,VT-\VT+\...,Vr]T. X = [^nV..,vr\v;+V..)vr]T, the final term in (3.15) can be written as E If N / m \ /m \ i=l ; = 1 \o=l / \b=l J E ^ V / T ( /?•/+VJVr) -T (K-J + Vkrvj"j |Vj] foranyt^i (A.l) The elements of K can be divided into two sets; the ON-set consisting of elements with value 1, and the OFF-set consisting of elements with value —1. The weight w of K is the cardinality of the ON-set. The probability of K having weight w is Bp(w,m — 1). If the elements of I corresponding to the elements in the ON-set contains x ones, and the elements corresponding to those in the OFF-set contains y ones, then K • I = x — (w — x) + (m — 1) — w — 2y. There a r e ( _ ' ) - ( T O — y _ I vectors conforming to the above format, and each occurs with probability pm+v • gm-1-(«+v)> Similarly, if the elements of J corresponding to the elements in the ON-set contains u ones, and the elements corresponding to those in the OFF-set contains v ones, then K • J = u — (_> — u) + (m — 1) — w — 2v. The function T (ll • I + V^Vf^j is constant in regions where K • 1+ V^V? >l,K-I+VkrV?<-l and K • 7+ V^V^ = 0. The probability of each region for Vhr V? = 1 and V^V^ = -1 are given by: PM(u>) = Pr [i?.J+V?VT >1 | V-VT = _,_,] w min(m-l-_,[^ J-l) £ Bp(x,w) £ Bp(y,m-w-1) (A.2) B=max(0,ui+y-[»J+l) V=0 72 Appendix A. 73 and Pi , _ i H = Pv[K-i + v;v{>i\nv; = -i,w] m-w-l min(m-l-*,,|=r±J-l) E E ^(y,m-«-l) (A.3) «=max(0,v+t U -L=f i J+l) » - ° p _ M H = Pr [.?./+1717 <-i | v;rvr = i,w] min(w, [=^i j - l ) m-w-l E **(*.«0 E Bp(y,m-ti»-l) (A.4) P_I,_I(U>) = Pr[j?-/+v2-v7<-i| v;v7 = -i,«;] min(«.,[=fij) m - w - l E E Bp(y,m-w-1) (A.5) •=° v-max(0,i+ f^r 1] -«<) pO i l(_0 = Pr [/? • 1 + vxv;r = 01 vzv? = l,w] = Even(m) E J?p(z, u>)l?f ^as - w + y , m l j (A.6) 8=max(o ,w- L ^ J ) p 0 , - i W = Pr [ j? . /+v ; v7 = o 1 vjvr = - . , « . ] m i n ^ . L ^ J ) / m \ = Even(m) E tfpfow)-?,, - w - 1 + y,m-tu - lj (A.7) i_max(o ,u /+l— | ^ J ) Appendix A. 74 Using the notation of (A.2) to (A.7) we can write E \V?VJT (K . 1 + VTV7) T (K • / + vj-v/) | vz = 1] = P.x>l(w)P.ltl{w) + Pltl(w)P1A{w) m-l £ w-0 2pq -2P_i,i(u.)Pi,iH Pi,i(u>)P_i,_i W + P - M M P I . - I M -Pi.i W P I . - I H - P_i,i(«)P_i,_i(w) P I , - I H P I , - I H + P_I,_I W P - 1 , . 1 W -2Pli_i(w)P_1,_1(u») Similarly, if VKR = — 1, we have E [vrVJT (it • J + vrv?) r (K . / + v^v/) | = -1] = P _ U H P _ I , I H + Pi,i(w)Pi,iH -2P_x>1(io)Pi,i(tt) PI,I(W)P_I,_IH + P-i.i (w)Pi,_i(w) -Pl,l(«)Pl,_l(tll) -P_l,l(w)P_l,_l(tD) Pi,-i(w)Pi,_i(w) + P - i , - i(w)P_i , _ iH -3P l l-i(w)P_i,_iH + m-l X S P(w,m-l). < w =0 2pg + > (A.8) (A.9) Appendix B Notation Combination oft items chosen s at a time = Bx{s,t) Binomial distribution with probability x, t trial, s successes = ( j ^ x'(l — erfc() Complementary error function. fx(*\v) Probability density function (pdf) of random variable X given condition y. E[x) Expected value of as. Even (at) 1 if x is even, 0 is a? is odd. Pr(*) Probability of event x. Var(r) Variance of x. SO Liapunov energy function. w Nearest integer equal to or greater than x. w Nearest integer equal to or smaller than x. Calculated/simulated bit error rate. 9 Number of groups in memory vector. h Hamming distance. Hamming distance where frequency of convergence falls to 0.5. herit ., her it normalized by information content J. m Number of memories stored in network. me/m. Calculated/simulated memory error rate. 75 Appendix B. Notation 76 p Probability of a memory bit being TRUE. 9 Probability of a memory bit being FALSE, (g = 1 — p). tk Threshold of neuron k. C Memory capacity of network. CHI Frequency of convergence from Hamming distance of 1. D Dilution factor. I Information in diluted memory vector (measured in bits). N Number of neurons in network. P Probe vector. V Diluted version of probe vector. Sk Signal received by neuron k. ThoFF Threshold of neuron if all neighbours in OFF state. Th0N Threshold of neuron if any neighbour in ON state. Uk State of nenron k. Vr Memory vector r. v; Bit k of memory vector r. Diluted version of memory vector r. Bit k of diluted memory vector r. Probability that the product of»independent memory bits is 1.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- VLSI implementable associative memory based on neural...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
VLSI implementable associative memory based on neural network architecture Mok, Winston Ki-Cheong 1989
pdf
Page Metadata
Item Metadata
Title | VLSI implementable associative memory based on neural network architecture |
Creator |
Mok, Winston Ki-Cheong |
Publisher | University of British Columbia |
Date Issued | 1989 |
Description | Neural network associative memories based on Hopfield's design have two failure modes. 1) Certain memories are not recallable as they do not lie in a local minimum of the system's Liapunov (energy) function, and 2) the associative memory converges to a non-memory location due to trapping by spurious local minima in the energy function surface. The properties of the first failure mechanism in the Hopfield and five related models were investigated. A new architecture eliminating such failures is proposed. The architecture is fully digital and modular. Furthermore, it is more silicon-area efficient than any of the analyzed models. VLSI circuits built using this architecture will be able to contain a large number of neurons and several circuits may be connected together to further increase capacity. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-08-30 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0064936 |
URI | http://hdl.handle.net/2429/27942 |
Degree |
Master of Applied Science - MASc |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-UBC_1989_A7 M64.pdf [ 3.83MB ]
- Metadata
- JSON: 831-1.0064936.json
- JSON-LD: 831-1.0064936-ld.json
- RDF/XML (Pretty): 831-1.0064936-rdf.xml
- RDF/JSON: 831-1.0064936-rdf.json
- Turtle: 831-1.0064936-turtle.txt
- N-Triples: 831-1.0064936-rdf-ntriples.txt
- Original Record: 831-1.0064936-source.json
- Full Text
- 831-1.0064936-fulltext.txt
- Citation
- 831-1.0064936.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0064936/manifest