i V L S I I M P L E M E N T A B L E A S S O C I A T I V E M E M O R Y B A S E D O N N E U R A L N E T W O R K A R C H I T E C T U R E Winston Ki-Cheong Mok B. A. Sc. (Hons.) University of British Columbia A THESIS SUBMITTED IN PARTIAL F U L F I L L M E N T OF T H E REQUIREMENTS FOR T H E D E G R E E OF M A S T E R OF A P P L I E D SCIENCE in T H E F A C U L T Y OF G R A D U A T E STUDIES D E P A R T M E N T OF E L E C T R I C A L ENGINEERING We accept this thesis as conforming to the required standard T H E UNIVERSITY OF BRITISH COLUMBIA April 1989 © Winston Ki-Cheong Mok, 1989 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of g ^ l ^ O T / ? ! C Art. t=M6, fN&&£//J£? The University of British Columbia Vancouver, Canada DE-6 (2/88) A b s t r a c t Neural network associative memories based on Hopfield's design have two failure modes. 1) Certain memories are not recallable as they do not lie in a local minimum of the system's Liapunov (energy) function, and 2) the associative memory converges to a non-memory location due to trapping by spurious local minima in the energy function surface. The properties of the first failure mechanism in the Hopfield and five related models were investigated. A new architecture eliminating such failures is proposed. The architecture is fully digital and modular. Furthermore, it is more silicon-area efficient than any of the analyzed models. VLSI circuits built using this architecture will be able to contain a large number of neurons and several circuits may be connected together to further increase capacity. ii Table of Contents Abstract ii List of Tables vi List of Figures vii Acknowledgement l x 1 Introduction 1 1.1 Hopfield Networks 1 1.2 Overview of Thesis 3 2 Associative Memory Networks 4 2.1 Hopfield Network • • 4 2.2 Simplifying Models 7 2.2.1 Truncated Hopfield Model 7 2.2.2 Denker Model 7 2.2.3 Truncated Denker Model 8 2.2.4 Palm Models 8 2.2.5 JPL Model 8 2.3 Model Summary 9 2.4 Simulation Results 9 3 Unstable Memory Analysis 12 3.1 Hopfield Model 12 3.2 Truncated Hopfield Model 15 3.3 Denker Model 18 3.4 Truncated Denker Model 20 3.5 Analogue Palm Model 20 iii 3.6 Truncated Palm Model 21 3.7 Correlation between Neuron Signals 22 3.8 Distribution of Neuron Signal • 24 3.9 Software Tools 29 3.10 Results 29 3.11 Limitations Of Current Implementations 37 4 Proposed Associative Memory Architecture 40 4.1 Model Description 40 4.1.1 Threshold Determination 41 4.1.2 Memory Stability 44 4.1.3 frequency of Convergence 44 4.2 System Architecture 45 4.3 Neural Network Module Architecture 52 4.4 Simulations 55 4.5 Performance of Proposed Network 56 4.5.1 Effects of Threshold 59 4.5.2 Effects of Dilution 62 4.5.3 Effects of Memory Loading 64 4.6 Convergence Delay 64 4.7 Comparison with Existing Models 65 4.7.1 Convergence Speed 65 4.7.2 Memory Capacity 66 4.7.3 Network Size 67 5 Conclusions 68 5.1 Summary 68 5.2 Recommendations 69 References 70 A 72 iv B Notation 75 v List o f Tables 2.1 Variations on Hopfield Model 9 2.2 Frequency of Convergence to Closest Memory 11 3.1 Attainable Values of Signal S t 24 vi List of Figures 2.1 Schematic of a Hopfield Neural Network 5 2.2 Network energy as a function of Ut 6 3.1 Conditional pdf of Sjt and optimum decision boundary 14 3.2 Distribution of 5 t ; Analogue Hopfield model, JV = 128, m = 10, p = 0.5 27 3.3 Distribution of St] Analogue Denker model, N = 128,m = 10, p = 0.5 27 3.4 Distribution of St; Analogue Palm model, N = 128, m = 10,p = 0.1 28 3.5 Distribution of St in the Region of Integration 28 3.6 Predicted and Simulated Optimum Threshold; bipolar models, m = 10, p = 0.3 30 3.7 Predicted and Simulated Optimum Threshold; unipolar models, m = 10, p = 0.1 31 3.8 Predicted and Simulated Bit Error Rate; bipolar models, m = 10, p = 0.5 32 3.9 Predicted and Simulated Bit Error Rate; unipolar models, TO = 10,p = 0.1 32 3.10 Bit Error Rates; bipolar models, N = 128, m = 10, p = 0.3,0.4, and 0.5 33 3.11 Bit Error Rates; Palm models, JV = 128, TO = 10, p = 0.05, and 0.1 34 3.12 Predicted and Simulated Memory Error Rate; bipolar models, m = 10, p = 0.5 35 3.13 Predicted and Simulated Memory Error Rate; Palm models, TO = 10, p = 0.1 35 8.14 Predicted Capacity of Hopfield Networks; p = 0.5 36 3.15 Comparison with Published Results of Capacity of Hopfield model 37 4.1 Probability density function of signal Si for VJ = 0 and VJ = 1 43 4.2 Block diagram of associative memory co-processor system . . . 46 4.3 Frequency of convergence of proposed network (N = 1024 neurons, ThoFF = /* + 30i (2.3) »=i j=i »=i Chapter 2. Associative Memory Networks 6 Rewriting (2.3) explicitly in terms of Uk, and differentiating with respect to Uk, one gets: - N N N AT £0) = -\T*kUl - UkJ^TtM+Wi* -z2l2UJTJiUi + T,tiUi i*k i=ii=i i*k i*k and •^- = -TkkUk-^2TkiUi + tk if 8Uk i=l ijtk t=l (2.4) (2.5) Comparing (2.1) and (2.5), it is clear that a state transition is generated if the directional derivative predicts an energy reduction. However, the energy function given by (2.4) is parabolic and concave down. The derivative is, therefore, not always reliable in predicting the energy difference between the two states. This is illustrated in Figure 2.2 below. If (7* = — 1 (point A) currently, and the energy is as shown in the figure, neuron Jfe will be prevented by the positive ^from changing state to Uk = 1 (point B) even though the change would lead to a lower total energy. This problem is caused by a non-zero T i t term. If Tkk is set to zero arbitrarily, the energy function is linear with respect to each neuron, and the entire energy surface will be shifted up. However, the difference in energy between states remains unchanged. The small energy barrier surrounding certain states will be flattened. It is therefore customary to set Tkk to zero. In the following analysis, Tkk is assumed to be zeroed. Figure 2.2: Network energy as a function of Uk Chapter 2. Associative Memory Networks 7 2.2 Simplifying Models In the Hopfield model, the strength of synaptic connections range from —m to +m, where mis the number of memories stored. In an analogue implementation, resistances are used to effect the connections. Having programmable resistors with a wide range of values is prohibitively expensive with current VLSI technology. Consequently, various modifications to the basic Hopfield model have been proposed. Some reduce the number of resistors required, while others replace them with fixed resistances and switches. We will describe six such modifications in this section. 2.2.1 Truncated Hopfield Mode l The truncated Hopfield model was proposed by Sivilotti et al. [21], and implemented in a VLSI circuit. In this model, the synaptic strengths are set to +1 if the sum of outer product is positive, and —1 if the sum is negative. where +1 ; x> 1 T(x) = 0 ; x = 0 - 1 ; x < - 1 The synaptic matrix consists of a storage element and a polarity switch at each node. The resistance is provided by the channel resistance of the pass transistors. Positive and negative resistance is achieved by gating the true or complement output of the neurons, respectively. The associative memory integrated circuit (ASSOCMEM) built used standard VLSI technology, and contained 22 neurons. The neuron update rule, and energy function remain unchanged from the basic Hopfield model. 2.2.2 Denker Mode l In an effort to reduce the number of variable resistances required, and to demonstrate the fault-tolerance of the Hopfield model, Denker [4] simulated a circuit in which all positive elements of the synaptic matrix were set to zero. It was shown empirically that the network still performed as an associative memory, albeit with reduced efficacy. The neuron update rule and energy function of the basic Hopfield model remain applicable here. Chapter 2. Associative Memory Networks 8 The connection strengths are related to the memory vectors by Tki=V (2.7) where 0 x > 0 V(x)= 0 SB = 0 X x < 0 2.2.3 T r u n c a t e d Denker M o d e l Combining the ideas of [4] and [21], one arrives at the truncated Denker model. The synaptic strengths are generated using (2.6) and the positive elements are then set to zero. The neuron states are still represented by [—1, +1]. Fault-tolerance of the neural network architecture is amply demonstrated, as, in spite of the radical mutilation of the synaptic matrix, the resultant network still performs its intended function. 2.2.4 P a l m M o d e l s The next two models, analogue Palm and truncated Palm [16], use unipolar representations for neuron states. The ON state is denoted by Uk = 1» and the OFF state by Uk — 0. (Similarly for memory elements V^.) These models ease VLSI implementation, as only single-ended signals are required. However, these models require that memory vectors be diluted before storage, in order to reduce aliasing. Palm showed that for maximum storage capacity, the optimum weight (number of T R U E bits) of diluted vectors is given by Wgpt = log2N. However, he did not take into account the effect of dilution rate on the size of the region of convergence around each memory. After dilution, the analogue Palm model uses (2.2) to generate synaptic strengths, while the Palm model uses (2.6). 2.2.5 J P L M o d e l The JPL model was developed by Moopenn et al. at the Jet Propulsion Laboratory [14] and is a refinement of the Palm model. In this model, the memory vectors are first divided into, say, g groups of b bits. Each group is then expanded by a 6 => 2 l decoder. The expanded representation will contain (2.8) Chapter 2. Associative Memory Networks 9 exactly one T R U E bit pei group. The output of the decoder is concatenated to form a diluted memory vector V r . The vectors will contain g T R U E bits; that is, a weight of g. The synaptic matrix is programmed from the diluted vectors using: Instead of having a fixed threshold, Moopenn utilizes a variable analogue threshold that seeks to enforce the one T R U E bit per group syntax as given by tk = B. £ Ui (2.10) <€{ it, for any bit k. If one assumes that the memories are random and independent, and the bits in each memory are also random and independent, the distribution of Si is approximately Gaussian (3.1) because of the large number of terms contributing to it. The probability density function of St in the truncated Hopfield model and the two Denker models can be similarly approximated. The characteristics of Si is then completely specified by the mean (/*) and variance ( + '-- + Vrkv? + --- + vkmv?n) |V? = -1] = (JV - 1) x The expectation of Sf is of the form: m - l (p-g) £ ( B K ( j , m - l ) - J 3 A ( ; , m - l ) ) -p B ^ L f J . m - l J - g B ^ L f J . m - l ) E[St2] = E JV N i = i j = i )Pi,iH Pi . iHP_i ._ iH + p_ 1,1 H P i . - i H - P i , i H P i , - i H - P - L I H P . I , . ! ^ ) P i , - i H P i , - i H + P - I . - I H P . L . I H - 2 ? l r l ( w ) P . 1 , . 1 ( « ; ) + E[S t 2 |V7 = - l ] = ( N - l ) m - l 1 - 53 B f K m - * ) • bPo,-iH-i-gPo.iH] ui-0 m - l + ( J V - ! ) ( # - 2 ) £ _ » , ( u . , m - l ) x ui_0 2pg P_i , iHP_i , iH + Pi,iHPi,i(ti;) -2P_i .xHPi . iH Pi,i(t»)p_i,_i(w) + P _ I , I H P I , _ I H - P i , i H P i , - i H - P _ I . I H P . L _ I H P i , - i H P i , _ i H + P _ I , _ I H P _ I , _ I H -2Pi,_i(u;)P_i,_iH The expressions for Pa,t(w) and the derivation of (3.15) and (3.16) are shown in Appendix A. (3.15) (3.16) Chapter 3. Unstable Memory Analysis 18 3.3 Denker Model The Denker model is analysed along similar lines to that used for the truncated Hopfield model. Regions where V(x) = x are found, and the expectation sums are performed over each region. "With A = 2pq, the expressions for the Denker model are: E[S„ | VJ = 1] = ( N - l ) m-l 53 [{p-q)(m-2j) + 2q]Bx(j,m-l)+ [1 + Even (m)]gflA([fJ, m-l) (3.17) E[S2|V? = l]=-m-l (J\T-1) £ _ ? „ ( _ . , m - l ) x m i n ( « , | = f i j - l ) m_w_x p 53 53 (^*iw)Bj)(y,n»-ti;-l)(2a!-2y + m-2ti;) "=° y - m a x ( o , « + [ = ^ i ] - _ ) min(tK,[2^iJ) m _ w _ i q £ ^2 Bp(x,w)Bp(y,m-w-l)(2x-2y + m-2-2w) «=° v = m a x ( 0 , x + [ = S f i ] - W ) m-l (JV-l)(JV-2)53_^(ti»,m-l)x min(ti/,|=f±J) m-w-1 min^, ) m _ w _ j «2 53 53 53 53 ^( • .«w».m-»-i)x «=° y - m a x ( 0 , < r + p S f ± " | - w ) u = 0 «=max(0,u+ \ s f 1 " \ - w ) _? p(u, w)Bp(v, m - w - l)(2(x - y - w - l ) + m)(2(u - v - w - l ) + m) m i n ( « / , L = = _ J - l ) m-w-1 m i n ^ . L ^ J ) m - t » - l 2p? 53 53 53 53 B p ( » t w ) B , { y , m - w - l ) x "=° y-max(o,»+["=fI"|-iii) u = 0 ti-max(o,u+[=f±"|-_) _>p(tt, tu)i?p(t», m - ui - l)(2(sr - y - w) + m)(2(u - v - w - l ) + m) m i n ( i K , L ^ i J - l ) „ , _ „ , _ ! _ u n ( _ , L * ^ i J - l ) > + + 53 ^(»it")B,(y,TO-«;-l)x »=° v=max(0,B+[=fi]-tU) u = 0 v=max(u,u+ [s£±'] -vi) Bp(u, w)Bp(v, m - w - l)(2(x — y — w) + m)(2(u -v — w) + m) (3.18) Chapter 3. Unstable Memory Analysis 19 E[Si | Vfc = -1] = m-l E [ ( p - g ) ( m - 2 j ) - 2 p ] 5 A ( i , m - l ) -[1 + Even (m)] pB A ( [ f J , m - l) (3.19) and E [ S 2 | V ? = -1] = m-l ( N - l ) ^ 2 B p ( w , m - l ) x ui=0 min(«;, |=fij-l) m_„_i « E E Bp(x,w)Bp(y,m-w- 1) (2* - 2y + m - 2u>)2 + •=° V=max(0,*+[=fi]-tU) min(«/,|=rij) m-«/-l p E E Bp (a, w)Bp(y, m - w - 1) (2x - 2y + m - 2 - 2w) •=° v=max(0,x+[=Sji i"]_ w ) m-l ( N - l ) ( N - 2 ) ' £ i B p ( w , m - l ) x v>=0 min(u/,[=f±J) m_„,_i min(tU,L=T^J) m-tu-l P2 E X) £ E B p ( x , w ) B p ( y t m - w - l ) x *=° v=max(0,«+[=ri]-t») u = 0 v=max(0,u+ p T ^ ] - « " ) J5p(u, tu)5p(v, m - w - l)(2(x - y - w - l ) + ro)(2(u - v - w - 1) + m) min(i/,|=S=AJ-l) m - w - l min^ , J) m - w - l 2pq E E E E ^ ( * , i » W y , " » - « - l ) x «=° v=max(0,a!+p=£i"]-ti;) «=» «=max(o,u+f=fi"|-«) B p(u,n;)5p(t;,m- to - l)(2(se - y - to) + m)(2(« -1> — w - 1) + m) q2 E E E E * , ( « , « ) * F ( » , m - W - l ) * •=° y=max(o,e+["=±i"|-«i) «=° i<=max(o,u+p=£l"| _«,) Bp(u, w)Bp(v, m - w - l)(2(x - y - w ) + m)(2(« -v-w) + m) > + + (3.20) Chapter 3. Unstable Memory Analysis 20 3.4 Truncated Denker Model To analyse the truncated Denker model, the regions where T(VQ) yields -1 are located, and the expec-tations computed over each region separately. The expectations of St and Si (again with A — 2pq) are given by: m-l BP* I VZ = l] = {N-l) E[sk\v; = -i] = ( N - i ) (q-p) E Bx(j,m-l) + qBx([j\,m-l) m-l (q-p) E Bx{j,m-l)-pBx([j\,m-l) E [ S 2 | V T = I] = m-l (N - 1) £ £?„(_.,m - 1) [pP_ M M + !P-i,-iW] + m-l (N-l)(N-2)Y^Bp(w,m-l) w=0 P 2 P 2 I , I W + 9 2P 2 I,-I(™)-2pgP_1|1(_>)P_i1_i(u>) and E[S t 2 |V? = -1] = m-l (*-i)53 ^ («.« -!) [pP-i,-i W + gP-i,iH] + tu=0 m-l (tf-l)(i\T-2) £_»,(_•, m-l) t»=0 P ^ i . - l M + ^ i . i W " 2pgP_i,1(u.)P_i,_1(«;) 3.5 Analogue Palm Model (3.21) (3.22) (3.23) (3.24) The analysis of the analogue Palm model is very similar to that of the Hopfield model. To calculate the expectation of St, Tu is expanded according to (2.2) and the expectation of each term is summed. The expectation relations are: fi+ = E [St | V? = 1] = (N - l)p+ (N - l)(m - l)p3 = E[S t | VI = 0] = (N-l)(m-l)p3 (3.25) (3.26) Chapter 3. Unstable Memory Analysis 21 E[5 t 2|VT=l] = (JV - 1)(JV - 2)(m - l)(m - 2)p6 + 2(JV - 1)(JV - 2)(m - l)p4 + (JV - 1)(JV - 2)(m - l)p5 + (JV - 1)(JV - 2)p2 + (JV - l)(m - l)p3 +. (JV - l)(m - l)(m - 2)p5 + (JV - l)(m - l)p3 + (JV - l)p (3.27) and E[St2|VI = 0] = (JV - 1)(JV - 2)(m - l)(m - 2)p6 + (JV - 1)(JV - 2)(m - l)p5 + (JV - l)(m - l)(m - 2)p5 + (JV - l)(m - l)p3 (3.28) The expectation of Si may be very small (< 5). In those cases, the probability density function of St is approximated by the Poisson distribution (3.29). For larger E[Si], the probability density function is approximately Gaussian (3.1). For the Poisson distribution, the parameter A is equal to the variance. /sk(*|Vt') = ^ p (3.29) 3.6 Truncated Palm Model In the truncated Palm model, the elements in the synaptic matrix are given by (2.6). As with previ-ous analysis of models having discontinuous functions denning Tti, we calculate expectations here by summing over each region where T() has value 1. The expectation of St and 5 | are given by: /J+ = E [St | V2* = 1] = (JV - l)p (3.30) m-l M _ = E [St | VT = 0] = (JV - l)p £ Bp(w, m - 1) [1 - (1 - (3.31) w=l E [S2k | V? = 1] = (JV - l)p + (JV - 1)(JV - 2)p2 (3.32) and E[S||VT=0] = (JV-l ) p X;B PKm-l)[l-(l - p n + w=i m-l (JV - 1)(JV - 2)p2 ^ Bp(w,m - 1) [1 - (1 - p)«f (3.33) w-l Chapter 3. Unstable Memory Analysis 22 3.7 Correlation between Neuron Signals As discussed previously, our estimation of the memory error rate m, is based on the assumption that bit errors are uncorrelated. In this, section, the correlation of the signals received by two disparate neurons is investigated. The analogue Hopfield and Plam models are examined in detail. The truncated Hopfield and Denker models are expected to be similar to the Hopfield model and the truncated Palm model to its analogue counterpart. Consider first the analogue Hopfield network. With the network state at memory Vr, the covariance of the signals received by any two neurons, say neurons k and /, can be calculated. The correlation coefficient r, which measures the level of linear correlation between the two signals, is given by ratio of the covariance to the variance of the signals themselves. The mean and variance of St and Sj are identical because the memory bits are generated randomly and independently. ,2 _ 6, • JV.) Since, bit errors tend to be concentrated in certain memories, the remaining memories must have below average bit errors if the aggregate bit error rates were to remain unchanged. Therefore, the memory error rates are less than predictions based on independent occurrence of bit errors. As will be shown in a later section, we are able to predict the bit error rate with reasonable accuracy, but grossly over-estimate the memory error rate. 3.8 Distribution of Neuron Signal During simulations, it was discovered that, under certain conditions, the signal St for the Hopfield and Denker models will only take on odd or even integer values. Put another way, every alternate value will have a frequency of occurrence of zero, even if the value is close to the mean. Superficially, this appears to prevent the use of continuous probability density functions to approximate the distribution of Sk. The cause of this behaviour must be investigated. Table 3.1 shows the values that Sk can attain. Model JV m sk Model JV TO sh Analogue Hopfield Odd Odd Even Truncated Hopfield Odd Odd Even Odd Even Even Odd Even Either Even Odd Odd Even Odd Odd Even Even Even Even Even Either Analogue Denker Odd Odd Either Truncated Denker Odd Odd Either Odd Even Even Odd Even Either Even Odd Either Even Odd Either Even Even Even Even Even Either Analogue Palm Odd Odd Either Truncated Palm Odd Odd Either Odd Even Either Odd Even Either Even Odd Either Even Odd Either Even Even Either Even Even Either Table 3.1: Attainable Values of Signal Sh Chapter 3. Unstable Memory Analysis 25 The constraints of Sk result from the construction of the synaptic matrix using the sum-of-outer-products rule. St is, naturally, given by m Sk = J2TkiVT • (3.38) <=i Expanding T« by its sum-of-outer-products equivalent, one gets sk = vtvM + vtvfvf + '-' + vtvzv; + --- + v1rvkmv1m WW + v2rvk2v? + ••• + VJVJVJ + • • • + vrv 2 m vi-iVkVU+vi-ivtv^+...+vsr_1vsr^,_i+• • •+v^v rv -^ Vk+iVivi+l + v;+1vk2vk2+1 + • • • + v; + 1 v; v? + 1 + • • • + v r ^ WW + V*NV2V2 + • • • + V^TYV^ + • • • + nVkmVJ? (3-39) Consider the case where N is even, m is odd, and the network state is currently at memory VT. Each row in (3.39) contains m terms. The rth term of each row is of the form VJT (V?)2. Since neurons states are constrained to be —1 or +1, (V7) 2 always evaluates to one, and thus, the rth term has value VkT. The remaining m — 1 terms are (—1, +1) random variables. As stated earlier, m is assumed to be odd. TO — 1, is therefore, even. Each row of (3.39) consists of the sum of TO — 1 odd terms and Vkr which is odd. Thus, the sum of each row is odd. There are N — 1 rows in (3.39). Row k is missing because Tkk is set to zero to smooth the energy surface, as discussed earlier. With N even, Sk is the sum of an odd number of odd terms. Therefore, Sk must be odd. Conversely, if TV is odd, Sk would be the sum of an even number of terms. Or, if m is even , each row would contribute an even intermediate sum. In those cases, Sk would be even. Analogous analysis can be applied to the other models to predict the parity of the neuron signals. As shown in Appendix A, the intermediate sum of each row is given by R,um = 2SB - 2u> - 2y + TO - 1 + Vk (3.40) w, x, and y are integer random variables and are defined in Appendix A. Since all random variables are multiplied by a factor of 2, R,Um can only change in steps of two. Therefore, Sk only takes on alternate values. However, the simple transform of dividing Sk by two would ensure that it will change in unity Chapter 3. Unstable Memory Analysis 26 steps. The approximation function must be parameterized with the statistics of the transformed Sk, and the arguments to the function must also be divided by two. The effect of the transform on the expectations are E | | ] = 5 S • .(Ml, and If the rows in (3.39) were independent and no row dominates, the distribution of 4?- would be approximated by the Gaussian distribution. This is a direct result of the central limit theorem in statistics. In the present case, the rows are identically distributed but not truly independent. It was felt that the level of independence was sufficiently high to satisfy the relatively lax requirements of the central limit theorem to warrant the use of the Gaussian distribution. To verify the assumption, simulations were performed to gather the frequency at which various values of Sjt occurs. In the simulations, 640,000 samples of Sk were taken. To generate the theoretical frequencies, a Gaussian distribution biased with mean and variance of 4^ - was used. Freq(Sk = x) = 640,000 • Gaussian (jis±, to i °'1 o IP !5 CO X ) o 0.01 + * Simulation Results: + Palm * Trunc. Palm 100 Palm Trunc. Palm 2 0 0 Number of Neurons Figure 3.13: Predicted and Simulated Memory Error Rate; Palm models, m = 10, p: Chapter 3. Unstable Memory Analysis 36 Having established that the error rates for the bipolar models can be predicted with good accuracy, it is possible to use our analytic model to guide network development. For a particular application where the maximum tolerable memory error rate is known, the model can be used to calculate the level of memory loading that will meet the constraint. Conversely, given the target memory error rate, and memory loading, the model can be used to determine the size of network needed to satisfy these constraints. Figure 3.14 shows the number of memories that can be stored in Hopfield networks of various sizes at memory error rates of 10-1,10-2, and 10~3. 80 60 -O CO ca-cti o E CD 20 -me=10-3 500 Number of Neurons Figure 3.14: Predicted Capacity of Hopfield Networks; p = 0.5 1000 The curves in Figure 3.14 are virtually linear with a slight downward curvature. Other researchers [1, 12] investigating the capacity of Hopfield networks as the number of neurons approach infinity, have shown that the capacity is of the order: (3.44) The capacity relation is plotted with the constant error curve of me = 10"1 in Figure 3.15. To convert (3.44) from a proportionality relation to an equation, a scaling factor is found by least squares fitting Chapter 3. Unstable Memory Analysis 37 of the points defining the mt = 10 1 curve. This factor was found to be 0.736. Since both curves are arrived at using completely different methods, their close fit supports the validity of our analytic model. 8 0 0 —( • > • • 1 ' > ' ' 1— 0 500 1000 Number of Neurons Figure 3.15: Comparison with Published Results of Capacity of Hopfield model 3.11 Limitations Of Current Implementations Current VLSI implementations of neural network associative memories with modifiable synapses have all been of limited size. For example, the ASSOCMEM [21] had only 22 neurons while the integrated circuit (IC) produced by Moopenn et al. [14] had 32 neurons. The limiting factor is the size of the synaptic matrix. In the ASSOCMEM, the matrix occupied over 90% of the die core area. The matrix is large because conventional CMOS VLSI fabrication processes are inefficient for implementing random access memories (RAMs). They lack many features found in dedicated RAM processes such has high impedance pull-up structures, polysilicide gate material, and trench capacitors [23]. The designers of ASSOCMEM estimated that a non-programmable circuit could contain as many as 220 neurons. Using a simpler architecture and non-modifiable synapses, Graf et al. [5] produced an IC Chapter 3. Unstable Memory Analysis 38 with 512 neurons. However, using a fixed matrix is not a real solution as the resultant network would be restricted to a predetermined set of functions. The lack of expandability is a major weakness in current implementations. It is impossible to increase the power of the network by combining several smaller circuits. This is because adding more neurons requires augmenting the synaptic matrix with more rows. With the matrix embedded in the circuit and each row dedicated to a particular neuron, there is no convenient way to remap the rows to different neurons. In an analogue implementation of the Hopfield network, high valued "synaptic" resistances are re-quired. These resistances, when combined with the small parasitic capacitances form sufficiently long time constants so that the state differential equation (3.45) is formally similar to (2.5). A second reason for having large resistances is to reduce power dissipation. Unlike conventional digital circuits, where only a small portion of the IC is active at any one time, the entire matrix in a Hopfield network is always active. The network dissipates P = N • -ffi Watts. A modest network with 100 neurons and VDD = BVolts would , for example, require resistances of at least 125fcft, if power dissipation is to be limited to 1 Watt. However, standard fabrication services available to universities, through MOSIS in the United States, or CMC in Canada [25], do not inherently provide compact constructs for realizing such large resistors. N (3.45) where Gu = Tti X synaptic strength to conductance conversion factor vt = input voltage of neuron Jb Uk = output voltage of neuron k Ck — capacitance attributed to the input of neuron k As discussed in previous sections, memories in the Hopfield network and its variants can become unstable. When this occurs, an otherwise error-free probe vector will be contaminated. Unstable memory cannot be eliminated, only their frequency minimized, albeit at a cost of reduced capacity. The models Chapter 3. Unstable Memory Analysis 39 discussed also present a mixed interface to the rest of the system. The programming and probe vectors are digital while the threshold is analog. This complicates system design when the main function of the network is digital and yet an analogue input that is a non-trivial function of the network parameters (m,N,p) is required. Chapter 4 Proposed Associative Memory Architecture 4.1 Model Description The proposed model is based on the JPL model and is designed to address some of the limitations discussed in Section 3.11. The main features of the model are : • guaranteed stable memories, • fully digital implementation, • cascadable to increase capacity, • amenable to discrete time simulation algorithms, • adjustable threshold to take advantage of memory syntax, • use of external RAM permits more neurons per integrated circuit. In the proposed model, neurons are divided into groups. The "one-hot" memory dilution scheme of the JPL models is retained. The synaptic strengths are computed from the diluted memory vectors using sum-of-outer-products rule described by (2.9). The resultant synaptic matrix is stored in an external memory. Networks of different sizes can be accommodated simply by varying the depth and width of the memory. Instead of a continuum of values used in the JPL model, the threshold it here can take on two values; ThoFF if all the neighbours in all the same group as neuron k are all OFF, and ThoN if any of the neighbours is ON. The threshold depends only on the state of the neighbours and is independent of the state of neuron Jfe itself. If ThoN is chosen to be an unattainably large number, groups would be restricted to containing no more than one TRUE bit. The choice of ThoFF is a compromise between reducing spurious TRUE bits, and facilitating desired TRUE bits to turn ON when the current state is moderately far from a memory. Neurons are updated, cyclically, one at a time. The row of the synaptic matrix associated with the selected neuron is read in from the external memory. The signal St = YA=I TkiUi is computed digitally 40 Chapter 4. Proposed Associative Memory Architecture 41 and presented to an external comparator, which compares the signal with the threshold and determines the next state of the neuron. An external comparator can effectively service multiple network modules cascaded to form a large network. Digital techniques are chosen over analogue ones to enhance speed, and compatibility with available CMOS VLSI process technologies, and to ease system integration. The use of digital algorithms allows the construction of simple and fast simulation programs. Until further analytic tools describing the topology of the regions of convergence are developed, simulation is the only avenue to evaluate the performance of a network. Therefore, fast simulation software is vitally important. Due to the high level of fault tolerance of neural network models, noise immunity offered by digital implementations is of little benefit here. It is, definitely, of no hindrance however. The energy function of the proposed model is given by - JV N N k=lt=l t=l and rewriting it explicitly in terms of Uk, one gets j JV JV JV JV ^Uk) = --TkkU^-UkYT^i + ^ U k - Y Y U i T i i U i + X ) T I U I + X T I U I (4-2) »'=1 i=l »=1 »=1 t € {Gk> i*k i * {ahy t?k In (4.2) above, {Gk} denotes the set of neurons in the same group as neuron k. The energy function is parabolic in Uk and concave down. From the discussion in Section 2.1, Tkk can be set to zero with no lost of generality. Then, the energy function is linear in Uk. An update rule that ensures monotonic decrease in energy for (4.2) when neuron k changes state is listed below. i ; T2LiTuUi>tk Uh(t + 1)= Uk(t) ; Y^LiThiUi=tk 0 J Z^iTkiUi ) • ] + w=l (ff - l)(ff - 2) £ Bp(w, m - 1) [1 - (1 - p)«]2 (4.5) 10 = 1 After neuron k has taken on state 0, the group containing neuron k would consists solely of FALSE neurons, and the remaining g—1 groups would be unchanged from the original assumption of correspond-ing to diluted memory V*r exactly. As previously discussed, bit k' of memory Vr is TRUE (VI/ = 1.) The signal received by the neuron is Sf = <=i ^ Vi^t- Consider one of the g — 1 neurons that have state 1, say neuron /. It contributes TfiUi to the signal St1- Expanding Ttn by its sum-of-outer-products definition and recalling that Ui = V{ = 1, one gets TViUi = r(vi,v,1 + vt2,v,2 + ... + vi,vr + -" + vmv/n)vf = T(V^ V,1-|-Vi2,V,2 + -..-|-l-l-|-..--l-VmVm)l = 1 (4.6) Since each of the g—1 TRUE neurons contributes 1 to the signal, St1 has value g — 1 deterrninistically. ThoFF must be set sufficiently far above E[Si] such that neuron k will not remain at value 1 due to noise, but below g — 1 so that neuron k' can attain state 1. A more stringent constraint on the upper bound of ThoFF is related to controlling the extent of the basin of attraction around memories. A large ThoFF implies a small basin, and vice versa. This is because, the signal Sk is proportional to the number of groups in the current state vector matching the target memory. Chapter 4. Proposed Associative Memory Architecture 43 E[Sy + * < ThoFF < 9-1 (4.7) Consider the case where all groups except one correspond to diluted memory Vr, and the remaining group, G, contains all FALSE bits. This is the penultimate configuration of the network prior to converging to any memory, regardless of the initial probe vector. The convergence frequency from any probe vector is limited by the convergence frequency from this configuration. The ability of the network to converge from this state is, therefore, of critical importance. For VJ = 0, the signal Sk impinging on neuron k is formed by summing many pseudo-independent Bernoulli random variables. Its probability density function fsh (x | VJ = 0) can, therefore, be approximated by the Gaussian or Poisson distribution, depending on the mean. On the other hand, given VJT, = 1, neuron k' receives a deterministic signal of 9 — 1. The probability density function is, of course, the Dirac delta function. Figure 4.1 shows the probability density function of 5t for VJ = 0 and VJ = 1. The network will fail to converge to a memory, 0.3 - | 1 1 Figure 4.1: Probability density function of signal Sk for VJ = 0 and VJ = 1 if any of the neurons in group G that should remain at 0 receives a signal greater than ThoFF, and is Chapter 4. Proposed Associative Memory Architecture 44 selected for update befoie the neuron that should change to state 1. The spurious neuron will become TRUE, and prevent the desired neuron from doing so. The probability that a neuron will spuriously receive a signal Si above ThoFF is given by *.= /" fs„(x\Vrk=l)dx (4.8) With randomly generated memories, the desired neuron can be at any position in the group with equal probability p = -fr. In order for a neuron to evaluate to TRUE, all neurons in G selected before it must evaluate to FALSE. Thus, a network will fail to converge to a memory from this penultimate state with probability J V / , As a compromise between convergence at the penultimate state, and extent of the region of attraction, ThoFF is chosen to be a few standard deviations of Si above E[Si]. That is, ThoFF = *(E[Si 2]-E 2[S t]) + E[Si] = xa + ii; 2 c 0.8 -O c CD 0.6 ~ ThOFF = \L+2a Simulation Results: X ThOFF = \L+3o + ThOFF = \i+2o 20 40 60 I 80 100 Memories Stored Figure 4.7: Predicted and Simulated Frequency of Convergence, h = 1, JV = 1024, g = 64 Chapter 4. Proposed Associative Memory Architecture 59 In predicting the frequency of convergence, it is necessary to estimate the probability that a neuron will spuriously receive a larger than expected signal. This probability is given by (4.8). The statistics of the signal received by each neuron is formally similar to that of the truncated Palm model. Both models use (0,1) to represent TRUE and FALSE values, and the synaptic matrix is generated identically. Due to these similarities, the difficulties in fitting a standard function to the probability density function of Sk in the Palm models are also encountered here. The accuracy of the theoretical bit error rate given by (4.8) is not expected to be very high, and is intended to guide the user to a threshold suitable for the application at hand. For large networks, the probability density function of the signal is assumed to be Gaussian. By setting the threshold ThoFF to be a function of the standard deviation of Sk, the close-field frequency of convergence can be made independent of the memory loading. The small deviation from linearity of the predicted frequency is caused by rounding of the threshold to integer values. As can be seen in Figure 4.7, the prediction is better at higher memory loads. This is because the number of terms contributing to the signal Sk is proportional to the number of memories. With a larger number of terms, the distribution of Sk becomes more closely approximated by the Gaussian distribution function. The performance of a network can be characterized by a pair of figures of merit. The first, Chi, measures the frequency of convergence of a network probed by undiluted vectors of Hamming distance one from a memory. As shown in Figure 4.5, the networks converges to the optimal memory with probability close to Chi if the probe is within the region of convergence. The normalized radius of that region, denoted h~crit, is defined, rather arbitrarily, as the normalized Hamming distance at which the frequency of convergence falls to 50%. The dependencies of Chi a n ( i h~crit on the threshold, the size of the network, the dilution factor, and the memory loading are presented below. 4.5.1 Effects of Threshold The character of the proposed network can be adjusted by varying the threshold THOFF- When ThoFF is set to a high value, Chi tends to unity, and the frequency of convergence remains constant around a tight, well defined region around each memory. However, as the Hamming distance of the probe ventures beyond the region of convergence, the performance of the networks falls precipitously. On the other hand, HThoFP is set to a low value, Chi is also lower but the rate at which performance declines as Hamming distance increases is much gentler. The frequency of convergence of a network with ThoFF set at 2, 3, and 4 standard deviations above the mean is plotted in Figure 4.8. The threshold should be Chapter 4. Proposed Associative Memory Architecture 60 set according to the application at hand. If the probes are expected to be relatively error-free, a high threshold is desirable in order to maximize Chi- The smaller region of convergence is of little concern if the probes are very unlikely to fall outside the region. On the other hand, if the network must operate on highly corrupted vectors, a lower threshold is desirable, in order to obtain useful results for high Hamming distance probes, even though the ability to converge at low Harnming distance is somewhat reduced. Figure 4.9 and 4.10 shows simulation results of varying the threshold on Chi and herit for a sample network. In general, setting the threshold using: ThoFF = /i + x(r; 2.0 < x < 4.0 (4.16) offers good compromise between convergence at close proximity and radius of region of convergence. 0 20 4 0 60 8 0 Hamming Distance Figure 4.8: Frequency of Convergence Vs. Hamming Distance; (N = 1024, g = 64, m = 80) Chapter 4. Proposed Associative Memory Architecture 61 u+2o u+3o u+4o Threshold Figure 4.9: Maximum Frequency of Convergence Chi Vs. Threshold; (N = 1024, g = 64, m = 80) K+2o H+3o ji+4o Threshold Figure 4.10: Normalized Radius of Convergence h^n Vs. Threshold; (N = 1024, g = 64, m = 80) Chapter 4. Proposed Associative Memory Architecture 62 The scheme of setting the threshold using (4.16) is intended mainly for large networks with high memory loading. In those cases, the probability density function of the signal Sk is smooth, and conse-quently, the performance of the networks is relatively insensitive to small changes in threshold. However, the mean and standard deviation can both be small if the network is small, memory loading is low, or if the dilution factor is high. The probability density function of Sk is highly discontinuous. The optimal threshold can only be discovered via simulations. For example, a network with N = 384, m = 10, and dilution factor D = 6.4, the mean and standard deviation are 0.096, and 0.35, respectively. Equation (4.16) suggested a threshold of one, while simulations indicate that a threshold of two is optimal. 4.5.2 Effects of Dilution As discussed in Section 4.2, neural network modules can have 4, 8, or 16 groups. The group counts correspond to decoding the original memory vectors by factors of 5 =>• 32,4 =5- 16, and 3 => 8, respectively. The dilution factor of a b 2* decoder is given by D = ^ . For a fixed sized network, the dilution factor affects the length of memory vectors that can be stored. But in practical situations, one often knows the length and number of the memory vectors, and wishes to find the appropriate dilution factor and as a by-product the size of network. For example, one may wish to store 60-bit vectors in the proposed network. Depending on the dilution factor chosen, the network would have either 384, 240, or 160 neurons. These networks are simulated with memory loads of 5, 10, and 30 vectors, and the performance plotted in Figures 4.11 and 4.12. The thresholds are chosen to maximize h„it while maintaining Chi to be at least 90%. In general, the optimal level of dilution is a function of the memory loading. A high memory load requires a larger dilution factor, and a low memory load, a smaller dilution factor. This is the case because, with a large number of memories, a high dilution factor reduces the frequency of TRUE bits, and, thus the amount of noise in the synaptic matrix. Otherwise, with m large, and D small, the synaptic matrix would be populated mainly be ones, and therefore, contain little information. For low memory loads, aliasing is occasionally helpful in extending the region of convergence, thus a small dilution factor is desirable. Chapter 4. Proposed Associative Memory Architecture 63 - T 0.9 - | c CD _ J CT CD fZ 0.85 ~ E "a 0.8 ~ Dilution Factor Figure 4.11: Maximum Frequency of Convergence Chi Vs. Dilution Factor; (I = 60) CD u c CO 2l decode operation. The network then takes advantage of the unique syntax of the diluted vectors in its convergence operation. Consequently, fox a given memory vector length, the neuron count of the proposed network is greater than that of its truncated Hopfield counterpart by a factor of D — \ . At the cost of increased size, the proposed network offers greater storage capacity, and guaranteed stability of memory vectors. By using only unipolar signals, the proposed network is simpler to construct than the truncated Hopfield network which requires bipolar signals. The simplicity gained with unipolar representation can partially offset the effects of dilution. Chapter 5 Conclusions 5.1 Summary Neural network content-addressable memories based on the Hopfield model are susceptible to two failure modes. In the first, the network converges to a spurious stable point that is not one of the stored memories. The second failure mode refers to the situation where the stored memories are not stable points of the network. It is felt that the presence of unstable memories are of particular concern as the network would contaminate an otherwise error-free probe vector. The probability of occurrence of unstable memories was investigated for the analogue and truncated versions of the Hopfield, Denker, and Palm models. The method of analysis was based on Bayes detectors in classical communications theory. As a result of the investigation, an analytic model that predicts the frequency of bit errors and memory errors was developed. The model considers the size of the network, the number of memory vectors stored and the relative frequencies of TRUE and FALSE bits in the memory vectors. Previous researchers have concentrated on the analogue Hopfield model and assumed equal frequencies of TRUE and FALSE bits. Simulations were performed to evaluate the accuracy of the models. It was found that the model yielded excellent agreement for both Hopfield models. The level of agreement is not as good for the Denker and Palm models. However, the analytic model was able to predict the counter-intuitive result of the truncated Palm model having lower error rates than the analogue Palm model. A novel neural network model that guarantees stability of memories has been proposed. The proposed model is based on the JPL model, but is fully digital and optimized for VLSI implementation. The superior convergence ability of the network results from taking advantage of the unique one-hot syntax of the diluted memory vectors. By placing the storage of the synaptic matrix externally, a large number of neurons can be incorporated in a single VLSI circuit, and the total network can be enlarged easily by cascading multiple circuits. Simulations were performed to characterize the proposed network. It was found that at low memory 68 Chapter 5. Conclusions 69 loads, the late of convergence of the proposed network was close to that of the truncated Hopfield model. At high memory loads, the proposed model was clearly superior. However, due to the dilution requirements, the proposed models requires more neurons than the Hopfield models. The dilution factor can be 2.7, 4.0, or 6.4. The optimum amount of dilution is a factor of memory loading, with low dilution for low memory loads and high dilution for high loads. The convergence delay of the proposed model was found to be linear in network size, and is, therefore, formally similar to that of the Hopfield models. 5.2 Recommendations In this work, the memory stability of the Hopfield, Denker and Palm models was studied. It may be possible to extend the analysis to the region around memories. Little analytic information is currently available about this region. Consider the case where a network is at Hamming distance h from the closest memory. The first and second order expectations of the signal impinging on any neuron can be found. With minor modifications to (3.8), the probability that a neuron will change state can be calculated. Any change will either increase the Hamming distance to the nearest memory to h + 1 or decrease it to h — 1. If one maps the Hamming distance to a state in a Markov chain, the result is similar in form to a birth-death process in stochastic theory. One can then determine the characteristics of the region around memories. Methods of increasing the storage or reducing the hardware complexity of the neural network CAM models should be explored. The method of analysis developed in this thesis can be used to characterize these new algorithms. It would be instructive to show analytically whether syntax independent dilution schemes exist. While simulations suggest that the proposed model offers good performance as a content addressable memory, it should be confirmed by implementation in a VLSI circuit. An implementation would verify the speed predictions and hardware costs estimates. The circuit would be the first fully digital, expandable neural network circuit to be constructed. References [1] Y. Abu-Mostafa and J-M. St. Jacques, "Information capacity of the hopfield model," IEEE Trans. Inform. Theory, vol. IT-31, No. 4, pp. 461-464, 1985. [2] S. Bottini, "An algebraic model of an assciative noise-like coding memory," Biol. Cybern. 36, pp. 221-229, 1980. [3] S. Bottini, "An after-Shannon measure of the storage capacity of an associative noise-like coding memory," Biol. Cybern. 59, pp. 151-159, 1988. [4] J. Denker, "Neural network models of learning and adaptation," Physica 22D, pp. 216-232, 1986. [5] H. Graf et al., "VLSI implementation of a neural network memory with several hundreds of neurons," in J. S. Denker (Ed.) AIP Conference Proceedings 151, Neural Networks for Computing, Snowbird Utah, AIP, 1986. [6] H. Graf et al., "VLSI implementation of a neural network model," IEEE Computer, pp. 41-49,1988. [7] J. Hopfield, "Neural networks and physical systems with emergent collective computational abili-ties," Proc. Nail. Acad. Set. USA, vol 79, pp. 2554-2558, 1982. [8] J. Hopfield, "Neurons with graded response have collective computational properties like those of two-state neurons," Proc. Natl. Acad. Sci. USA, vol 81, pp. 3088-3092, 1984. [9] B. Kernighan and D. Ritchie, The C Programming Language, Prentice-Hall, New Jersey, 1978. [10] J. Lambe et al., "Electronic device aspects of neural network memories," Proc. AIAA/ACM/-NASA/IEEE Computers In Aerospace Conference, 1985. [11] EJ. McCluskey, Logic Design Principles, Prentice-Hall, New Jersey, 1986. [12] R. McEliece, "The capacity of the Hopfield associative memory," IEEE Trans. Inform. Theory, vol. IT-33, pp. 461-482, 1987. [13] C. Mead and L. Conway, Introduction to VLSI Systems, Addison-Wesley, Massachusetts, 1980. 70 References 71 [14] A. Moopenn et al, "Electronic implementation of associative memory based on neural network models," IEEE Trans. Syst. Man Cybern., vol. SMC-17, No. 2, pp. 325-331, 1987. [15] A. Murray and A. Smith, "Asynchronous VLSI neural networks using pulse-strean arithmetic," IEEE J. Solid-State Circuits, vol. 23, No. 3, pp. 688-698, 1988. [16] G. Palm, "On associative memory," Biol. Cybern. 36, pp. 19-31, 1980. [17] E. Parzen, Modern Probability Theory and Its Applications, John Wiley & Sons, New York, 1960. [18] P. Peretto and J. Niez, "Stochastic dynamics of neural networks," IEEE Trans. Syst. Man Cybern., vol. SMC-16, No. 1, pp. 73-83, 1986. [19] R. Rosenblatt, Principles of Neurodynamics, Spartan Books, New York, 1959. [20] D. Rumelhart, and J. McClelland, Parallel Distributed Processing : Explorations in the Microstruc-ture of Cognition, MIT Press, 1986. [21] M. Sivilotti et al., "A novel associative memory implemented using collective computation," 1985 Chapel Hill Conference On Very Large Scale Integration, pp. 329-342, 1985 [22] D. Tank and J. Hopfield, "Simple 'neural' Optimization networks: An A/D Converter, Signal Decision Circuit, and a linear programming circuit," IEEE Trans. Circuits Systems, vol. CAS-33, pp. 533-541, 1986. [23] B. Wooley, ed., IEEE J. Solid-State Circuits - Special Issue on Logic and Memory, vol. 23, No. 5, 1988. [24] R Zeimer and W. Tranter, Principles of Communications, Houghton Mifflin, Boston, 1976. [25] Guide to the Intergated Circuit Implementation Services of the Canadian Microelectronics Corpo-ration, Queen's University, Kingston, 1987. [26] MC68010/MC68012 16-/32-BU Virtual Memory Microprocessors, Motorola Inc., Austin, 1985. [27] Memories Data Book, 1986-87 Edition, Fujitsu Microelectronics Inc., Santa Clara, 1986. [28] The TTL Data Book, Volumes 1, 2, and 3, Texas Instruments Inc., Dallas, 1985. A p p e n d i x A The central idea in evaluating the expectations in Eqn (3.15) and (3.16) is to locate the regions where 7~() is constant. Using the following vector definitions: i = [vi1,vj2,...,v7-1)v7+1,...,vr]T, J = W,V?R..,VT-\VT+\...,Vr]T. X = [^nV..,vr\v;+V..)vr]T, the final term in (3.15) can be written as E If N / m \ /m \ i=l ; = 1 \o=l / \b=l J E ^ V / T ( /?•/+VJVr) -T (K-J + Vkrvj"j |Vj] foranyt^i (A.l) The elements of K can be divided into two sets; the ON-set consisting of elements with value 1, and the OFF-set consisting of elements with value —1. The weight w of K is the cardinality of the ON-set. The probability of K having weight w is Bp(w,m — 1). If the elements of I corresponding to the elements in the ON-set contains x ones, and the elements corresponding to those in the OFF-set contains y ones, then K • I = x — (w — x) + (m — 1) — w — 2y. There a r e ( _ ' ) - ( T O — y _ I vectors conforming to the above format, and each occurs with probability pm+v • gm-1-(«+v)> Similarly, if the elements of J corresponding to the elements in the ON-set contains u ones, and the elements corresponding to those in the OFF-set contains v ones, then K • J = u — (_> — u) + (m — 1) — w — 2v. The function T (ll • I + V^Vf^j is constant in regions where K • 1+ V^V? >l,K-I+VkrV?<-l and K • 7+ V^V^ = 0. The probability of each region for Vhr V? = 1 and V^V^ = -1 are given by: PM(u>) = Pr [i?.J+V?VT >1 | V-VT = _,_,] w min(m-l-_,[^ J-l) £ Bp(x,w) £ Bp(y,m-w-1) (A.2) B=max(0,ui+y-[»J+l) V=0 72 Appendix A. 73 and Pi , _ i H = Pv[K-i + v;v{>i\nv; = -i,w] m-w-l min(m-l-*,,|=r±J-l) E E ^(y,m-«-l) (A.3) «=max(0,v+t U -L=f i J+l) » - ° p _ M H = Pr [.?./+1717 <-i | v;rvr = i,w] min(w, [=^i j - l ) m-w-l E **(*.«0 E Bp(y,m-ti»-l) (A.4) P_I,_I(U>) = Pr[j?-/+v2-v7<-i| v;v7 = -i,«;] min(«.,[=fij) m - w - l E E Bp(y,m-w-1) (A.5) •=° v-max(0,i+ f^r 1] -«<) pO i l(_0 = Pr [/? • 1 + vxv;r = 01 vzv? = l,w] = Even(m) E J?p(z, u>)l?f ^as - w + y , m l j (A.6) 8=max(o ,w- L ^ J ) p 0 , - i W = Pr [ j? . /+v ; v7 = o 1 vjvr = - . , « . ] m i n ^ . L ^ J ) / m \ = Even(m) E tfpfow)-?,, - w - 1 + y,m-tu - lj (A.7) i_max(o ,u /+l— | ^ J ) Appendix A. 74 Using the notation of (A.2) to (A.7) we can write E \V?VJT (K . 1 + VTV7) T (K • / + vj-v/) | vz = 1] = P.x>l(w)P.ltl{w) + Pltl(w)P1A{w) m-l £ w-0 2pq -2P_i,i(u.)Pi,iH Pi,i(u>)P_i,_i W + P - M M P I . - I M -Pi.i W P I . - I H - P_i,i(«)P_i,_i(w) P I , - I H P I , - I H + P_I,_I W P - 1 , . 1 W -2Pli_i(w)P_1,_1(u») Similarly, if VKR = — 1, we have E [vrVJT (it • J + vrv?) r (K . / + v^v/) | = -1] = P _ U H P _ I , I H + Pi,i(w)Pi,iH -2P_x>1(io)Pi,i(tt) PI,I(W)P_I,_IH + P-i.i (w)Pi,_i(w) -Pl,l(«)Pl,_l(tll) -P_l,l(w)P_l,_l(tD) Pi,-i(w)Pi,_i(w) + P - i , - i(w)P_i , _ iH -3P l l-i(w)P_i,_iH + m-l X S P(w,m-l). < w =0 2pg + > (A.8) (A.9) Appendix B Notation Combination oft items chosen s at a time = Bx{s,t) Binomial distribution with probability x, t trial, s successes = ( j ^ x'(l — erfc() Complementary error function. fx(*\v) Probability density function (pdf) of random variable X given condition y. E[x) Expected value of as. Even (at) 1 if x is even, 0 is a? is odd. Pr(*) Probability of event x. Var(r) Variance of x. SO Liapunov energy function. w Nearest integer equal to or greater than x. w Nearest integer equal to or smaller than x. Calculated/simulated bit error rate. 9 Number of groups in memory vector. h Hamming distance. Hamming distance where frequency of convergence falls to 0.5. herit ., her it normalized by information content J. m Number of memories stored in network. me/m. Calculated/simulated memory error rate. 75 Appendix B. Notation 76 p Probability of a memory bit being TRUE. 9 Probability of a memory bit being FALSE, (g = 1 — p). tk Threshold of neuron k. C Memory capacity of network. CHI Frequency of convergence from Hamming distance of 1. D Dilution factor. I Information in diluted memory vector (measured in bits). N Number of neurons in network. P Probe vector. V Diluted version of probe vector. Sk Signal received by neuron k. ThoFF Threshold of neuron if all neighbours in OFF state. Th0N Threshold of neuron if any neighbour in ON state. Uk State of nenron k. Vr Memory vector r. v; Bit k of memory vector r. Diluted version of memory vector r. Bit k of diluted memory vector r. Probability that the product of»independent memory bits is 1.