A L G E B R A I C D E R I V A T I O N O F N E U R A L N E T W O R K S A N D ITS A P P L I C A T I O N S I N I M A G E P R O C E S S I N G by PINGNAN SHI B. A. Sc. (Electrical Engineering), Chongqing University, 1982 A. Sc. (Electrical Engineering), The University of British Columbia, 1987 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES THE DEPARTMENT OF ELECTRICAL ENGINEERING We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA April 1991 © Pingnan Shi, 1991 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department The University of British Columbia Vancouver, Canada DE-6 (2/88) Abstract Artificial neural networks are systems composed of interconnected simple computing units known as artificial neurons which simulate some properties of their biological counter-parts. They have been developed and studied for understanding how brains function, and for computational purposes. In order to use a neural network for computation, the network has to be designed in such a way that it performs a useful function. Currently, the most popular method of de-signing a network to perform a function is to adjust the parameters of a specified network until the network approximates the input-output behaviour of the function. Although some analytical knowledge about the function is sometimes available or obtainable, it is usually not used. Some neural network paradigms exist where such knowledge is utilized; however, there is no systematical method to do so. The objective of this research is to develop such a method. A systematic method of neural network design, which we call algebraic derivation methodology, is proposed and developed in this thesis. It is developed with an emphasis on designing neural networks to implement image processing algorithms. A key feature of this methodology is that neurons and neural networks are represented symbolically such that a network can be algebraically derived from a given function and the resulting network can be simplified. By simplification we mean finding an equivalent network (i.e., performing the same function) with fewer layers and fewer neurons. A type of neural networks, which we call LQT networks, are chosen for implementing image processing al-gorithms. Theorems for simplifying such networks are developed. Procedures for deriving such networks to realize both single-input and multiple-input functions are given. To show the merits of the algebraic derivation methodology, LQT networks for im-plementing some well-known algorithms in image processing and some other areas are developed by using the above mentioned theorems and procedures. Most of these net-works are the first known such neural network models; in the case there are other known network models, our networks have the same or better performance in terms of compu-tation time. i n Table of Contents Abstract ii List of Tables vii List of Figures viii Acknowledgement x A Glossary of Symbols xi 1 I N T R O D U C T I O N 1 2 B A C K G R O U N D 6 2.1 Artificial Neural Networks 6 2.1.1 Neuron Models 8 2.1.2 Network Models 12 2.2 Image Processing 17 2.2.1 Image Enhancement 17 2.2.2 Image Restoration 26 3 D E F I N I T I O N S A N D N O T A T I O N S 32 4 R E A S O N S F O R A L G E B R A I C D E R I V A T I O N 40 4.1 Drawbacks of the Learning Approach 41 ' 4.2 The Advantages of The Analytical Approach 46 iv 4.2.1 Designing the Hamming Network 47 4.2.2 Designing the Parity Network 48 4.3 The Algebraic Derivation Methodology 50 5 S Y M B O L I C R E P R E S E N T A T I O N O F N E U R O N S A N D T H E I R N E T -W O R K S 52 5.1 Neuron Models 52 5.2 LQT Networks 55 6 C O M P U T A T I O N A L P R O P E R T I E S O F L Q T N E T W O R K S 59 6.1 Network Equivalence 59 6.2 Network Quality 64 6.2.1 Network Depth 64 6.2.2 Network Size 65 6.3 Criterion for Network Simplification 66 7 D E R I V A T I O N P R O C E D U R E S 67 7.1 Realization of SISO Functions 68 7.2 Realization of MISO Functions 80 8 N E T W O R K R E A L I Z A T I O N S O F I E T E C H N I Q U E S 88 8.1 Network Realizations of Linear Techniques 88 8.2 Network Realizations of Non-linear Filters 93 8.2.1 Dynamic Range Modification 93 8.2.2 Order Statistic Filtering . 95 8.2.3 Directional Filtering 106 9 N E T W O R K R E A L I Z A T I O N S O F IR T E C H N I Q U E S 110 v 9.1 Network Realizations of Linear Filters 110 9.2 Network Realizations of Non-linear Filters 112 9.3 Comparison with the Hopfield Network 116 10 A P P L I C A T I O N S I N O T H E R A R E A S 119 10.1 Sorting 119 10.2 Communication 126 10.2.1 Improvement over Hamming Network 126 10.3 Optimization 130 10.3.1 Solving Simultaneous Equations 130 10.3.2 Matrix Inversion 132 11 C O N C L U S I O N S 134 11.1 Summary 134 11.2 Contributions 135 11.3 Future Work 137 Appendices 140 A T H E G E N E R A L I Z E D D E L T A R U L E 140 B H O P F I E L D N E T W O R K A L G O R I T H M 142 Bibliography 143 v i List of Tables 4.1 The Exclusive OR 44 4.2 Network Parameters For Solving XOR Problem 44 4.3 Network Parameters For Solving Parity-3 Problem 45 4.4 Network Parameters For Solving Parity-4 Problem 45 4.5 Number of Learning Steps 46 vii List of Figures 2.1 A Biological Neuron 9 2.2 An Artificial Neuron 9 2.3 Typical Activation Functions: (a) Linear; (b) Quasi-linear; (c) Threshold-logic; and (d) Sigmoid 10 2.4 A Three-layer Back-propagation Network 13 2.5 The Hopfield Network 14 2.6 The Hamming Network 15 2.7 Directional Smoothing Filter 21 2.8 Color Image Enhancement 26 3.1 The Schematical Representation of a Neuron 33 3.2 The Schematical Representation of the Input Neuron 35 3.3 Three Ways of Setting Up an Input Neuron 35 3.4 The Back-propagation Network 36 3.5 The Hopfield Network 36 3.6 The Hamming Network 37 4.1 Networks for Solving : (a) XOR problem; (b) Parity-3 problem; and (c) Parity-4 problem 43 4.2 Realizing Function (4.4) 49 4.3 Realizing Function (4.5) 49 4.4 The Network for Solving the Parity Problem 50 viii 5.1 The Contrast Stretching Function 53 5.2 Typical LQT Network Topologies 55 5.3 Representing Part of a Network 56 5.4 A LQT network 58 8.1 Network Architecture for Linear Filtering 89 8.2 Network Realization of the Order Statistic Function for n = 3 99 8.3 The Schematical Representation of OSnet 99 8.4 The Schematical Representation of Adaptive OSnet 100 8.5 A Max/Median Filter Realized by Using OSNets 101 8.6 A Max/median Filter Realized by Using OSNets 102 8.7 A Network Model of the CS Filter 106 8.8 A Network Realization of the DA Filter for the Case iV = 3 109 9.1 A Network Realization of the Maximum Entropy Filter 114 10.1 A Neural Network for Sorting an Input Array of Three Elements 125 10.2 The Network Model for Implementing the HC Algorithm 129 ix Acknowledgement First of all, I would like to acknowledge the indispensable support of my wife, Angela Hui Peng. She has maintained an environment under which I have been able to devote all my energy to this thesis. Moreover, she did all the drawings in this thesis, which are far better than I could have done. I would like to thank my supervisor, Professor Rabab K. Ward for her guidance and support. I am grateful to her for introducing me to this challenging and fascinating world of artificial neural networks, and providing me the opportunity to explore it. Special thanks are due to Professor Peter Lawrence for his kindness and recommen-dation of a book written by P.M. Lewis, which is invaluable for my research. Thanks are also due to Professor Geoffery Hoffmann, whose courses on neural net-works and immune systems gave me fundamental understandings of the two related fields, and inspiration to continue my research on the former one. Finally, I would like to thank John Ip, Doris Metcalf, Kevin O'Donnell, Robert Ross, Qiaobing Xie, Yanan Yin, and many others in this and other departments who have made my study here pleasant and productive. x A Glossary of Symbols Notation C x1,...,xn | w1,...,w2 3 V 3 u,n 0 < Xx,...,Xn | ltf!,...,«;2 > AxB (Z X i , . . . , X n | 10!,...,102 • {| X U . . . , X n | W1,...,W2 |} o I Explanation Membership relation "belong to" and its negative "does not belong to" Strict containment relations Linear neuron For all There exists one element Union, intersection Empty set Besides the usual "minus" operator, it also denotes set difference Quasi-linear neuron Cartesian product of sets A and B Threshold-logic neuron Either one of the three neuron types / is a function form set X to y Composition of two functions Sets of natural numbers, integers, real numbers, positive real numbers including zero, and binary numbers. Cardinality of the set X, length of the sequence S, absolute value of the relative number x. End of proof xi I End of solution Implication operators A , V , 0 Logic AND, OR, XOR x, x Vector ||x|| L2 norm of vector x X Matrix 1,1 Vector of one's 0,0 Vector of zero's 0,1 Zero or unit matrices {} Denote a set; various forms are {x^, {x\P(x)}, and { x i , x n j , etc. f, g Original and distorted one-dimensional images U, V Original and distorted two-dimensional images W Window for processing images xn Chapter 1 I N T R O D U C T I O N In recent years artificial neural networks have been successfully applied to solve prob-lems in many areas, such as pattern recognition/classification [60, 70], signal processing [24, 62], control [71, 9], and optimization [35, 37]. The essence "of neural network problem solving is that the network performs a function which is the solution to the problem. Currently, the most popular method of designing a network to perform a function is through a learning process, in which parameters of the network are adjusted until the network approximates the input-output behaviour of the function. Although some ana-lytical knowledge about the function is sometimes available or obtainable, it is usually not used in the learning process. There are some neural network paradigms where such knowledge is utilized; however, there is no systematic method to do so. It is the objective of this thesis to develop such a method. Many tasks, such as those in image processing, require high computational speed. Examples are real-time TV signal processing and target recognition. Conventional von-Neumann computing machines are not adequate to meet the ever increasing demand for high computational speed [66]. Among non-conventional computing machines, artificial neural networks provide an alternative means to satisfy this demand [19]. The study of artificial neural networks has been inspired by the capability of the nervous system and motivated by the intellectual curiosity of human beings to understand themselves. The nervous system contains billions of neurons which are organized in networks. 1 Chapter 1. INTRODUCTION 2 A neuron receives stimuli from its neighboring neurons and sends out signals if the cu-mulative effect of the stimuli is strong enough. Although a great deal about individual neurons is known, little is known about the networks they form. In order to under-stand the nervous system, various network models, known as artificial neural networks, have been proposed and studied [78, 52, 33, 12, 21]. These networks are composed of simple computing units, referred to as artificial neurons since they approximate their biological counterparts. Although these networks are extremely simple compared with the complexity of the nervous system, some exhibit similar properties. In addition to understanding the nervous system, artificial neural networks (referred to as neural networks hereafter) have been suggested for computational purposes. Mc-Culloch and Pitts in 1943 [63] showed that the McCulloch-Pitts neurons could be used to realize arbitrary logic functions. The Perceptron, proposed by Rosenblatt in 1957 [79], was designed with the intention for computational purposes. Widrow proposed in 1960 [95] a network model called Madaline, which has been successfully used in adaptive signal processing and pattern recognition [96]. In 1984, Hopfield demonstrated that his network could be used to solve optimization problems [35, 36]. In 1986, Rumelhart and his colleagues rediscovered the generalized delta rule [80], which enables a type of net-works known as the back-propagation networks, or multi-layer perceptrons, to implement any arbitrary function up to any specified accuracy [30]. These contributions have laid down the foundation for applying neural networks to solve a wide range of computational problems, especially those digital computers take a long time to solve. From the computational point of view, a neural network is a system composed of interconnected simple computing units, each of which is identical to the others to some extent. Here follows an important question: How can these simple computing units be used collectively to perform complex computations which they can not perform individ-ually? Chapter 1. INTRODUCTION 3 This problem can be attacked from two directions: (1) assuming a network model (usually inspired by certain parts of the nervous system), adjusting its parameters until it performs a given function; (2) assuming a set of simple computing units, constructing a network in such a way that it performs a given function. The application of the back-propagation networks is an example of the former approach; the designing of the Hopfield network is an example of the latter approach. Due to reasons given in Chapter Four, we have chosen the second approach. Most existing network models were designed for the purpose of understanding how the brain works. As a result, these models are restricted by the characteristics of the nervous system. We, however, are mainly concerned with applying some computing principles believed to be employed in the nervous system, such as collective computing, massive parallelism and distributed representation, to solve computational problems. Consequently, the network models we are going to develop shall be restricted only by today's hardware technology. Therefore, a more precise definition of the task we are undertaking is due here. The problem we are concerned with is, assuming a finite supply of simple computing units, how do we organize them in a computationally efficient way such that a network so formed can perform a given function. By simple, we mean that these units can be mass produced with today's hardware technology. By computationally efficient, we mean that the network so formed can perform the function in a reasonably short period of time. The latter restriction is necessary not only because computational speed is our main concern but also because there exist an infinite number of networks which can perform the same function. Although a lot can be learnt from the works of researchers such as Kohonen, Hop-field, and Lippmann on the constructing processes of their network models, a systemat-ical method of network design has not yet been developed. They relied mainly on their Chapter 1. INTRODUCTION 4 imagination, intuition, as well as inspiration from their knowledge of the nervous sys-tem. However, to effectively use neural networks as computing machines, a systematical method of network design is very much desired. It is our objective to develop such a method. In this thesis, we propose a methodology of network design, which consists of the following five stages: 1. Find the minimum set of neuron models for a given class of functions; 2. Devise symbolic representations for these neurons and their networks; 3. Establish theorems for manipulating these symbols based on the computational properties of the neurons and their networks; 4. Establish procedures for deriving neural networks from functions; 5. Use these procedures and theorems to derive and simplify network models for spec-ified functions. We call this methodology algebraic derivation of neural networks. This methodology is developed with an emphasis on deriving neural networks to realize image processing techniques. Procedures and theorems for network derivation are developed. They are used to construct network models for realizing some image processing techniques and techniques in other areas. The results we have obtained show that our methodology is effective. This thesis consists of eleven chapters. Chapter Two provides some background in-formation on neural networks and image processing. The materials in the rest of the thesis are believed to be new and are organized as follows. Chapter Three gives formal definitions of neurons, neural networks, and related concepts. Chapter Four elaborates Chapter 1. INTRODUCTION 5 on the reasons for our approach; we shall show the limitations of learning as used in back-propagation networks and the limitations of conventional network design method as exemplified by the Hamming network. Chapter Five finds a set of neuron models which are used to form networks for realizing image processing techniques, and provides symbolic representations for these neurons and their networks. Chapter Six develops theorems for network simplification based on some basic computational properties of the neurons and their networks. Chapter Seven gives procedures of algebraic derivation and explains them through some examples, which in turn are useful in network derivations in later chapters. Chapters Eight and Nine show examples of deriving neural networks to realize image enhancement and restoration techniques respectively. Chapter Ten con-tains examples of deriving neural networks for realizing some techniques in other areas. Chapter Eleven concludes our work and speculates on the future work. Chapter 2 B A C K G R O U N D This chapter gives necessary background information on neural networks and image pro-cessing. Section 2.1 gives an overview of neural networks with an emphasis on their appli-cations for computing. Concepts of neurons and neural networks are introduced. Some neuron and neural network models are described. Among the network models proposed in literature, the back-propagation network, the Hopfield network, and the Hamming network are chosen to be described in more detail since they are good representatives of their classes and also they will be referred to further in later chapters. Section 2.2 gives an overview of problems in image enhancement and restoration and techniques used to solve them. 2.1 Artificial Neural Networks People have long been curious about how the brain works. The capabilities of the nervous system in performing certain tasks such as pattern recognition are far more powerful than today's most advanced digital computers. In addition to satisfying intellectual curiosity, it is hoped that by understanding how the brain works we may be able to create machines as powerful as, if not more powerful than, the brain. The nervous system contains billions of neurons which are organized in networks. A neuron receives stimuli from its neighboring neurons and sends out signals if the cu-mulative effect of the stimuli is strong enough. Although a great deal about individual neurons is known, little is known about the networks they form. In order to understand 6 Chapter 2. BACKGROUND 7 the nervous system, various network models, known as artificial neural networks, have been proposed and studied. These networks are composed of simple computing units, referred to as artificial neurons since they approximate their biological counterparts. Work on neural networks has a long history. Development of detailed mathematical models began more than 40 years ago with the works of McCulloch and Pitts [63], Hebb [29], Rosenblatt [78], Widrow [95] and others [74]. More recent works by Hopfield [33, 34, 35], Rumelhart and McClelland [80], Sejnowski [82], Feldman [18], Grossberg [27], and others have led to a new resurgence of the field in the 80s. This new interest is due to the development of new network topologies and algorithms [33, 34, 35, 80, 18], new analog VLSI implementation techniques [64], and some interesting demonstrations [82, 35], as well as by a growing fascination with the functioning of the human brain. Recent interest is also driven by the realization that human-like performances in areas such as pattern recognition will require enormous amounts of computations. Neural networks provide an alternative means for obtaining the required computing capacity along with other non-conventional parallel computing machines. In addition to understanding the nervous system, neural networks have been suggested for computational purposes ever since the beginning of their study. McCulloch and Pitts in 1943 [63] showed that the McCulloch-Pitts neurons could be used to realize arbitrary logic functions. The Perceptron, proposed by Rosenblatt in 1957 [79], was designed with the intention for computational purposes. Widrow proposed in 1960 [95] a network model called Madaline, which has been successfully used in adaptive signal processing and pattern recognition [96]. In 1984, Hopfield demonstrated that his network could be used to solve optimization problems [35, 36]. In 1986, Rumelhart and his colleagues rediscovered the generalized delta rule [80], which enables a type of networks known as the back-propagation networks to implement any arbitrary function up to any specified accuracy. In European, researchers such as Aleksander, Caianiello, and Kohonen Chapter 2. BACKGROUND 8 have made conscious applications of computational principles believed to be employed in the nervous system [3]. Some commercial successes have been achieved [4]. These contributions have laid down the foundation for applying neural networks to solve a wide range of computational problems, especially those digital computers takes a long time to solve. 2.1.1 Neuron Models The human nervous system consists of IO 1 0 - 1 0 u neurons, each of which has 103 -104 connections with its neighboring neurons. A neuron cell shares many characteristics with the other cells in the human body, but has unique capabilities to receive, process, and transmit electro-chemical signals over the neural pathways that comprise the brain's communication system. Figure 2.1 shows the structure of a typical biological neuron. Dendrites extend from the cell body to other neurons where they receive signals at a connection point called a synapse. On the receiving side of the synapse, these inputs are conducted to the body. There they are summed, some inputs tending to excite the cell, others tending to inhibit it. When the cumulative excitation in the cell body exceeds a threshold, the cell "fires"—sending a signal down the axon to other neurons. This basic functional outline has many complexities and exceptions; nevertheless, most artificial neurons model the above mentioned simple characteristics. A n artificial neuron is designed to mimic some characteristics of its biological counter-part. Figure 2.2 shows a typical artificial neuron model. Despite the diversity of network paradigms, nearly all neuron models — except for few models such as the sigma-pi neuron (see [15]) — are based upon this configuration. In Figure 2.2, a set of inputs labeled (xi,X2, ...,XN) is applied to the neuron. These inputs correspond to the signals into the synapses of a biological neuron. Each signal Chapter 2. BACKGROUND 9 From other neurons Dendrites To other neurons Figure 2.1: A Biological Neuron XN (Ol (Oi f(.) Activation Fuction Figure 2.2: A n Art i f i c ia l Neuron Xi is mult ipl ied by an associated weight w, 6 i w 2 , W N } , before it is applied to the summation block, labeled Each weight corresponds to the "strength" of a single biological synapse. The summation block, corresponding roughly to the biological cell body, adds up al l the weighted inputs, producing an output e = Yl? w%Xi. e is then compared wi th a threshold t. The difference a = e — t is usually further processed by an activation function to produce the neuron's output. The activation function may be a simple linear function, a threshold-logic function, or a function which more accurately simulates the nonlinear transfer characteristic of the biological neuron and permits more Chapter 2. BACKGROUND 10 general network functions. Most neuron models vary only in the forms of their activation functions. Some ex-amples are the linear, the quasi-linear, the threshold-logic, and the sigmoid activation functions. The linear activation function is fL(ct) = a The quasi-linear activation function is a if a > 0 0 otherwise The threshold-logic activation function is 1 if a > 0 0 otherwise M<x) = A n d the sigmoid activation function is Figure 2.3 shows these four activation functions. f(cx) A l • a f(a) • 1 a - • a (b) (c) f(a) • (d) (2.1) (2.2) (2.3) (2.4) a Figure 2.3: T y p i c a l Act ivat ion Functions: (a) Linear; (b) Quasi-linear; (c) Thresh-old-logic; and (d) Sigmoid. The threshold-logic and the sigmoid neurons are the most commonly used neuron models among all the models proposed i n the literature. Linear neuron model was used Chapter 2. BACKGROUND 11 in the early days of neural network study. Quasi-linear neuron model is rarely seen in the literature. The threshold-logic neuron model was used by McCulloch and Pitts [63] to describe neurons which were later known as McCulloch-Pitts neurons. Rosenblatt later used this model to construct the well-known Perceptron [79]. Lewis and Coates also used this model, which they called T-gate, to form networks to implement logic functions [57]. The Hopfield network proposed by Hopfield [33] also uses the threshold-logic neurons as the primitive processing units. The sigmoid neuron model was used out of necessity by Rumelhart and his colleagues to construct what is later known as the back-propagation network [80]. This neuron model is used for the convenience of back-propagating errors. Variants of this model are used in networks which are trained by error back-propagation. The linear neuron model was used by Kohonen and Anderson respectively to construct networks known as linear associators [53, 5]. Since networks of linear neurons are easy to analyze and their capacities won't increase as the number of layers increases, they are not academically challenging. Nevertheless, the practical use of linear neurons can not be overlooked. They are important in performing certain tasks such as matrix computations. Moreover, they are the basis of other more complex neuron models. The quasi-linear neuron model is used in this thesis as a gating device (see Chapter Seven). This model can be viewed as the neuron model, which is sometimes referred to as threshold-logic node (see [58]), without saturation. The quasi-linear activation function was used by Fukushima to construct S-cell which is one of the primitive processing units of his network model (see [21]). Chapter 2. BACKGROUND 12 2.1.2 Network Models Although a single neuron can perform certain pattern detection functions, the power of neural computing comes from connecting neurons into networks. Larger, more complex networks generally offer greater computational capabilities. According to their structures, neural networks are classified to two categories: feedforward and feedback (recurrent) networks. Feedforward Networks In feedforward networks, neurons are arranged in layers. There are connections between neurons in different layers, but no connection between neurons i n the same layer. The connection is unidirectional. The output of a feedforward network at time k depends only on the network input at time (k — d), where d is the processing time of the network. In other word, a feedforward network is memoryless. M a n y neural network models belong to this class, examples are Perceptron, back-propagation network, self-organizing feature map [52], counter-propagation network [30], Neocognitron, and functional-link network [50]. In the following, the back-propagation network is described since it is the most popular one i n neural network applications. The back-propagation network was introduced by Rumelhart and his P D P group [80]. The neuron model used i n this network is the sigmoid neuron. Figure 2.4 shows a three-layer back-propagation network. The back-propagation network is used to implement a mapping from a set of patterns (input patterns) to another set of patterns (output patterns). The set composed of both sets of input and output patterns is referred to as the training set. The implementation is done by obtaining parameters (weights and thresholds) of a network through a error-back-propagation training procedure known as the generalized delta rule (see Appendix Chapter 2. BACKGROUND 13 v i y, y„ Xi X; XN Figure 2.4: A Three-layer Back-propagation Network A). The training begins by initializing the parameters with random values. Then an input pattern is fed to the network. The stimulated output pattern is compared with the desired pattern. If there is disagreement between these two patterns, the difference is used to adjust the parameters until the output pattern matches the desired pattern. This process is repeated for all the input patterns in the training set until a convergence criterion is satisfied. After the training, the network is ready to work. The back-propagation network has been applied in many areas including pattern recognition [82, 72, 17, 80] and image processing [84]. It has been found to perform well in most cases. Nevertheless, it has some practical problems, one of which is that it usually takes a long period of time to train, especially if 100% accuracy is required. Chapter 2. BACKGROUND 14 Feedback Networks Networks in which there are connections between neurons in the same layer and/or from neurons in the later layers to that of earlier layers are called feedback or recurrent networks. Examples are the Hopfield network [33], the Hamming network [58], the ART networks [12], and the Bidirectional Associative Memories (BAM) [54]. In the following the Hopfield network and the Hamming network are described in more detail. Figure 2.5 shows the structure of a discrete Hopfield network. The neurons in such a network are threshold-logic neurons. The network parameters are obtained through a simple procedure (see Appendix B). Figure 2.5: The Hopfield Network After all the parameters are set up, the network is initialized with an input pattern (normally a binary pattern). Then the output of the network is feedback to its input, and the network iterates until it converges. The output after the convergence is the "true" network output. The Hopfield network was originally used as an associative memory [33]. The network "memorizes" some patterns which are used as exemplars. When presented with a pattern, the network will produce the exemplar which most resembles the input pattern. The Hopfield network has also been applied to various areas including optimization Chapter 2. BACKGROUND 15 [35, 36, 89] and image processing [99, 83]. The Hopfield network is a "pure" feedback network in the sense that there is only one layer of neurons and every neuron can feed its output to anyone else. There are other networks which are mixtures of the feedforward networks and the "pure" feedback networks. The Hamming network proposed by Lippmann [58] is an example of such networks. Figure 2.6 shows the structure of a Hamming network. The upper sub-network has a feedback structure, and the lower sub-network has a feedforward structure. The neuron model used in this network is the threshold-logic node but all the neurons are operating in the linear range. Hence, they are virtually linear neurons. upper subnet picks maximum lower subnet calculate matching scores Xo X l Xn-2 Xn-1 Figure 2.6: The Hamming Network The Hamming network is used on problems where inputs are generated by selecting an Chapter 2. BACKGROUND 16 exemplar and reversing bit values randomly and independently. This is a classic problem in communication theory, which occurs when binary fixed-length signals are sent through a memoryless binary symmetric channel. The optimum minimum error classifier in this case calculates the Hamming distance to the exemplar for each class and selects that class with the minimum Hamming distance [23]. The Hamming distance is the number of bits in the input which do not match the corresponding exemplar bits. The Hamming network implements this algorithm. For the network shown in Figure 2.6, parameters in the lower subnet are set such that the matching scores generated by the outputs of the lower subnet are equal to ./V minus the Hamming distances to the exemplar patterns. These matching scores range from 0 to the number of elements in the input. They are the highest for those neurons corresponding to classes with exemplars that best match the input. Parameters in the upper subnet are fixed. All the thresholds are set to zero. Weights from each neuron to itself are 1; weights between neurons are set to a small negative value. After all the parameters have been set, a binary pattern is presented at the bottom of the Hamming network. It must be presented long enough to allow the outputs of the lower subnet to settle, and initialize the output values of the upper subnet. The lower subnet is then removed and the upper subnet iterates until the output of only one neuron is positive. Classification is then complete and the selected class is that corresponding to the neuron with a positive output. The Hamming network has been used as associative memory, as which the Hamming network has some advantages over the Hopfield network [59].. Chapter 2. BACKGROUND 1 7 2.2 Image Processing Images consist a large percentage of information people daily receive. Since images are often degraded due to the imperfection in the recording process, it is necessary to process these images to reveal the true information they contain. Image processing includes several classes of problems. Some basic classes are image representation and modeling, image enhancement, image restoration, image reconstruc-tion, and image data compression. In this thesis, only image enhancement and image restoration are considered. Hence, the term image processing hereafter is used to refer to both areas of image enhancement and image restoration unless specified otherwise. Image processing techniques may be divided into two main categories: transform-domain methods, and spatial-domain methods. Approaches based on the first category basically consist of computing a two-dimensional transform (e.g. Fourier or Hadamard transform) of the image to be processed, altering the transform, and computing the inverse to yield an image that has been processed in some manner. Spatial-domain tech-niques consist of procedures that operate directly on the pixels of the image in question. In this thesis, both categories are introduced with an emphasis on the latter one. 2.2.1 Image Enhancement Image enhancement techniques are designed to improve image quality for human viewing. This formulation tacitly implies that an intelligent human viewer is available to recognize and extract useful information from an image. This viewpoint also defines the human as a link in the image processing system. Image enhancement techniques include contrast and edge enhancement, pseudocoloring, noise filtering, sharpening, magnifying, and so forth. Image enhancement is useful in feature extraction, image analysis, and visual Chapter 2. BACKGROUND 18 information display. The enhancement process itself does not increase the inherent infor-mation content in the data, but rather emphasizes certain specified image characteristics. Enhancement algorithms are generally interactive and application-dependent. Before we start the overview of image enhancement techniques, some notations have to be explained here. U represents the image to be processed and V represents the enhanced image. The ij th element of U is denoted as U;J or u(i, j); similarly, or v(i,j) is the ij th element of V . A window is denoted as W and the window size is denoted as i n The following is an overview of some image enhancement techniques. Point Operations Point operations are zero memory operations where a given gray level u 6 [0, L] is mapped into a grey level v £ [0, L] according to a transformation, that is, v = f(u) (2.5) Contrast Stretching Low-contrast images occur often due to poor or nonuniform light conditions or due to nonlinearity or small dynamic range of the imaging sensor. A typical contrast stretching transformation is au, 0 < u < a /3(u-a) + va, a<u<b (2.6) 7(u — b) + Vb, b < u < L The slope of the transformation is chosen greater than unity in the region of stretch. The parameters a and b can be obtained by examing the histogram of the image. For example, the gray scale intervals where pixels occur most frequently would be stretched most to improve the overall visibility of a scene. v = < Chapter 2. BACKGROUND 19 Cl ipping and Thresholding A special case of contrast stretching where a = 7 = 0 is called clipping. This is useful for noise reduction when the input signal is known to lie in the range [a, b]. Thresholding is a special case of clipping where a = b — t and the output becomes binary. For example, a seemingly binary image, such as a printed page, does not give binary output when scanned because of sensor noise and background illumination varia-tions. Thresholding is used to make such an image binary. Intensity Level Slicing This technique permits segmentation of certain gray level regions from the rest of the image. It is useful when different features of an image are contained in different gray levels. If the background is not wanted, the technique is v Otherwise, it is v Histogram Model ing The histogram of an image represents the relative frequency of occurrence of the various gray levels in the image. Histogram-modeling techniques modify an image so that its histogram has a desired shape. This is useful in stretching the low-contrast levels of images with narrow histograms. Histogram modeling has been found to be a powerful technique for image enhancement [41, 20]. Spatial Operations Many image enhancement techniques are based on spatial operations performed on lo-cal neighborhoods of input pixels. Often, the image is convolved with a finite impulse response filter called spatial mask. L, a < u < b = < (2.7) 0, otherwise L, a < u <b ~ (2.8) u, otherwise Chapter 2. BACKGROUND 20 Noise Smoothing It is desirable to remove the noise out of a noise degraded picture. An typical smoothing technique is the so-called spatial averaging. Here each pixel is replaced by a weighted average of its neighborhood pixels, that is v(m, n) = y ^ y ^ a(k,l)u(m — k,n — /) (2.9) W is a suitably chosen window, and a(k,l) are the filter weights. A common class of spatial averaging filters has all equal weights, giving v(m,n) = -^YS2 u(m-k,n-l) (2.10) w {k,i)ew where N\y is the number of pixels in the window W, i.e., Nw = \W\. Another spatial averaging filter used often is given by v(m,n) = -(u(m,n) + - (u(m — 1,n) + u{m + l ,n) + u(m,n — l) + u(m,n + 1))) (2.11) that is, each pixel is replaced by its average with the average of its nearest four pixels. Although the spatial averaging can smooth a picture, it also blur the edges. To protect the edges from blurring while smoothing, a directional averaging filter can be useful [26]. Spatial average v(m, n : 6) are calculated in several directions (see Figure 2.7) as v(m, n : 6) = - J r u(m-k,n-l) (2.12) and a direction 9* is found such that \u(m,n) — u(m,n : #*)| is minimum. Then v(m,n) = v(m,n : 0*) (2.13) gives the desired result. Median Filtering Median filtering is a non-linear process useful in reducing impulse or salt-and-pepper noise [90]. Here the input pixel is replaced by the median of the pixels contained in a window around the pixel, that is u(m, n) = median{ii(m — k, n — I) : (k, I) € W} (2-14) Chapter 2. BACKGROUND 21 We e * - 1 k Figure 2.7: Directional Smoothing Filter where W is a suitably chosen window. The algorithm for median filtering requires ar-ranging the pixel values in the window in increasing or decreasing order and picking the middle value. Generally the window size is chosen so that \W\ is odd. If \W\ is even, then the median is taken as the average of the two values in the middle. Unsharp Masking The unsharp masking technique is used commonly in the print-ing industry for crispening the edges [75]. A signal proportional to the unsharp, or low-pass filtered, version of the image is subtracted from the image. This is equivalent to adding the gradient, or a high-pass signal, to the image. In general the unsharp masking operation can be represented by where A > 0 and g(m,n) is a suitably defined gradient at (m, n). A commonly used gradient function is the discrete Laplacian. g(m, n) = u(m, n) — ^(u(m — 1, n) + u(m, n — 1) + u(m + l,n) + ii(m, n — 1)) (2.16) Low-pass, Band-pass, and High-pass Filtering Low-pass filters are useful for noise smoothing and interpolation. High-pass filters are useful in extracting edges and in u(m,n) + \g(m, n) (2.15) Chapter 2. BACKGROUND 22 sharpening images. Band-pass filters are useful in the enhancement of edges and other high-pass image characteristics in the presence of noise. If hijp(rn,n) denotes a FIR low-pass filter, then a FIR high-pass filter, hjjp(rn,n), can be defined as hffp(m,n) = 8(m,n) — hLp(m,n) (2-17) where <S(m,n) is a two-dimensional delta function defined as 1 if m = n — 0 tS(m, n) (2.18) 0 otherwise Such a filter can be implemented by simply subtracting the low-pass filter output from its input. Typically, the low-pass filter would perform a relatively long-term spatial average (for example, on a 5 x 5, 7 x 7, or larger window). A spatial band-pass filter can be characterized as hBp(m, n) = hLl (m, n) - hL2 (m, n) (2.19) where and denote the FIRs of low-pass filters. Typically, and would represent short-term and long-term averages, respectively. Zooming Often it is desired to zoom on a given region of an image. This requires taking an image and displaying it as a larger one. Typical techniques of zooming is replication and linear interpolation [45]. Replication is a zero-order hold where each pixel along a scan line is repeated once and then each scan line is repeated. This is equivalent to taking a M x N image and interlacing it by rows and columns of zeros to obtain a 2M x 2N matrix and convolving the result with an array H, defined as A I" 1 1 H = (2.20) 1 1 Chapter 2. BACKGROUND 23 (2.22) This gives u(m,n) = u(M), k = [ y ] , ' = |y] , m,n = 0,1,2,... (2.21) Linear interpolation is a first order hold where a straight line is first fitted in between pixels along a row. Then pixels along each column are interpolated along a straight line. For example, for a 2 x 2 magnification, linear interpolation along rows gives ui(m,2n) = u(m,n), 0 < m < M - 1, 0 < n < TV - 1 vi(m,2n + l) = |[u(m,n) + u(m,n + 1)], 0 < m < M - 1, 0 < n < iV - 1 Linear interpolation of the preceding along columns gives the result as «i(2m,n) = vi(m,n), 0 < m < M - 1, 0 < n < N - I ui(2m + 1, n) = i[t>!(m,n) + vx(m + l,n)], 0<m<M-l,0<n<N~l Here it is assumed that the input image is zero outside [0,M — 1] x [0, N — 1]. The above result can also be obtained by convolving the 2M x 2N zero interlaced image with the array H (2.23) H i i i 4 2 4 i 1 i 1 1 1 L 4 2 4 (2.24) whose origin (m = 0, n = 0) is at the center of the array. In most of the image processing applications, linear interpolation performs quite satisfactorily. High-order (say, p) inter-polation is possible by padding each row and each column of the input image by p rows and p columns of zeros, respectively, and convolving it p times with H. For example p — 3 yields a cubic spline interpolation in between the pixels. Transform Operations In the transform operation enhancement techniques, zero-memory operations are per-formed on a transformed image followed by the inverse transformation. We start with Chapter 2. BACKGROUND 24 the transformed image U ' = {u'(k, /)} as U' = A U A r (2.25) where U = {u(m,n)} is the input image. Then the inverse transform of v'(k,l) = f(u'(k,l)) (2.26) gives the enhanced image as V = A-1V'[AT]-1 (2.27) The transform can be Discrete Fourier Transfer (DFT) or other orthogonal transforms. Generalized Linear Filtering In generalized linear filtering, the zero-memory transform domain operation is a pixel-by-pixel multiplication u'(k,l) = g(k,l)u'(k,l) (2.28) where g(k, I) is called a zonal mask. A filter of special interest is the inverse Gaussian filter, whose zonal mask for TV x N images is defined as g(k,l) = l P l 2S2 J ' - 2 ( 2 2 9 ) g(N — k, N — /), otherwise for the case when A in (2.25) is DFT. This is a high-frequency emphasis filter that restores images blurred by atmospheric turbulence or other phenomena that can be modeled by Gaussian PSFs. Root Filtering The transform coefficients u'(k, I) can be written as u'(M) = K ( M ) | e j m ° (2.30) In root filtering, the a-root of the magnitude component of u'(k, I) is taken, while retain-ing the phase component, to yield v'(k, I) = \u'(k, l)\ae je^ ' l\ 0 < a < 1 (2.31) Chapter 2. BACKGROUND 25 For common images, since the magnitude of u'(k, I) is relatively smaller at higher spatial frequencies, the effect of a-rooting is to enhance higher spatial frequencies relative to lower spatial frequencies. Homomorphic Filtering If the magnitude term in (2.31) is replaced by the loga-rithm of |w'(A:,/)| and define v'(kJ)^[log(\u'(k,l)\)}e^ (2.32) then the inverse transform of v'(k,l), denoted by u(m,ra), is called the generalized cep-strum of the image. In practice a positive constant is added to |u'(fc,Z)| to prevent the logarithm from going to negative infinity. The image t>(ra, n) is also called the general-ized homomorphic transform, H, of the image u(m,n). The generalized homomorphic linear filter performs zero-memory operations on the transform of the image followed by inverse .//-transform. The homomorphic transformation reduces the dynamic range of the image in the transform domain and increases it in the cepstral domain [39]. Pseudocoloring In addition to the requirements of monochrome image enhancement, color image en-hancement may require improvement of color balance or color contrast in a color image. Enhancement of color images becomes a more difficult task not only because of the added dimension of the data but also due to the added complexity of color perception [55]. A practical approach to developing color image enhancement algorithms is shown in Figure 2.8. The input color coordinates of each pixel are independently transformed into another set of color coordinates, where the image in each coordinate is enhanced by its own (monochrome) image enhancement algorithm, which could be chosen suitably from the foregoing set of algorithms. The enhanced image coordinates T[, T£ are inverse transformed to R', G', B' for display. Since each image plane Tk(m,n), k G {1,2,3}, Chapter 2. BACKGROUND 26 T i ( monochrome •li „ R' image enhancemant inverse oordinate sformation coordinate T 2 monochrome Ti ( inverse oordinate sformation G* cd comversion image enhancemant inverse oordinate sformation Cl. cn ° S B monochrome T3' ( B' ( image enhancemant Figure 2.8: Color Image Enhancement is enhanced independently, care has to be taken so that the enhanced coordinates T'k are with the color gamut of the R-G-B system. The choice of color coordinate system Tk, k 6 {1,2,3}, in which enhancement algorithms are implemented may be problem-dependent. 2.2.2 Image Restoration Any image acquired by optical, electro-optical or electronic means is likely to be degraded by the sensing environment. The degradation may be in the form of sensor noise, blur due to camera misfocus, relative object-camera motion, random atmospheric turbulence, and so on. Image restoration is concerned with filtering the observed image to minimize the effect of degradations. The effectiveness of image restoration filters depends on the extent and the accuracy of the knowledge of the degradation process as well as on the filter design criterion. Image restoration techniques are classified according to the type of criterion used. Image restoration differs from image enhancement in that the latter is concerned more with accentuation or extraction of image features rather than restoration of degradations. Image restoration problems can be quantified precisely, whereas enhancement criteria are Chapter 2. BACKGROUND 27 difficult to represent mathematically. Consequently, restoration techniques often depend only on the class or ensemble properties of a data set, whereas image enhancement techniques are much more image dependent. The most common degradation or observation model is g = H f + n (2.33) where g is the observed or degraded image, f is the original image and n is the noise term. The objective of image restoration is to find the best estimate $ of the original image f based on some criterion. Unconstrained Least Squares Filters From Equation (2.33), the noise term in the degradation model is given by n = g - H f (2.34) In the absence of any knowledge about n, a meaningful criterion is to seek an f such that H f approximates g in a least-squares sense, that is, J(f) = ||g-Hf||2 (2.35) is minimum, where ||x|| is the L2 norm of vector x. Inverse Filter Solving Equation (2.35) for f yields f = ( H 2 ^ ) " 1 ^ (2.36) If H is a square matrix and assuming that H - 1 exists, Equation (2.36) reduced to f = H _ 1 g (2.37) This filter is called inverse filter [26]. Chapter 2. BACKGROUND 28 Constrained Least-Squares Filters In order that the restoration filters have more effect than simply inversions, a constrained least-square filter might be developed in which the constraints allow the designer addi-tional control over the restoration process. Assuming the norm of the noise signal ||n||2 is known or measurable a posteriori from the image, the restoration problem can be reformulated as minimizing ||Qf||2 subject to ||g — H f ||2 = ||n||2. By using the method of Langrange multipliers, the restoration problem becomes finding f such that J(f) = ||Qf||2 + a(||g-Hf||2-||n||2) (2.38) is minimum. The solution to Equation (2.38) is f = ( H T H + 7 Q T Q ) - 1 H r g (2.39) where 7 — —. Pseudo-inverse Filter. If it is desired to minimize the norm of f, that is, Q = I, then an estimate f is given by f = ( H T H + 7 I ) - 1 H T g (2.40) In the limit as 7 —> 0, the resulting filter is known as the pseudo-inverse filter [2]. Wiener Filter. Lef Rf and R n be the correlation matrices of f and n, defined respectively as Rf = £ { f f r } (2.41) and R n = £{nn r } (2.42) Chapter 2. BACKGROUND 29 where E{.} denotes the expected value operation. By defining Q r Q 4 R ^ R Q (2.43) and substituting this expression to Equation (2.39), we obtain f = ( H T H + 7 R f 1 R n ) - 1 H T g (2.44) which is known as the Wiener filter[32]. M a x i m u m Entropy Filter. If the image f is normalized to unit energy, then each /,• scalar value can be interpreted as a probability. Then the entropy of the image f would be given by Entropy = - f T In f (2.45) where lnf refers to componentwise natural logarithms, that is lnf = ( l n / 1 , l n / 2 , . . . , l n / N ) T (2.46) If constrained least-squares approach is applied as before, then the negative of the entropy could be minimized subject to the constraint that ||g — H f ||2 = ||n||2. Thus the objective function becomes J(f) = Flnf - a(||g - H f|| 2 -||n|| 2 ) (2.47) The solution to this equation is f = exp{- l - 2aH T (g - Hf)) (2.48) Baysian Methods In many imaging situations, for instance, image recording by film, the observation model is non-linear as g = s{Hf} + n (2.49) Chapter 2. BACKGROUND 30 where -s{x} is a componentwise non-linear function of vector x. The Bayes estimation problem associated with Equation (2.49) is to find an estimate f such that max p(f|g) (2.50) where p(f |g) is the density function of f given g. From Bayes' law, we have p ( f | g ) = K^M) ( 2 5 1 ) p(g) Maximizing the above equation requires that a priori probability density on the right-hand side be defined. M a x i m i m u m A Posteriori Estimate Under the assumption of Gaussian statistics for f and n, with covariance Rf and R n , the MAP estimate is the solution of minimizing the following function [6] lnp(f|g) = - | ( g - a{m})TR?(s - s{Hf}) - i ( f - ? f R f - 1 ( f - f ) + lnp(g) + ^ (2.52) where K is a constant factor and ? is the mean of f . The solution is { = f + R f H F D R ^ g - s{Ui}) (2.53) where D is a diagonal matrix defined as D ^ D i a g { ^ | x = 6,} (2.54) and bi are the elements of the vector b = Hf. Equation (2.53) is a nonlinear matrix equation for f, and since f appears on both sides, there is a feedback structure as well [42]. Analogies in the estimation of continuous waveforms have been derived for communication theory problems [91]. Chapter 2. BACKGROUND 31 Maximimum Likelihood Estimate Associated with the M A P estimate is the maximum likelihood (ML) estimate, which is derived by assuming that p(f|g) = p(g|f); that is, the vector f is a nonrandom quantity. Accordingly, Function (2.52) is reduced to lnp(f |g) = - i ( g - 5 { H f } ) T R i 1(g - s{Hf}) + Inp(g) + K (2.55) The solution of minimizing the above function is H T D R ; 1 ( g - a { H f } ) = 0 (2.56) That is g = s{m] (2.57) Chapter 3 D E F I N I T I O N S A N D N O T A T I O N S The field of neural networks has attracted many people from different disciplines. The diversity of their backgrounds is reflected in the variety of terminologies. Although some efforts are being made to address the terminology problem, a standard terminology is still yet to come [16]. For the sake of clarity and some other reasons stated later in this chapter, we give our definitions of neurons, neural networks, and other related concepts here. Definition 3.1 A neuron is a simple computing element with n inputs x\,x2, •••^n (n > 1) and one output y. It is characterized by n + 1 numbers, namely, its threshold t, and the weights w\, w2, ...,wn, where W{ is associated with X{. A neuron operates on a discrete time k = 1,2,3,4,..., its output at time k + 1 being determined by its inputs at time k according to the following rule The function of a neuron is to map points in a multi-dimensional space Xn to points in a one-dimensional space y, that is, n y{k + l) = fa(EwMk)-t) (3.1) where fa(.) is a monotonic function. fa-xn->y (3.2) From the definition, fa is a composite function fa = fi o f2 (3.3) 32 Chapter 3. DEFINITIONS AND NOTATIONS 33 Xi x,— xN Figure 3.1: The Schematical Representation of a Neuron where h-.n^y (3.4) and f2:Xn-*K (3.5) Function f2 is a linear function in the sense that /2(Ai£i + X2x2) = A1/2(x1) + A2/2(x2) VAx, A2 € K (3.6) A neuron is defined on the triple (Xn, y, f) where / is an element of the set J- of activation functions. Let's denote the linear activation function as L, the threshold-logic activation function as T, and the quasi-linear activation function as Q. A neuron is schematically represented as that shown in Figure 3.1, where A £ T. Definition 3.2 A neural network is a collection of neurons, each with the same time scale. Neurons are interconnected by splitting the output of any neuron into a number of lines and connecting some or all of these to the inputs of other neurons. A output may thus lead to any number of inputs, but an input may come from at most one output. A neuron is denoted as p. The i th neuron in a network is denoted as pi. The set of all the indices which specify the sequence of neurons is denoted as X. For a network of n neurons, T = { 1 , 2 , n } . Chapter 3. DEFINITIONS AND NOTATIONS 34 The output of a neuron is called the state of the neuron. The state of pi at time k is denoted as Si(k). In this thesis, we assume that the states of all the neurons in a network are updated at the same time instance. Such updating is known as synchronous updating [25]. At k = 0, the network is initialized with appropriate values, and the network then updates itself until it halts when certain criterion is met. The function of a neural network with N inputs and M outputs is to map points in a multi-dimensional space to points in another multi-dimensional space, that is, it performs the following function / : XN -• yM (3.7) The input of a network is denoted as x or x , and the output of a network is denoted as V or y . Neurons in a network are classified into three types: input neurons, output neurons, and hidden neurons. An input neuron is a neuron whose initial state is an element of the network input x; an output neuron is a neuron whose state at the time when the network halts is an element of the network output y; a hidden neuron is a neuron which does not belong to either one of the first two classes. A neuron can be both input and output neuron. The input neuron is somewhat special. For example, in a feedforward network, an input neuron has only one input line. If the input x € B, then a threshold-logic neuron can be used as the input neuron; if x € V, then a quasi-linear neuron can be used as the input neuron; if x € 72., then a linear neuron has to be used as the input neuron. In all the cases, a linear neuron can always be used as the input neuron. Therefore, for the sake of simplicity, input neurons are always linear neurons unless specified otherwise. An input neuron is schematically shown in Figure 3.2, which represents an linear neuron with t = 0 and w — 1. Chapter 3. DEFINITIONS AND NOTATIONS 35 0 Figure 3.2: The Schematical Representation of the Input Neuron 0 0 (a) (b) (c) (d) Figure 3.3: Three Ways of Setting Up an Input Neuron There are three ways to set up an input neuron: (1) 5(0) = x, s(k > 0) = 0; (2) s(k > 0) = x; and (3) s(k) = x(k). These three cases are shown in Figure 3.3. To represent either one of the three settings, an input neuron without the input line is used (see Figure 3.3.d). At time k = 0, the input neurons are loaded with the network input, and the rest are loaded usually with zeros. The computation time Tp of a network is the time between when the network is initialized and when the solution appears on the output neurons. Our definitions of neuron and neural network are not arbitrarily chosen. They provide a unified framework to describe many kinds of neural network models. For instance, an 2 x 2 x 1 back-propagation network can be implemented by our network as that shown in Figure 3.4, where S denotes the sigmoid activation function. At time k = 0, the input neurons are loaded with network input x, and the rest of the neurons with zeros. At k = 2, the state of the output neuron is the true output of the network—hence the computation time of this network is 2. If the input neurons are set up as in Figure 3.3.b, then s5(k > 2) = s5(k = 2). Feedback network can also be implemented by our networks. For instance, a four-neuron Hopfield network is implemented as shown in Figure 3.5. Note that here every Chapter 3. DEFINITIONS AND NOTATIONS 36 Figure 3.5: The Hopfield Network neuron is both the input and output neuron. After the network is initialized, it updates itself until s{(k = h + 1) = Si(h) ^ Si(h - 1) Vi <E J = {1,2,3,4}. The states of all the neurons at k > h are the true network output—hence, h is considered the computation time of this network. Another example is the implementation of an 2 x 2 x 2 Hamming network as shown in Figure 3.6. In Lippmann's original definition of Hamming network, there is a control problem that the lower subnet has to be removed after the upper subnet is initialized with the output of the lower subnet (see [58]). Since the network itself has no means to do so, an external mechanism has to be employed, for example, a set of synchronized switches between the lower subnet and upper subnet. In our networks, such a control problem is easily solved by setting the input neurons as shown in Figure 3.3.a. Chapter 3. DEFINITIONS AND NOTATIONS 37 Y i Y 2 Figure 3.6: The Hamming Network Neurons in our networks are usually numbered in an one-dimensional manner. Never-theless, they can be spatially arranged in a two or greater dimensional manner as shown in previous examples. In those cases, the neurons are numbered either from top-to-bottom and left-to-right (see Figure 3.4), or from left-to-right and bottom-to-top (see Figure 3.5 and Figure 3.6). There are two kinds of connections from pi to pj. One is from pi directly to pj, and the other is from pi to some other neurons and then to pj. The former is called direct connection, and the latter indirect connection. There are also two distinctive direct connections between two neurons, say, pi and pj, in a network. One is from the output of />,• to the input of pj, whose strength or weight is denoted as Wji. The other is from the output of pj to the input of pi, whose weight is denoted as Wij. The matrix W whose ijth element is is called the weight matrix of the network. A vector whose ith element is the threshold of pi is called threshold vector and is denoted as t. A vector whose ith element is the type of activation function of pi is called activation vector and is denoted as a. A neural network is fully defined by the triple (W,t ,a) , Chapter 3. DEFINITIONS AND NOTATIONS 38 which specifies what function the network performs. Neurons in a network are usually grouped in layers. A layer of neurons is a group of neurons each of which has no direct connections with others in the same group. Symbol Ch denotes the set of all neurons belong to layer h, and C denotes the set of all the possible Ch- From the definitions, we know that the following is true E\Ch\ = \I\ (3.8) h The communication time tc(i,j) from pi to pj is defined as the shortest period of time it takes for data in pi to reach pj. For example, in the Hamming network shown in Figure 3.6, tc(2, 3) = 1 and ic(2,5) = 2. If there is no connection between two neurons, then tc(i,j) = tc(j,i) = oo. The communication time ti(k,h) from layer k to layer h is defined as tt(k, h) = ma,x{tc(i,j) : i € Ck, j € Ch] (3.9) A matrix C whose ij th element is defined as 1 if wu ^ 0 3 T (3.10) 0 otherwise is called connectivity matrix. The architecture Ap of a network is defined by the connec-tivity matrix of the network. The configuration Cp of a network is defined by the triple (C,t,a). The architecture of a network specifies the structure of the network; there are two major classes of network architectures, namely, feedforward and feedback. The configuration of a network specifies what class of functions this network can perform. The connectivity matrix C of a network contains a great deal of information about the properties of the network. Fact 3.1 A neural network has a feedforward architecture iffcij A Cji = 0 Vz, j € X. Chapter 3. DEFINITIONS AND NOTATIONS 39 Fact 3.2 A neural network has a feedback architecture if3i,j 6 T such that c t J Ac,,- = 1. Fact 3.3 There is a direct connection between pi and pj iff c,j V Cji = 1 Fact 3.4 There is an indirect connection between pi and pj iff there exists a non-empty sequence (kt, k2,A;n) such that either cklj A ck2kl A. . . A cfcnfcn_1 A cikn = 1 or ckli A ckikl A ...Acknkn_1Acjkn = l. The neural networks of our definition are, mathematically speaking, a class of au-tomata networks [25]. Yet, our networks differ from the automata networks of the usual sense in that the state of our networks is inherently infinite. Moreover, our networks are different from that defined in the field of automata networks where neural networks are defined as systems of McCulloch-Pitts (threshold-logic) neurons (see [7, 25]). Note that such networks are only a subset of our networks. Nevertheless, analytical tools used in the field of automata networks are useful in the analysis of our networks. The analysis of our networks is certainly an interesting and challenging subject. How-ever, the main interest of this thesis is on the synthesis part, that is, how to map an function to a network such that the network can perform the function. In other words, we are interested in finding procedure(s) P such that P:A^JV (3.11) where A is the set of functions and J\f is the set of the triple (W,t, a). Chapter 4 REASONS FOR ALGEBRAIC DERIVATION The essence of neural network problem solving is that the network performs a function which is a solution to the problem. The function can be viewed as a computing rule which states how to compute a given input to get the correct output. This rule is coded in the network architecture, the weights, the thresholds, as well as the activation functions. A neural network is a collection of interconnected simple computing units, each of which can be viewed as a computational instruction. Thus constructing a network to solve a problem is equivalent to arranging these instructions such that a computing rule is formed which solves the problem. This is similar to programming in digital computers. There are basically two ways of "programming" a neural network: (1) analytically de-riving the network architecture and the network parameters based on a given computing rule; or (2) assuming a network architecture and adjusting network parameters until a proper computing rule is formed. Adjusting network parameters in such a meaningful way is referred to as learning or training in the field of neural networks [94]. Let us refer to the first approach as analytical approach, and the second as learning approach. There are two typical network models which are representatives of the two approachs respectively. One is the Hamming network [59], which represents the former approach. The other is the back-propagation network [80], which represents the latter approach. The learning approach is necessary when no computing rule is available or at least not completely available. The role of learning here is to find a computing rule, which solves 40 Chapter 4. REASONS FOR ALGEBRAIC DERIVATION 41 the given problem, based upon partial solution of the problem. When the computing rule is completely known, the learning approach is still useful in the sense that here the role of learning is to automate the process of coding the computing rule. However, there are some practical problems associated with the learning approach, one of which is that it usually takes a very long period of time to train a network. This problem is caused by several factors which shall be discussed in the next section. When the computing rule is known, it is sensible to utilize it in constructing neural networks. The analytical approach ensures proper network architectures and proper network parameters. However, the existing network models designed using the analytical approach are only capable of solving a narrow range of problems. Moreover, these network models are more or less hand crafted, heavily based upon the imagination and intuition of those who developed these models. There is still no systematic method of network design. The objective of this thesis is to develop such a method. In the following sections, we shall demonstrate the drawbacks of the learning approach with the example of training the back-propagation network to solve the parity problem; we shall also describe the analytical approach in more details with examples of designing the Hamming network, as well as designing a network model to solve the parity problem. We then discuss the current situation of the analytical approach, and propose our method. 4.1 Drawbacks of the Learning Approach The back-propagation network is the most popular network model used in neural network applications because it can be trained to implement any function to any specified degree of accuracy [30]. However, there are some practical problems with its learning process. The first problem is that learning usually takes a long period of time even for a small Chapter 4. REASONS FOR ALGEBRAIC DERIVATION 42 scale problem. One cause of this problem is due to the back-propagation algorithm itself. Although many improved versions have been developed [69, 28, 44], the fastest version is still time consuming. Another cause is that it is difficult to know in advance how many neurons in each layer are required. These "magic" numbers have to be found, at least for the time being, through trial-and-error. This means that after days, even weeks of training, one has to start the training process all over again if the number of neurons in the network are not enough. The second problem is that after training a network to implement a certain function, one rarely gains any insight that would help in training another network to implement a similar function, e.g., a function which is in the same family as the previous function. The nature of image enhancement and restoration problems makes the first problem even more severe. Taking the example of implementing a median filter with the window size of three (the simplest case), assuming that each pixel takes integer values ranging from 0 to 255, then there are 2563 = 16,777,216 possible input patterns. In order to achieve zero error rate, the training set may have to consist of all the input patterns and their corresponding correct outputs. Such a training set is astronomically enormous, needless to say for the cases where the window size is bigger. To underscore the com-plexity of learning, suppose we have the means to have neural networks learn one pattern every 1 ms, then to implement the above mentioned median filter, it would take a net-work about 5 hours to learn all the patterns. However, with the same learning rate, to implement a median filter with a window size of five, it would take a network about 35 years to learning all the possible patterns. The first problem is well documented in the literature [92, 43, 81], hence we shall only illustrate the second problem here through the example of training the back-propagation network to solve the parity problem. This problem is impossible for the Perceptron to solve [65], but is solvable by the back-propagation network [80], which is one of the Chapter 4. REASONS FOR ALGEBRAIC DERIVATION 43 (a) (b) (c) Figure 4.1: Networks for Solving : (a) XOR problem; (b) Parity-3 problem; and (c) Parity-4 problem. reasons for the fame of this network model. We shall start with the XOR problem, which is a special case of the parity problem when the problem size n = 2, and continue with cases when n — 3 and n — 4. Example 4.1: The Parity Problem. The truth table of the Exclusive OR (XOR) is shown in Table 4.1. The network architecture we have chosen is shown in Figure 4.1. The generalized delta rule (see Appendix A) is used to train the network. All the patterns in Table 4.1 are included in the training set. The result after the network has learnt how to solve the XOR problem, i.e., for every input pattern its output is the same as that in Table 4.1, is shown in Table 4.2. Similarly, we carried out the training for n = 3 and n = 4. The network architectures for solving these problems are also shown in Figure 4.1. The network parameters after training are shown in Table 4.3 and Table 4.4 respectively. The number which has pj to Chapter 4. REASONS FOR ALGEBRAIC DERIVATION 44 Table 4.1: The Exclusive OR input output Xi x2 z 0 0 0 0 1 1 1 0 1 1 1 0 Table 4.2: Network Parameters For Solving XOR Problem Pi Pi P3 4.146451 -2.968075 P4 1.379474 -2.464423 h U -1.889660 0.380817 P3 P4 Ps -3.351054 2.622306 h -1.845504 the top and pi to the left is the value of the weight of the connection from pj to />,-. The number which has to the top is the value of the threshold of neuron pi. By looking at the three tables of network parameters, it is difficult to generalize what the parameters of the network will be for cases where the problem size is five or larger. Consequently, one has to start from scratch to train networks for these cases. The number of learning steps required for the network to have learnt how to solve the parity problem is shown on Table 4.5 for n = 2,3,4,5 and rj = 0.10,0.15,0.20,0.25, where r] is the gain term of the generalized delta rule (see Appendix A). Here, the learning step is defined as the interval during which all the network parameters are being updated Chapter 4. REASONS FOR ALGEBRAIC DERIVATION 45 Table 4.3: Network Parameters For Solving Parity-3 Problem Pi P2 P3 P4 -4.396586 5.399827 -3.631473 P5 -1.965704 2.695202 -0.726498 Pe 1.025570 -0.856012 2.323546 U h U -1.486861 -1.788286 -0.119762 P5 Pe Pi -4.192873 4.991574 -2.708791 t? 0.151018 Table 4.4: Network Parameters For Solving Parity-4 Problem Pi P2 P3 P4 Ps -4.865602 -6.202526 5.522527 -6.278468 Pe -3.045859 4.738367 -4.957426 4.048179 Pr -6.845311 4.469164 -2.124128 4.499167 Ps -3.836795 -2.465528 2.714290 -2.221879 te -3.129540 3.778817 2.953945 -3.396108 Ps Pe P7 P9 -4.868719 5.105030 -5.718770 6.778777 1.817817 Chapter 4. REASONS FOR ALGEBRAIC DERIVATION Table 4.5: Number of Learning Steps 46 n 7/ = 0.10 n = .15 n = 0.20 7/ = 0.25 2 9830 6485 4885 4003 3 30652 20581 15490 12527 4 94317 80384 69852 75428 5 170604 120640 89585 67585 once. The learning step is assumed constant, which implies that the updating is done in parallel. Note that the number of learning steps increases in a manner that it soon gets very large as the problem size increases. As a matter of fact, the so-called loading problem (see [48]), which concerns how long it takes a neural network to have learnt every pattern in the training set with the respect to the set size, turns out to be NP-complete [47, 48]. Moreover, for the back-propagation network to solve the parity problem, the training set may have to include every possible pattern, which means that the size of the training set is two to the power of the problem size, i.e., the size of the training set increases exponentially as the problem size increases. Therefore, training a network for the cases of larger problem size not only means starting all over again, but may be infeasible when the problem size is fairly large. Finally, the network architectures shown in Figure 4.1 were chosen because we knew they work from experiments done by other researchers [80]. Otherwise, extra amount of work would have had to be done on finding the right network architectures. 4.2 The Advantages of The Analytical Approach The analytical approach is demonstrated with examples of designing the Hamming net-work, as well as designing a network model to solve the parity problem. Chapter 4. REASONS FOR ALGEBRAIC DERIVATION 47 4.2.1 Designing the Hamming Network A classic problem in communication theory is that given a set of binary patterns, called exemplars, for an given binary pattern, how to determine which exemplar is the closest to the input pattern. This problem occurs when binary fixed-length signals are sent through a memoryless binary symmetric channel. The optimum minimum error classifier in this case calculates the Hamming distance to the exemplar for each class and selects that class with the minimum Hamming distance [23]. The Hamming distance is the number of bits in the input which do not match the corresponding exemplar bits. The Hamming network developed by Lippmann [58] implements this algorithm. Let us denote the input pattern and the exemplar as v and e respectively. Each element of both vectors are binary and belong to the set {+1,-1}. Suppose there are M exemplars {ei, e2,e^}, then the Hamming distance between an input pattern and the i th exemplar is d^^N-v-Z) (4.1) where N is the number of elements in the vector and v • e,- is the inner product of two vectors. Since minimizing d{ is equivalent to maximizing 6, = N — di = |(A/ + v • e;), the optimum minimum error algorithm can be restated as follows: 1. Calculate 6,- Vt € { 1 , 2 , A f } ; 2. Find the index k such that bk = max{6j : i £ { 1 , 2 , M } } ; 3. Output the exemplar vector eV To implement this algorithm, the Hamming network is structured in two subnets: the lower subnet calculates bi (the i th matching score) and the upper subnet finds the index k. Note that the Hamming network does not implement the third step. Chapter 4. REASONS FOR ALGEBRAIC DERIVATION 48 In the lower subnet, one neuron is enough to calculate each fe, by setting the weights to vo'- = ^e\, the threshold to 0j — — y , and the activation function to u if a > u a if u> a > 0 (4.2) 0 otherwise It is assumed that this neuron operates in its linear range, that is, it is essentially a linear neuron. The lower subnet consists of M such neurons. For the upper subnet, Lippmann used a feedback network known as lateral inhibitive network or winner-take-all network, which mimics the heavy use of lateral inhibition evident in the biological neural networks of the brain [49]. The weights in this subnet are set as follows w kl = < 1 if k = I (4.3) -e iik^l where 0 < e < All thresholds are set to zero. Neurons in this subnet have the same activation functions as in (4.2) and also operate in the linear range. After this subnet converges, only one neuron's output is positive and the rest are zeros. The one with none-zero output corresponds to the exemplar which has the minimum Hamming distance to the input pattern. 4.2.2 Designing the Parity Network Now let us use the analytical approach to construct a neural network which solves the parity problem. A function which solves this problem is such that it outputs 1 if there are odd number of l's in the input pattern; 0 otherwise. This function is the so called parity function. Chapter 4. REASONS FOR ALGEBRAIC DERIVATION 49 Let A 1 if E ? = i * ; > « V i e { i , 2 , . MJV} (4.4) 0 otherwise Then the following function N N y = E(-i) ,'~ 1* = Efc* (4.5) i=i t=i is equivalent to the parity function in the sense that the input-output behaviours are the same. Zi can be realized by a threshold-logic neuron shown in Figure 4.2, and y can also be realised by a threshold-logic neuron shown in Figure 4.3. Therefore, the whole network can be built as that shown in Figure 4.4. Xi •N Figure 4.2: Realizing Function (4.4) Zi y N Figure 4.3: Realizing Function (4.5) Chapter 4. REASONS FOR ALGEBRAIC DERIVATION 50 y Figure 4.4: The Network for Solving the Parity Problem As shown by this and previous examples, one can easily derive a network using the an-alytical approach. Note that both the network architectures and the network parameters are simultaneously derived. The advantages offered by the analytical approach are as follows: (1) no training is needed; (2) the network architecture derived is proper if not optimal; and (3) complete understanding of the role of every individual neuron. Although a lot can be learnt from the work of researchers such as Hopfield, Koho-nen and Lippmann on the constructing processes of their network models, they have not yet developed a systematical method of network design. They relied mainly on their imagination, intuition, as well as inspiration from their knowledge of the nervous sys-tem. However, to effectively use neural networks as computing machines, a systematical method of network design is very much needed. 4.3 The Algebraic Derivation Methodology Here we propose a methodology of network design, which consists of the following five stages: 1. Find the minimum set of neuron models for a given class of functions; Chapter 4. REASONS FOR ALGEBRAIC DERIVATION 51 2. Devise symbolic representations for these neurons and their networks; 3. Establish theorems for manipulating these symbols based on the computational properties of the neurons and their networks; 4. Establish procedures for deriving neural networks from functions; 5. Use these procedures and theorems to derive and simplify network models for spec-ified functions. During our study on network design, we found that it is very useful to symbolically represent these units and their networks. These symbols can then be manipulated to yield proper network models. Because symbol manipulation is an important feature of our methodology, we call it. algebraic derivation of neural networks. The approach of symbolically representing computing units and manipulating these symbols was used before in [57]. However, they were only concerned with designing networks of threshold-logic neurons to realize logical functions. Here, we are concerned with designing networks of diverse neuron types to realize more complex functions. Chapter 5 S Y M B O L I C R E P R E S E N T A T I O N O F N E U R O N S A N D T H E I R N E T W O R K S Our task is to map functions or algorithms to neural networks. As required by the algebraic derivation methodology, the first step is to determine a minimum set of neuron models; the second step is to devise symbolic representations for the chosen neuron models and their networks. These two steps are completed in this chapter. 5.1 Neuron Models Since we restrict the choice to the class of neuron models defined in Chapter Three, the task in the first step is reduced to deciding what types of activation functions to use. Because of practical restrictions, these functions have to be as simple as possible. Many techniques in image enhancement and restoration involve a weighted summa-tion of neighboring pixels and passing the sum to a function to determine the value of processed pixel (see Section 2.2). This function is usually piecewise. An example is the function used in the contrast stretching technique, which is shown in Figure 5.1. Denote the contrast stretching function as fc{%)- We can write another function as f(x) = afQ(x) + p?fQ(x -a) + 7/o(x - b) (5.1) where functions /Q(.) are shown in Figure 5.1 as dotted fines. By proper choice of coefficients, we can ensure that f(x) = fc(x) Vxe[0,c] (5.2) 52 Chapter 5. SYMBOLIC REPRESENTATION OF NEURONS AND THEIR NETWORKS53 y o d x \(2) Figure 5.1: The Contrast Stretching Function Therefore, we can rewrite many functions used in image processing as sums of several quasi-linear functions. Hence, the quasi-linear activation function is chosen. where fh is a linear function. Linear activation function is also needed. A lot of logic operations are employed in image processing techniques, therefore, the threshold-logic activation function is also required. Since functions used in image processing techniques can be expressed as compositions of these three functions, we choose them to form the minimum set of activation functions. Because neither one of these functions can be realized by combinations of the other two, this set is indeed minimum. The neuron models we have just chosen are linear, quasi-linear, and threshold-logic neurons, which are referred to as LQT neurons. They are the simplest among all the neuron models proposed in the literature, hence should have minimum difficulty in their hardware implementations. In Chapter Seven, a formal proof is given which shows that any image processing technique can be realized by networks of these neurons. Note that f(x) = / i O (fQ{x)JQ(x - a),fQ(x - b)) (5.3) Chapter 5. SYMBOLIC REPRESENTATION OF NEURONS AND THEIR NETWORKS54 For the convenience of algebraic derivation, symbols are used to represent LQT neu-rons. The linear neuron is denoted as y =C xi,x2,...,xn | w-l,w2,...,wn D t (5.4) The quasi-linear neuron is denoted as y =< x1,x2,...,xn | w1,w2,...,wn >* (5.5) The threshold-logic neuron is denoted as y =C xux2,...,xn \w1,w2,...,wnZl t (5.6) Where xt- represents the i th input to the neuron, W{ the weight associated with x,-, and t the threshold of the neuron. To represent either one of LQT neurons, the following notation is used. V = {|xi,x2,...,xn \w1,w2,...,wn\} t (5.7) If Wi = 1 Vi € { 1 , 2 , n } , then the neuron is denoted as y = {|£i,X2, ...,£„!}' (5-8) Sometimes, it is more convenient to use the following vector representations y = {\x\ w\Y (5.9) or r/ = {|x|w|r (5.10) where x = x. — (xi,x2,xn) and w — w = (w\, w2,wn). Chapter 5. SYMBOLIC REPRESENTATION OF NEURONS AND THEIR NETWORKS55 5.2 LQT Networks A network which is composed of LQT neurons is called a LQT network. Al l the networks derived in this thesis are such networks. Three typical topologies of LQT networks are illustrated in Figure 5.2. Both networks in Figure 5.2.a and Figure 5.2.b are feedforward, and the network in Figure 5.2c is feedback. Note that the network shown in Figure 5.2.b is different from conventional feedforward neural networks in that some neurons in the third layer take inputs from the input neurons. y,(k) y,(k) y3(k) y4(k) (a) (b) (c) Figure 5.2: Typical LQT Network Topologies The notation of a network is similar to that of a neuron. The network input is denoted by x and the network output is denoted by y. The output of the i th neuron is denoted as Hi. Chapter 5. SYMBOLIC REPRESENTATION OF NEURONS AND THEIR NETWORKS56 The notation of the network shown in Figure 5.2.a is V = C < C x i , x 2 I w3Uw32Zit3, LZ x 1 ? x 2 | w41,w42-Jiu | w53, w54 >'5, Note that the input neurons are not explicitly expressed here. When all the variables in the notation are independent, the notation represents a whole network. Part of the network can also be represented, such as y = C Ps,pe | w 7 5 , w76Dt7 (5-12) which represents the output neuron, and y =C< nz, PA I ™53, wb4> ts, H I ™75, w76D t7 (5.13) which represents a part of the network as shown in dashed lines in Figure 5.3. The Figure 5.3: Representing Part of a Network notation of the network shown in Figure 5.2.b is y = C < Xi , L Z X i , X 2 | ltf31,tU 3 2Z]' 3 ,IZ X i , X 2 | W41,W42Z\U | U>51,U>53, ™54 ><5, < L Z x i , x 2 | w31,w32Z} t3,n x i , x 2 | w41, wA2-3u,x2 \ w63, w64, w62 >'6 (5.14) | ^ 75,^76 D t7 Chapter 5. SYMBOLIC REPRESENTATION OF NEURONS AND THEIR NETWORKS57 Another way of representing a whole network is fi5 = < xufi3,p4\w51,w53,w54 >ts = < M3,M4,X2 | W63, ^ 6 4 , W 6 2 >'6 (5.15) ^3 = C XUX2 IW31, w32n t3 Hi = (Z X i ,X 2 I W 4i,l0 4 2]* 4 where symbol { represents one layer of neurons. Note that input neurons are not explicitly expressed in this form either. The form used in equations (5.11) and (5.14) is called compact form, and the form used in (5.15) is called layered form. The layered form is easier to comprehend. Sometimes, a neuron can be grouped to either one of several layers. An example is shown in Figure 5.4, where the neuron with threshold t7 can be grouped either to the second or to the third layer. To avoid ambiguity, in this thesis such neurons are always grouped to the layer closest to the output. In this case, the neuron with the threshold t7 is grouped to the third layer, that is, the network's layered form representation is y = C p5,fie,^7 \wS5,w8e,w87 D ts ^5 = rz p3,p41 w53,w54zi Me = IZ fl3,P4 | ^ 6 3 , W64Z1 - rz x2 \ w723 t7 M3 = \ZXl7X2\ 1031,1032 • / i 4 = rz xx,x2 \ w4l, w42]u (5.16) 1*3 The symbolic representation of a feedback network is different from the representation of a feedforward network. Since feedback networks are dynamic in the sense they have to iterate many times, an indication of the iterations has to be included in the representation. Chapter 5. SYMBOLIC REPRESENTATION OF NEURONS AND THEIR NETWORKS58 (5.17) Figure 5.4: A LQT network The representation of the feedback network shown in Figure 5.2.c is yt(h + l) = LZ yi(h),y2(h),y3(h),y4(h) | wu,w12, i ^ , ^ Z)*1 y2(h + l) = LZ yi{h),y2(h),y3(h),y4(h) \wn,w22,w23,w24Zlt2 y3(h+l) = H yi(h),y2(h),y3(h),y4(h) \w31,w32,w33,w34Z}t3 y4(h + l) = LZ yi(h), y2(h), y3(h), y4(h) \ w41, w42, w43, w44 Z2U where yi(h) is the i th element of the network output at the h th iteration. Vector y(0) = (yi(0), 2/2(0), 2/3(0), y4(0))T are the network input. Note that neurons in this network are both input and output neurons. Another representation of the above feedback network is as follows where and y(h + l) = L Z y ^ l W z 1 y(h) = (y1(h)1y2(h)Mh)Mh)? t — (ti, t2, t3, t4)T and W is the matrix composed of u^'s. (5.18) (5.19) (5.20) Chapter 6 C O M P U T A T I O N A L P R O P E R T I E S O F L Q T N E T W O R K S As required in the third step of the algebraic derivation methodology, this chapter es-tablishes some theorems based on the computational properties of LQT networks. These theorems are useful in simplifying and evaluating network realizations. Also, we shall discuss some factors which affect the quality of a LQT network, and give a criterion for network simplification. 6.1 Network Equivalence Definition 6.1 If a function f(x) and a Network NA satisfy the following condition where X is the definition domain of f, then f is said to be realized by NA, or NA is said to be the network realization of f. A well defined LQT network can only realize one function; but a function can be realized by many different networks, thus comes the concept of network equivalence. Definition 6.2 // Network A and Network B satisfy the following condition where X is the definition domain of both NA and NB, then Network A and Network B are said to be equivalent. f(x) = NA{x) VxeX (6.1) NA(x) = NB(x) VX<EX (6.2) 59 Chapter 6. COMPUTATIONAL PROPERTIES OF LQT NETWORKS 60 The following theorems concern the property of network equivalence. Theorems 6.1 to 6.8 concern single neuron, or networks of single neuron. The rest in this section concern networks of multiple neurons. Theorem 6.1 If NA is VA = O l , Z 2, Zn | WX, W2, U>„|}* (6.3) and Ng is VB = ^zil,zia,...,Zin | wil,Wi9,...,win^t (6.4) where sequence {z^, z , - 2 , Z i n } and {w^, W{2, ...,u;,„} are permutations of {z\,z2,...,zn} and {wi, w 2 , w n } respectively, then NA and NB are equivalent. Theorem 6.2 Let NA be VA = {\zuz2,...,zn | w1,w2,...,wn§ t (6.5) If W{ = 0, then NA is equivalent to VB = {|2i,...,*t-i,*t+i,-,*n | «?!,...,u;.-_i,«;,-+!,.„,U7„|}* (6.6) Theorem 6.3 Let NA be VA = {\zi,z2,...,zn | wuw2,...,wn$* (6.7) If Zi == a, where a is a constant, then NA is equivalent to NB = {\z1,...,zi_i,zi+1,...,zn |iwi,...,u7,-_i,u;i +i,...,«; n|} t~ 0 ," i (6.8) Theorem 6.4 Let NA be VA = l > i , z 2 , z n | wx,w2,wn\Y (6.9) Chapter 6. COMPUTATIONAL PROPERTIES OF LQT NETWORKS 61 If Zi = Zj and i < j, then NA is equivalent to NB = { | * i,...,Zi,...,Zj_i,Zj + i , . . . , z n |ti^,...,tx>,- + u;j,...,u;J-_i,u;,-+i,...,u;n|}* (6.10) or NC = {|zi,...,Z,-_i,2,-+i,...,Zj,...,2„ | +«»,•,...,U>„|}' (6.11) Theorem 6.5 / /TV^ is y>i = C z I tu • * (6.12) and NB is yB=Cz\ 0w Z\pt (6.13) where P > 0, then NA and NB are equivalent. Theorem 6.6 Let NA be VA = C ZX, Z2, *n I wi, w2,u)„D( (6-14) and NB be VB = < Zx,Z2,...,Zn | V)i,W2,...,Wn > ' . (6.15) If the output of NA is always greater than or equal to zero, then NA and NB are equivalent. Theorem 6.7 Let NA be yA = C zx,z2,...,zn | wx,w2,...,wn^ (6.16) and NB be yB = C z i , z 2 , . . . , z n | Wi,w2,...,wnZi+s (6.17) where 1 > <5 > 0. If the output of NA is either zero or one, then NA and NB are equivalent. Chapter 6. COMPUTATIONAL PROPERTIES OF LQT NETWORKS 62 Theorem 6.8 Let NA be VA =< zx,z2,...,zn | wi,w2,...,wn >* (6.18) and NB be VB =C * I , z 2 , z n | w-i, w2,u)Jt+< (6.19) where 1 > 6 > 0. If the output of NA is either zero or one, then NA and NB are equivalent. Theorem 6.9 Let NA be yA = {W,z2,...,znf (6.20) and NB be VB = { k i , ••, Z i - i , C Zi, ...,zi+m D'Szi+m+i, ••-,Zn|}<2 (6.21) where i G {1 ,2 , ra} anc? n > i + m > 1. Ift = t1+t2, then NA and NB are equivalent. Theorem 6.10 let NA be {\zx,z2,...,zn \w1,w2,...,wn\} t (6.22) and NB be {| C zx | ^ 3 " , C 2 2 |u>2 3 < 2 , . . . ,C z n | u>„ D ( " l }^ 1 (6.23) IfYJi=\ti = t, then NA and NB are equivalent. Theorem 6.11 Let NA be yA = {iCz\wD t* |a|}<2 (6.24) and NB be yB = {\z\aw\y (6.25) Ift — at. +12, then NA and NB are equivalent. Chapter 6. COMPUTATIONAL PROPERTIES OF LQT NETWORKS Theorem 6.12 If NA is yA = {\Cza,zbDt> |a|}<2 and NB is yB = { \ C z a , z b \ a , a D a h |}<2 then NA and NB are equivalent. Theorem 6.13 Let NA be yA = {\<za,zb>ti |a|}<2 and NB be yB = {\<za,zb\ct,a>at> |}<2 lfa>0, then NA and NB are equivalent. Theorem 6.14 If NA is yA = C {\zx,z2,...,zn \wx,w2,...,wn§tl Dh and NB is VB = {\z1,Z2,...,Zn | iyi,W 2,..., Wn^ where t = ti + t2, then NA and NB are equivalent. Theorem 6.15 Let NA be yA = « zx,z2, ...,zn | wuw2, ...,wn >tl>'2 and NB be yB =< Zi, z 2 , - . , zn | w1,w2,wn >* If t = tx + t2 and t2 > 0, then NA and NB are equivalent. Chapter 6. COMPUTATIONAL PROPERTIES OF LQT NETWORKS 64 The theorems mentioned so far can be easily verified by writing out the functions realized by both networks NA and NB, and checking if the functions are equivalent. Example 6.1 Prove Theorem 6.15 is true. SOLUTION The function realized by NA is yA = fd(fQ(zw-t1)-t2) (6.34) If zw — ta > 0, then VA = IQ(ZW - h - t2) (6.35) otherwise, VA = 0 (6.36) This implies VA = VB = IQ(ZW - t) (6.37) which is the function realized by NB- Therefore, NA = NB- tt 6.2 Network Quality Since a function can be realized by an infinite number of LQT networks, a criterion is needed for us to choose a proper network among all its equivalents. Obviously, this choice is based on the quality of the networks. There are several factors which affect the quality of a network. They are introduced in the following sections. 6.2.1 Network Depth The depth of a LQT network is defined as the longest path from the input neurons to the output neurons. For the sake of simplicity, it is assumed that all the LQT neurons have the same processing time, which is taken as one time unit. For a feedforward network, its depth equals its computation time. For a feedback network, its computation time Chapter 6. COMPUTATIONAL PROPERTIES OF LQT NETWORKS 65 depends not only on the convergence property of the algorithm it implements but also on its depth. Since our objective of using neural networks is to achieve high computation speed, the depth of a network is thus an important factor in choosing one network over another. For a feedforward network, its depth also corresponds to the number of layers it has. Denote the depth of a network NA as D(NA) and the number of layers as L(NA), the following relationship exists 6.2.2 Network Size Neural networks eventually have to be implemented in hardware, hence we should be aware of their size. It is natural to think the size of a network in terms of the number of neurons in the network. However, since a neuron is allowed to have indefinite number of inputs, it is more proper to measure the size of a network in terms of the number of connections. Moreover, in VLSI implementation, it is the connections which occupy most of the space on a chip. The number of connections in a neural network equals the number of weights. Since it is easier to count the number of weights in the symbolic representation of a LQT network, we define the network size as the number of weights in the network. Let us denote the number of neurons in network NA as N(NA) and its network size as S(NA). It is clear that D(NA) = L(NA) - 1. (6.38) N(NA) < S(NA). (6.39) Chapter 6. COMPUTATIONAL PROPERTIES OF LQT NETWORKS 66 6.3 Criterion for Network Simplification Both network depth and network size have to be considered when choosing a network over its equivalents. In this thesis, the following simple criterion is used: Given two networks NA and NB which realize the same function, if D(NA) < D(NB) and S(NA) < S(NB), or if D(NA) = D(NB) and S(NA) < S(NB), then NA is chosen over NB, and NA is said to be better than NB- The process of choosing a network over another is referred to as network simplification. The theorems in Section 6.1 are developed for such purpose. Obviously, a network with the shortest depth and the minimum size is the best among all the equivalent networks. Unfortunately, such a case is rare. The network with the shortest depth often does not have the minimum size. How to effectively balance these two factors is still an on-going research. In the following chapters, we only give upper bounds on network depth and size for some functions. There are also some other factors which affect the quality of a network. For example, since there are inevitably some noise in any hardware implementation, how tolerant a network is to noise is certainly a good measure of network quality. However, due to time constraints, these factors are not considered in this thesis. Chapter 7 D E R I V A T I O N P R O C E D U R E S This chapter provides procedures of deriving LQT networks to realize functions. The usage of these procedures is demonstrated through some examples, which in turn are useful in deriving networks in later chapters. A function which can be realized by a LQT network is referred to as a realizable function; the network which realizes the function is referred to as the network realization of the function. A function which has a known network realization is called a realized function. There are four classes of functions: single-input single-output (SISO) functions, single-input multiple-output (SIMO) functions, multiple-input single-output (MISO) functions, and multiple-input multiple-output (MIMO) functions. We shall only describe procedures for realizing SISO and MISO functions since realizing SIMO or MIMO functions is just multiple applications of these procedures. From the definitions of LQT neurons, the following statements are true. Fact 7.1 The network realization for the following function n (7.1) is y = C x1,x2,...,xn | w1,w2,...,wn D* (7.2) 67 Chapter 7. DERIVATION PROCEDURES 68 Fact 7.2 The network realization for the following function E ? = i W{Xi - t if J2i=1 w{Xi > t 0 otherwise (7.3) is y =< x x , x 2 , . . . , x n \wuw2,...,wn > ' (7.4) Fact 7.3 The network realization for the following function \ i ifU=i^i>t ,7_, y = \ (7-5) [ 0 otherwise is y =n x 1 , x 2 , ...,xn | wt,w2, ...,wn • * (7.6) Now that we have established network realizations for the above three simple func-tions, we can work on network realization of more complex functions. 7.1 Realization of SISO Functions The procedure of deriving a network to realize a SISO function / is as follows: step 1. Call decompose(/); step 2. Compose all network realizations to form a network realization of / . step 3. Apply theorems in Chapter Six to simplify the resulting network. In the above procedure, decompose(/) is a recursive subroutine: decompose(/): 1. Decompose the function / as / = fo 0 ( / 1 , / 2 , - , / n ) Chapter 7. DERIVATION PROCEDURES 6 9 such that f0 has a known network realization; 2. For all i € { 0 , 1 , 2 , n } , if fi is a realized function, replace it with its network realization; else, call decompose(/,) 3. Return. The derivation procedure may be viewed as consisting of two stages: step one is the analysis stage, and steps two and three are the synthesis stage. The use of this procedure is demonstrated through the following examples. Functions in these examples are real unless otherwise specified. Example 7.1.1 Given a function F whose output is either 0 or 1 , find a network realization of its complement F. S O L U T I O N Since F = l - F ( 7 . 7 ) according to Fact 7 . 1 , the network realization of F is F = c F | - I D " 1 ( 7 . 8 ) t) Example 7.1.2 Find a network realization of the following function a if x > t 0 otherwise ( 7 . 9 ) S O L U T I O N Let 1 iix>t 0 otherwise ( 7 . 1 0 ) then V = a / i ( * ) ( 7 . 1 1 ) Chapter 7. DERIVATION PROCEDURES 70 which can be decomposed to 2/ = / o ° / i (7.12) where / 0 = a fa. F rom Fact 7.3 and Fact 7.1, we know both functions / 0 and / i have known network realizations as fo=Cfi\aD° (7.13) and fi(x) =C x3* (7.14) Therefore, the network realization of function (7.9) is y = C C xZi* | a D° (7.15) Example 7.1.3 F i n d a network realization of the following function S O L U T I O N Let y = s a if t2 > x > ti 0 otherwise a if x > t\ 0 otherwise (7.16) (7-17) and a if x > t2 0 otherwise then which can be decomposed to y(x) = fi(x)-f2(x) y = fo° (/i,/2) (7.18) (7.19) (7.20) Chapter 7. DERIVATION PROCEDURES 71 where / 0 = fi — f%. From Fact 7.1, fo can be realized by a linear neuron, / o = C / 1 , / 2 | l , - l D ° (7.21) From previous example, both fx and / 2 can be realized as h{x) = C C x3h | oc D° (7.22) and / 2(x) = C C x3t2 \ a D° (7.23) Therefore, the network realization of y(x) is y(x) = C C C xl]h | a D°, C C xZ)'2 | a D° | 1, - 1 D° (7.24) Applying Theorem 6.10 and Theorem 6.11, this network can be simplified to y(x) =CC€1 xii* 1 | a D ° , c n xZi t2 | - a D°D° (7.25) Applying Theorem 6.10 again, this network can be further simplified to y(x) =CO xZiu,\Zx-3t2 | a , - a D° (7.26) tt E x a m p l e 7 .1 .4 Find a network realization of the following function a if x > t 0 otherwise (7.27) SOLUTION Function y(x) can be decomposed to y = foof1{x) (7.28) where fo = + <x (7.29) Chapter 7. DERIVATION PROCEDURES and / i ( x ) = 1 if -x > -t 0 otherwise F r o m Fact 7.1, we know that F r o m Fact 7.3, we know that fo =C fi | - a D' fi(x) = C x | - 1 • -t Substituting this to (7.31), we have y(x) = f0(x) = C C x\ - 1 | - a D~a Example 7.1.5 F i n d a network realization of the following function S O L U T I O N Let and then a if x = t 0 otherwise A W = { a if x > t 0 otherwise a if x > t 0 otherwise y(x) = /i(x) - / 2 ( x ) Chapter 7. DERIVATION PROCEDURES 73 which can be decomposed to V = / o ° ( / i , / 2 ) where fo = fi — / 2 , whose network realization according to Fact 7.1 is / o = C / i , / 2 | l , - l D° (7.38) (7.39) We know that both /i(x) and /2(x) have network realizations (see Example 7.1.2 and Example 7.1.4) as / i (x) =CC x •* |oO° (7.40) and / 2 (x) =CC x | - 1 | - a D~a (7.41) Substituting (7.40) and (7.41) to (7.39), we have y(x) = /o(x) = C C C x •* | a D°, C C x | - 1 | - a D " a | 1, - 1 D° (7.42) Applying Theorem 6.10 and Theorem 6.11, we can simplify this network to y(x) = C C x C x | - 1 | a,a Da . (7.43) It can be rather tedious to follow the procedure explicitly, from now on we only follow it implicitly, and put more emphasis on the simplification process. Example 7.1.6 If function (7.34) is an integer function, find its network realization. S O L U T I O N Let a if x > t 0 otherwise (7.44) and a if x > t + 1 0 otherwise (7.45) Chapter 7. DERIVATION PROCEDURES 74 then whose network realization, according to Fact 7.1, is y=<Zfi(x\f2(x)\l,-l D° Since, according to example 7.1.2, fi(x) = C C x • * | o O ° and / , ( I ) = C C I : ' + 1 \aD° substituting these two equation to equation (7.47) yields 2 / = C C C x = ] < | a D ° , C C x Zit+1 | a D ° [ 1 , - 1 D ° Applying Theorem 6.10 and Theorem 6.11, this network can be simplified to y = C C x C x 3 t + 1 | a , - a D ° (7.46) (7.47) (7.48) (7.49) (7.50) (7.51) Example 7.1.7 Find a network realization of the following function Oil x = t\ ct2 x = t2 y = \ S O L U T I O N Let Mx) Ctji X — tfi a, if x = t{ 0 otherwise (7.52) (7.53) Chapter 7. DERIVATION PROCEDURES 75 where i £ { 1 , 2 , n } , then n (7.54) t'=l Recall Fact 7.1 and Example 7.1.5, we know that V = C / l ( x ) , / 2 ( x ) , . . . , / n ( x ) D° (7.55) and (7.56) Substitute (7.56) to (7.55), we have y = C C C X D ' ^ I Z x | - I D - * 1 I a i , a : i D 0 1 , . . C C xD*",IZ x I - I a n , a n 3 Q " D ° (7.57) Applying Theorem 6.10, we can simplify this network to y = C C xi]* 1 , C x I - l D _ t , , . . . , C xD*",IZ x | - 1ZT*" | o ^ a i , . . . , a B , a B D a (7.58) where a = 52 "=1 a t - tt This example shows that any single variable finite function can be realized by a LQT network with a depth at most of 2 and a size at most of n. It leads to the following Theorem 7.1 Any single variable finite function is realizable. Example 7.1.8 If function (7.52) is an integer function, find its network realization. S O L U T I O N Let theorem. a; if x = ti (7.59) 0 otherwise Chapter 7. DERIVATION PROCEDURES 76 where i € {1,2, ...,n}. It can be shown that Recall Fact 7.1, and Example 7.1.6, we know that y = C / i ( x ) , / 2 ( x ) , . . . , / n ( x ) D° (7.60) and fi(x) = C C x • x • < i + 1 | a,-, - a ; D° (7.61) (7.62) Substituting equation (7.62) to equation (7.61) yields y = C C C x Z) 1 1 , C x Z)11"1"1 l a ^ - a ! D " , c r Z x D ' 2 , r Z x D t 2 + 1 | a 2 , - a 2 3 u, CIZ x rz x | ctn,-ctn D°D° which can be simplified to (7.63) y = crz x z i ^ r z x z i ' i + 1 , r z x z i < 2 , r z x z i < 2 + 1 , r z xz\ tn, rz xz\ tn+1 (7.64) If = + 1, then the above network realization can be further simplified to y =CIZ xD '^rZ xD t 2,...,IZ xD*",rZ x • < n + 1 | a i , a 2 - <*i , . . . , a„ - a „ _ i , - a „ D° (7.65) t Example 7.1.9 Find a network realization of the absolute function y = |x| (7.66) S O L U T I O N Let x if x > 0 0 otherwise (7.67) Chapter 7. DERIVATION PROCEDURES anc —x if —x > 0 0 otherwise then y(x) = fi(x) + f2(x) According to Fact 7.1 and Fact 7.3, the network realization is y = C < x > ° , < x | - 1 > ° D ° Example 7.1.10 Find a network realization of the following function ax + b if x > t 0 otherwise S O L U T I O N Let and f2(x) = { -t iix>t otherwise 1 if x > t 0 otherwise then y(x) = afi(x) + (at + 6)/2(x) Recall Fact 7.1, Fact 7.2, and Fact 7.3, we have y = C < x >*, C x \a,at + b D° Chapter 7. DERIVATION PROCEDURES 78 Example 7.1.11 Find a network realization of following function y= < x if F = 1 0 otherwise (7.76) where F is a function whose output is either 0 or 1. S O L U T I O N Let x-a if F = 1 0 otherwise (7.77) where a < x Vx £ X, and A" is the definition domain of function (7.76). Then y(x) = f.ix) + aF whose network realization, according to Fact 7.1, is y=CMx),F\l,aD° fi(x) can be expressed as AGO = where b > x — a Vx £ A\ Let x - a - f e F iix-a-bF>0 0 otherwise then according to Fact 7.2, and, according to Fact 7.1, / u (x ) = x - a - bF fi(x) =< fn(x) >° (7.78) (7.79) (7.80) (7.81) (7.82) / n (x ) = C z , F | l , - & D ° (7.83) Chapter 7. DERIVATION PROCEDURES Hence, h{x) =< x,F\l,-b>a The network realization of F, according to Example 7.1.1, is F = c F | - l y 1 Substituting (7.85) to (7.84) yields Mx) =<x,F\l,b>a+b Substituting this to (7.79) yields y=C<x,F\l,b>a+b,F\l,aD° tt Example 7.1.12 Find a network realization of the following function x if F = 0 0 otherwise where F is a function whose output is either 0 or 1 SOLUTION According to the previous example, the network realization y =C< x,F | l , 6 > ° + b , F | l , a D° and recall Example 7.1.1, we know that F=cF\ - ID-1 Substituting this equation to (7.88) and after simplification, we have y =C< x,F\ l , - f e> a , F | 1,-a D~a y = s Chapter 7. DERIVATION PROCEDURES 80 7.2 Realization of M I S O Functions This section shows the network realization of MISO functions. The procedure of network derivation of MISO functions is as follows. step 1. Decompose function / as / = / i ° h where fr is a linear function; step 2. Proceed with the procedure for realizing SISO functions to find a network realization of fY; step 3. Compose the network realization found in step 2 and the network realization of /L to form the network realization of / ; step 4. Using theorems in Chapter Six to simplify the resulting network. The use of this procedure is shown through the following examples. Example 7.2.1 Find a network realization of the following function / :Z>?-2> 2 (7.91) where £>, is a finite set of numbers. SOLUTION We consider the following two cases. Case I: / is a linear function. Since we can express / as f = Y,biXi + b0 (7.92) t=i where Xi £ X^. According to Fact 7.1.1, we have f(x)=Cxux7,...,xn \bub2, ...A Db0 (7.93) Case II: / is a non-linear function. Chapter 7. DERIVATION PROCEDURES 81 Here we introduce a linear function as follows fL : V3 (7.94) and a finite non-linear function fi : X>3 - V2 (7.95) We now find the network realizations for both / i and fr,, and then compose them to form the network realization of / . Let Mi±\Di\ V i e {1,2,3} and Zi denotes an element in V3, and a,- = f\{zi). According to Example 7.1.7, the network realization of fi is fi{z) = C C 2 D * 1 , • 2 | - 1 I T * 1 , C z Z]ZM3, • 2 | - 1 • - * w 3 (7.96) | Ol, «1, % 3 ) Z M 3 ^ a where a = a t - According to Fact 7.1, z = JL(X) = C x1,x2,...,xn\bl,b2,...,bn Dbo (7.97) Substituting (7.97) to (7.96) and after simplification, we have y = C C x i , ...,xn\bi, ...,bn I f 1 , C X i , . . . ,x n| - 6 i , - b n Zi~ci, • xi , . . . ,x„|&i, ...,&„ U c « 3 , C xj , . . . , x „ | - & i , . . . , - & „ • " ^ (7.98) where c, = z; + fe0- if This example shows that any multi-variable finite function can be realized by a LQT network with a depth at most of 2 and a size at most of 2Ma(n + 1). It leads to the following theorem. Chapter 7. DERIVATION PROCEDURES 82 Theorem 7.2 Any multi-variable finite function is realizable. Since a pixel takes integer value from 0 to 255, any image processing technique can be expressed as finite functions. According to this theorem and Theorem 7.1, it is clear that the LQT networks are capable of realizing any image processing technique. In Example 7.2.1, we made an assumption / = f\ o //,, here we show under what condition this assumption is true. Denote f2 = / i o / L , we have the following theorem. Theorem 7.3 The necessary condition for f = fi is thatVxi, x2 E if f{x\) 7^ f(x2), then fL(xx) ^ / L ( X 2 ) . P R O O F If there exist Xi, x2 E Z>" s u c n t n a t f(xi) / f(x2) but /L(£*I) = /L (^ 2 ) , then fx o = /1 0 h(x2)- This implies that / 2(x x) = f2(x2), hence / 2 ^ / . I Corollary 7.3.1 ITie necessary condition for f = f2 is M3 > M2. P R O O F If M3 < M2, then there must exist a pair x a , x2 E T>™ such that f(xi) ^ f{x2) but fL(xi) = h(x2) I Theorem 7.4 Functions ft and fL are not unique. P R O O F This theorem will be shown to be true in some of the following examples. I As shown in Example 7.2.1, the size of the network realization of a non-linear multi-variable finite function is proportional to M3. Therefore, according to Corollary 7.3.1, the best projection function /j, is the one whose codomain size (M3) is equal to the codomain size (M2) of the non-linear function itself. Here, we introduce a projection function which is proper for all multi-variable finite integer functions. z = Y,xiM1 i-1 ' (7.99) 8 = 1 Chapter?. DERIVATION PROCEDURES 83 Such a function is called one-to-one projection since / L ( X I ) ^ / L ^ ) iff £1 7^ %2 Vx*i, x2 € which implies M 3 = M x n . Another projection function, which is proper only for a certain class of finite integer functions, is as follows z = X > < (7.100) which has the property that Af3 = n(M\ — 1) + 1. We will refer to functions (7.99) and (7.100) as projection I and projection II. The following examples show network derivation of some multi-variable finite integer functions. Example 7.2.2 Find a network realization of the AND function y — Xi A x2 A . . . A xn (7.101) SOLUTION Using projection I /L{X) = £"=1 2 , _ 1a; t-, the AND function can be ex-pressed as V = fx o h{x) (7.102) where ' 1 if z > 2 n - l AW = (7.103) 0 otherwise which, according to Fact 7.1, has the following network realization / i ( z ) = C z D 2 " - 1 (7.104) According to Fact 7.1, the network realization for J L { X ) is h(x) = C xux2, xn I 1 , 2 , 2 " " 1 D ° (7.105) Substituting it to (7.104) yields y = C C x u x 2 , x n I 1 , 2 , . . , 2 " " 1 D ° D 2 " " 1 (7.106) Chapter 7. DERIVATION PROCEDURES 84 which, according to Theorem 6.10, can be further simplified to y=CI :r i ,S2, - . . , *» | l , 2 , . . , 2 " - 1 z T - 1 (7.107) The resulting network has a depth of 1 and a size of n. If we use projection II / L ( X ) = 5Z"=i and chose 1 if z > n 0 otherwise then Since and hence (7.108) y = fiofL(x) (7.109) h(z) =C ^ •» (7.110) h{x)=CxuX2,...,xnD0 (7-111) y = d x i , x 2 , . . . , s B (7.112) The resulting network also has a depth of 1 and a size of n. (J Example 7.2.3 Find a network realization of the OR function y = Xl V x 2 V,. . . , Vx n (7.113) SOLUTION Using the projection I / L ( X ) = 2 ,_1x,-, the OR function can be expressed as y = fxofL(x) (7.114) where ' 1 if z > 1 Mz) = { ~ (7.115) 0 otherwise Chapter 7. DERIVATION PROCEDURES 85 From previous example, the network realization can be easily derived as If projection II is used, then we have the following network y = C xi,x2, ...,xnZi Note that both the two networks have a depth of 1 and a size of n. ft Example 7.2.4 Find the network realization of an XOR function y — X\ © x2 © ... 0 xn (7.116) SOLUTION Using projection II JL(X) = £"=i xi-> t n e XOR function can be expressed as w here Let then Since y = / i ° h(x) In = 1 if z is odd 0 if z is even 1 if z > i 0 otherwise /i(*) = E ( - i r 7 i , ( * ) t=i hi = C z (7.117) (7.118) (7.119) (7.120) (7.121) hence h(z) = C C z l l ' ^ z D 2 , C z | 1,-1, (-I)*- 1 3° (7.122) Chapter 7. DERIVATION PROCEDURES 86 5ince h(x) —C xi,x2,xn D ° (7.123) we have (7.124) y = C C xi,x2,xn Z]1, C #i, x 2 , a ; n • 2 , • x l 5 x 2 , x n I c — i ) " - 1 ^ ° The resulting network has a depth of 2 and a size of ra2 + n. For the sake of comparison, let us use projection I. Since it is difficult to express the non-linear function / i for an arbitrary n , we only show a special case when n — 3. Let fL(x) = J2?-1*i and AC*) = t ' = l 0 z = 0 1 z = 1 1 z = 2 0 z = 3 1 z = 4 0 z = 5 0 z = 6 1 z = 7 then y(s) = / i ° h(x) According to Example 7.1.8, we have (7.125) (7.126) (7.127) fi(z) = c r z z z i 1 , r z z z i 3 , r z z z i 4 , r z z z i 5 , r z z z i 7 |.i,—1,1,—1,1 D ° (7.128) Chapter 7. DERIVATION PROCEDURES 87 hence, y = C C Xi,x2,x3 | 1,2,4 X i , x 2 , x 3 | 1,2,4 Z l 3 , C Xi, x 2 , x 3 | 1,2,4 C x ! , x 2 , x 3 | l , 2 , 4 D 5 , C I x 1 , x 2 , x 3 | 1,2,4 | 1 , - 1 , 1 , - 1 , 1 D° Note that this network has a size of 20 while the network derived using projection II has a size of 12 (n = 3). Remark Example 7.2.4 shows that using projection II results in smaller network size, this is generally true. Although in both Examples 7.2.2 and Example 7.2.3, using projection II has no advantage in terms of network depth and network size, the magnitude of weights is much smaller. This is important in hardware implementation, since the value range of the weights have to be limited. Chapter 8 N E T W O R K R E A L I Z A T I O N S O F I E T E C H N I Q U E S In this chapter, image enhancement (IE) techniques are re-categorized according to their realization difficulty into two classes: linear and non-linear filtering techniques. Examples of realizing some of the techniques are provided here to show the effectiveness of the algebraic derivation approach. Since the linear techniques are easy to realize, they are treated briefly with two examples; the rest of this chapter is devoted to the realization of non-linear techniques. 8.1 Network Realizations of Linear Techniques A two-dimensional image can be projected to a one-dimensional image. For instance, an image U whose size is K x L can be mapped to a one-dimensional image g with the following relationship 9i = Ukji Vfce {1,2, ..,#}, V/G{1,2,...,I} (8.1) where i — (k — 1)L + I. This mapping is one-to-one, that is, U can be uniquely mapped back from g using equation (8.1). For all the linear filtering techniques, the enhancement process can be expressed, in matrix form, as f = E g + t (8.2) where g is the image to be enhanced and f is the enhanced image. For an input image 88 Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 89 of size n and an enhanced image of size m, the network realization of this function is fi = C gii92i— ,9n | e n ,e 1 2 , . . . ,e l n Dh h — C g1,g2,...,gn | e 2i, e 2 2 , e 2 n D < 2 fm — ^ 9li92i ••••>9n | emli em2i ••••> emn 3* A more compact notation of this network is f = C g | E D * The architecture of (8.3) is shown in Figure 8.1. en (8.3) (8.4) Figure 8.1: Network Architecture for Linear Filtering Now, for the linear filtering techniques, the task left is to map those techniques to the form shown in (8.2). The following two examples illustrate such mapping. Example 8.1 Find a network realization of the averaging filter. SOLUTION Suppose the following image U Ul,l « 1 , 2 "1,3 Ul,4 "2,1 "2,2 "2,3 "2,4 «3,1 "3,2 "3,3 "3,4 "4,1 "4,2 "4,3 "4,4 (8.5) Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 90 is to be enhanced by the equally weighted averaging filter of (2.10) and the window of the filter is 3 x 3, then the processing, in the matrix form, is where with 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 1 1 1 0 1 1 1 0 1 1 1 0 0 0 0 0 0 1 1 1 0 1 1 1 0 1 1 1 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 1 1 1 0 1 1 1 0 1 1 1 0 0 0 0 0 0 1 1 1 0 1 1 1 0 1 1 1 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 f — (/l»/2, •••,/l6)T g = {gi,g2,--,gi6)T g = A g (8.6) (8.7) (8.8) 9i = uKl Vfc,Ze {1,2,3,4} Comparing (8.6) with (8.2), we obtain the weight matrix and the threshold vector as Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 91 follows E = A t = 0 where 0 represents the zero vector whose elements are all zeros, ft Example 8.2 Find a network realization of the zooming technique. S O L U T I O N Suppose the following image U = (8.9) (8.10) "1,1 "1,2 « 2 , 1 "2,2 is to be magnified two times by the the replication technique described in Section 2.2 such that the resulting image has the size of 4 x 4, then the zooming process, in the Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 92 matrix form, is where with f = 1 0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 g = R g f — ( / i , h, fie)T (8.11) (8.12) (8.13) 9i = Uk,i Vfc,/ € {1,2} Comparing (8.11) with (8.2), we obtain the weight matrix and the threshold vector as Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 93 follows E = R t = 0 (8.14) 8.2 Network Realizations of Non-linear Filters There is no unified theory of non-linear filtering techniques. As a result, there is no universal network structure which can be used for all of them as in the case of linear filters. 8.2.1 Dynamic Range Modification Contrast stretching, clipping, thresholding, and window slicing are techniques which modify the dynamic range of a given picture. For this group of techniques, the processes have a generalized form as follows. axx + b\ > x > t0 a2x + b2 t2> x >ti V anx + bn tn > x > The derivation of the network realization of above function is as follows. Let a'{x + 6- x > t{_i 0 otherwise M*) = { \/i E {l,2,...,n} (8.15) (8.16) where Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 94 with It can be verified that «o = Oo = U V = / o ° ( / i , / 2 , ...,/„) where fo = YA=I fi, whose network realization is fo = C / l , / 2 , —,fn 3° According to Example 7.1.10, we know that fi(x) = C < x >u~\ C X-3u~ l | Ci,di D° (8.18) (8.19) (8.20) (8.21) where c, = a\ and di = (a(f,_i + &(•). Substitute this to equation (8.20) , we obtain the whole network as follows y = C / i , / 2 , ...,/„ Z>° {/,• = C M,1,M»2 I c « ' ^ « 3 ° = < x > ti~1 [ Mi2 = rz x z i ' - 1 (8.22) Applying Theorem 6.10, the above network can be simplified to y — C Mll,/i 2l,---,Mnl,Ml2,M22,---,Mn2 | Ci, C2, ..., C n, dX , d 2 , d n 3° ^ = < x >«••-! (8.23) Hi2 = r z X D ' - 1 Based on the network model of (8.23), the network realization of the contrast stretch-ing technique (see section 2.2.1) is v = c < u >°, < u > a,< u > b, r z « z i a , r z u3 b \ a , (/? - a ) , ( a - fl + 7), (va - a/3), (vb - va + a/3 - 67) Z)° Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 95 Similarly, the network realization of the clipping is v =C< u >°, < u >b, C C u3b | /?, -/?, va, (vb - va) D° (8.24) Finaly, the network realization of the thresholding is (8.25) 8.2.2 Order Statistic Filtering Order statistic filtering is a simple yet powerful technique widely used in signal processing called the k th order statistic of the array. An order statistic filter is a linear combination of the order statistics. Variation in the weights results in various members of the order statistic filter (OSF) family, some of which have been intensively studied and widely used in signal processing. Median filters is one example [77, 46, 38, 68]. Order statistic filters have the virtue of being very simple and having the edge preserving property. Their performance is, if not optimal, satisfactory in most cases. To formulate the problem of order statistic filtering in such a way that a network realization can be derived, some functions need to be introduced. They are the rank function, the comparison function, and the order statistic function. Definition 8.1 For X = (xx, x 2 , x n ) , the rank function of Xi is [11, 56]. The core operation is to find the k th largest element in an array. This element is ran k(Xi :X) = k, if X{ is the k th largest element in X (8.26) Definition 8.2 For X — (xx, x 2 , x n ) , the comparison function for x,- and Xj is c(Xi,Xj) = < 0 Xi > Xj 0 Xi = Xj i > j 1 Xi = Xj i < j 1 Xi < Xj (8.27) Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 96 Definition 8.3 For X = (a;1} x 2 , x n ) , the order statistic function is os(k : X) = Xi if rank(x< : X) = k (8.28) From the above definitions, we have the following theorem. Theorem 8.1 The rank function rank(xi : X) and the sum of comparison functions related to x,- have the following relationship, n rank(xi : X)— ^2c(xi,Xj) + I j / i (8.29) The core operation in order statistic filtering is to to find the k th largest member of an array of pixels. For instance, the median filtering is to find the { r^) t h element of the input array. Therefore, it is necessary to derive the network realization of the order statistic function first. Let i m = Xi if rankfx,- : X) = k V ' V i e {1,2,...,n} (8.30) 0 otherwise then it is trivial to show that os ( * :X) = (8-31) »'=i that is, os(k:X) = f0o(f1,f2,...,fn) (8.32) where /o = E / i (8-33) which can be realized by a linear neuron as / o = C / i , / 2 , . . . , / „ D ° (8.34) Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 97 Let F(Xi) ± { 1 if rank(x,- : X) — k 0 otherwise Vi € {l,2,...,n} (8.35) then, , Xi if F(xi) = 1 fi(x) = { ; ViG{l ,2 , . . . ,n} 0 otherwise According to Example 7.1.11, function fi(x) can be realized as Mx) =C< xu F{xi) | 1, b >a+b, F(xi) | 1, a D° (8.36) (8.37) where a < min{min{xi},min{a;2},min{a;n}} and b > max{max{ii}, m a x{ i2} max{in}} — a. According to Example 7.1.8, the network realization of F(xi) is F(xi) =CC rank(xt- : X) • f c , C rank(x,- : X) Zik+1 | 1, -1 D° According to Fact 7.1, the network realization of the rank function is (8.38) Since rank(xi : X) =C c(xi,xi), ...,c(xi,xi_1),c(xi,xi+1), ...,c(xi,xn) D 1 «Z/j ^ x% c(xi,Xj) - \ if z > j 0 otherwise - l (8.39) (8.40) and 1 Xj > Xi if i < j 0 otherwise according to Example 7.1.4, and Fact 7.3, the network realization of c(xi,Xj) is C C Xi, Xj | — 1,1Z3° j - 1 D - 1 if i > j (8.41) C Xi,Xj | - 1, I n 0 if i < j i € {l,2,...,n} (8.42) Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 98 Therefore, the network realization of the order statistic function is V = C / i , / 2 , ...,/„ D° {/,-=C Pi,F(Xi) | l ,<O 0 {pi =< xu F(xi) | 1,6 > a + 6 {F(xi) = C Hiufii2 I 1,-1 D° pu = LZ rank(xj : X)Z3k pi2 = C rank(xj : X)3k+1 {rank(xj : X) =C d x (x j , X i ) , ^ ( X J , X i_ j ) , C 2 (XJ, xi+1),C2(XJ, x„) D _ 1 {a\(xj, Xj) =C c^x^Xj) | - 1 D _ 1 Ci (x ,- ,X j ) = Cx,-,Xj | 1, —lZZf c 2(x t",Xj) — LZXj, Xj | l , l d where i £ {1,2, ...,n}. The above network can be simplified to V —, C Pl,Pll, Hl2, A*2, H21, H22-, A*n, Pnl, Pn2 | 1, O, ~«, 1, «, ~ « , 1, «, ~« 3° {/X,- =< X,-, / i j i , ^ j 2 | 1, 6, -6 > a + 6 /^ti — CZ CJI , . . . , C j j _ j , Cj j_|_i ,...,Cj > n | 1,..., 1,1,...,Id fii2 — C CJI, C j ( j _ i , C t , i + i , C j j n I — 1 , — 1,1, lZZ\ k 1 + 1 (8.43) (8.44) id. CXi,Xj | l , - l n ° if i > j C X J , X J | - 1,1D° if i < j This network has a depth of 4 and a size of 2n(2n + 1). It is shown in Figure 8.2 for the case n = 3. This network is called OSnet and is denoted as OSnet(k,n), where n is the number of inputs and k means that the network's output is the k th largest element of the input array. This network's schematical representation is shown in Figure 8.3. The OSnet can be made adjustable by changing Hij LZ Ci\, ..., Cj ti—\, , •• •, Q,n | 1,1,...,lZ] k-i+j-1 (8.45) Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 99 Figure 8.3: The Schematical Representation of OSnet Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 100 1 Xi x2 r© Figure 8.4: The Schematical Representation of Adaptive OSnet where j € {1,2}, to ,%+\i ••••> c t , n , X s (8.46) | - 1 , . . , - 1 , 1 , . . . , 1 , - 1 ^ ' — 1 where xs is used to change the value of k. This new network model is called adaptive OSnet. Its schematical representation is shown in Figure 8.4 Median Filtering Median filtering is a simple operation in which the median value of the data within a sliding window centered at a point is taken as the output of that point. Median filtering has been shown to be effective in suppressing impulsive noise components while preserving edges of a signal. It has been applied in several areas of digital signal processing which include speech processing and image enhancement. For an array of (2m + 1) numbers, the median is the (m + l) th largest number. Therefore, an OSnet(m + 1,2m + 1) can implement the median filter. Detailed account of this network can be found in [83]. Separable Median Filtering. When median filtering is extended to two or more dimensions, signal features are generally distorted or deleted. Separable median filtering proposed in [68] yields somewhat better results in preserving signal features. Separable median filters consist of two types of one-dimensional median filters — one oriented in the horizontal direction and the other in the vertical. More explicitly, the Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 101 output value, y(l,k), at position (l,k) is given by v(l, k) = median{z(/ — L, k),z(l, k),z{l + L, k)} where z is defined as z(p, q) = median{u(p, q-K),..., u(p, q),u(p, q + K)} (8.47) (8.48) and u(l, k) are the sample values of the input signal. A separable median filter with a window size of (2L + 1) x (2i^+l) can be implemented by using (2L + 2) or (2K + 2) OSnets depending on whether the horizontal direction or the vertical direction is median filtered first. The network of a separable median filter with the horizontal direction filtered first is shown in Figure 8.5. u(l-LJc-K) u(l-LJc) u(l-L,k+K) u(l,k-K) u(l,k) u(l,k+K) u(l+L,k-K) u(l+L,k) u(l+L,k+K) z(l-LJc) z0» 0 v(W z(l+L,k) Figure 8.5: A Max/Median Filter Realized by Using OSNets Max/Median Filtering. Max/Median filtering is proposed by Arce and McLough-lin [8] in order to achieve better preservation of signal features. In the two dimensional case, if samples in lines separated by 45° is taken into account, the Max/Median filtering is defined as y{l, k) = max{z(s, 1), z(s, 2), z(s, 3), z{s,4)} (8.49) Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 102 where z(s,l) = median{a(/, k - M), ...,a(l,k), ...,a(/, k + M)} z(s,2) = median{a(/ — M, k), ...,a(/, k), ...,a(l + M, k)} z(s,3) = median{a(/ + M,k — M),a(l,k),a(l - M,k + M)} z(s,4) = median{a(/ - M , k - M ) , ...,a(/, fc), ...,a(/ + M, k + M)} where a(/, A;) is the pixel's value of the input picture at position (/, k), and y(l, k) is the output of the filter at the same position. A Max/median filter with a window size of (2M + 1) x (2Af + 1) can be realized by five OSNets as shown in Figure 8.6. aG^-M) —r-a(W — T aOJc+M) — -a(l-M,k) — r aQ+MJc) — -a(l+MJc-M) — --4 a(l-MJc+M) —' -a(l-MJc-M) — -a 0 « —f-a(l+MJc+M) — L z(s,D z(s,2) z(s,3) z(s,4) Figure 8.6: A Max/median Filter Realized by Using OSNets M a x / M i n Filtering Taking neighborhood min or max is a generalization of shrinking or expanding the l's in a two-valued picture. Iterated local min followed by iterated max can be used to remove high-valued blemishes on a low-valued background. Similarly, iterated local max Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 103 followed by iterated local min can be used to remove low-valued blemishes on a high-valued background. Finding the maximum of an array is equivalent to finding the 1st largest element, therefore, the OSnet(l,n) can be used. To find the minimum of an array of size n is equivalent to find the n th largest element, thus the OSnet(n,n) can be used. There-fore, the Max/Min filtering can be implemented by using OSnet(l,n) and OSnet(n,n) iteratively. Adaptive Order Statistic Filtering A type of adaptive order statistic filters which are called comparison and selection filters (CS filters) is proposed by Lee and Fam [56]. The output yi of the CS filter with parameter h at position / through the input array Xi = (x/_m, ...,x/ + m) is defined as where u\ and v\ are the sample average and sample median respectively, and h is an integer satisfying 1 < h < m. Vi = < os(m + 1 + h : Xi) if u; > V[ os(m + 1 — h : X{) otherwise (8.50) Let 1 if ui > vi (8.51) < 0 otherwise and fi A os(m + l + h:Xi) if F = 1 (8.52) < 0 otherwise (8.53) Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 104 then, function (8.50) can be rewritten as yi = fi + h (8-54) which can be realized by a linear neuron as follows yi=Ch,f2D° (8.55) According to Example 7.1.11, we have / i =C< os(m + l + h: X{),F | 1,6 > ° + 6 , F\l,aD° (8.56) and according to Example 7.1.12, we have f2 =c< os(m + l-h: Xt), F | 1, -b >a, F | 1, - a D~a (8.57) In the above two equations, a < min{os(A; : Xi) : k £ fc} and 6 > max{os(A; : X{) : A; £ /C} — a, where fc = {1,2,...,2m -f- 1}. Since os(& : X/) > 0 VA: £ fc, we choose a — 0. Consequently, network realizations of / i and f2 can be simplified to / i =<os(m + l + / i : X , ) , F | l , 6 > 6 (8.58) and / 2=<os(m + l - / i : X j ) , F | l , - 6 > ° (8.59) According to Fact 7.3, the network realization of F is F=LZuhv, | 1,-1 13° (8.60) Since the network realizations of ui and vi are already known, the network realization of the CS filter is y, = C / i , / 2 D ° / i = <os(m + H - / i : X , ) , F | l , 6 > 6 < / 2 = <os(m + l - / i : X / ) , J P | l , - 6 > ° (8.61) {F =C os(m + 1 : X,), | 1, -1 Z3° {vi =C X;_m, X/, X; + m | C, C, C D° Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 105 where c = l/(2m + 1). Three order statistic networks are needed to realize the above network. However, we can combine all the three networks together into one network and reduce the number of weights needed. Let us define a function as follows Z\ = os(m + 1 — h : X\) z2 = F (8.62) z3 = os(m + 1 + h : Xi) where F is defined as in (8.51). Recall the network realization of the order statistics function described earlier, we can now combine three such networks to form a network realization for function defined in (8.62). After simplification, the network realization is as follows. Z\ — C Mn, M12, • Min D° Z2 = (Z M21, M22, M2n, £/-m, X l + m | 1, 1, 1, — C, — C ZJ° z3 = C M31) M32, M3n Z)° {pki =< a;/_m+t_i,MfcoMfc,- I 1, 6, —6 >6 » G K and fc <G {1,2,3} H\i = C c,i,c i 2, ...,cin < H2ki = ZZ qi,c<2,...,c,-n • a;/_ m +t-i, x ( _ m + j _ i | 1,-1 Zf \ii> j IZ x / _ m + , _ i , X ( _ m + j _ ! | - 1,1 if z < j where dk{ = m + 1 + (fc — l)/i — i , and n = 2m + 1, i.e., the number of input elements. This network model has a depth of 4 and a size of n(8n + 5). (8.63) i cij = i e£ and fc € {1,2,3} i,j <E IC Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 106 X l Z i / • V Z2 / * \ -Z3 7 1 h ,b Q) 0 L) 0 Q) Figure 8.7: A Network Model of the CS Filter Now, the CS filer has a network realization as follows: y, = C / i , / 2 D ° (8.64) fi = <z1,z2\l,b>b . /2 = < zz, z2 | 1, -b >° The whole network has a depth of 5 and a size of n(8n + 5) + 6. This network is shown in Figure 8.7. 8.2.3 Directional Filtering Although smoothing techniques can clear a noisy image to certain degree, a common side-effect they have is that edges get blurred. To alleviate the blurring effect, directional filtering is sometimes used. An example is the directional averaging filter described in Chapter Two. Suppose the directional averaging is done along four different directions, say, 0°, 45°, 90°, 135°, then for each direction, an average is calculated as v(m,n:9i) = u ( m - k , n - l ) , 0{ <E { (« - 1)45° : i € X = {1,2,3,4}} (8.65) where We{ is the window along the direction of 0,-, and N is the window size which is the same for all the windows. An optimal direction 6* is chosen such that \u(m, n) — v(m, n : Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 107 9*)\ = min{|u(m, n) — v(m,n : #;)| : i £ I}. The desired result is v(m,n) — v(m,n : 9*) (8.66) Let y = v(m,n), ?/,• = u(m, n : 9*), x = u(m, n), and fc = |y — yi\. We define the following function fi = { yi if rank(</>,- : $) = 4 0 otherwise where $ = (^1? <^>2, <j>3, 4>A)- It is easy to show that 4 2/ = J2f i=i which can be realized by a linear neuron as follows. (8.67) (8.68) y = C / i , / 2 , / 3 , / 4 D ° (8.69) Let A I 1 if rank(<^ ,- : $) = 4 I^i — 0 otherwise then, fi = S Vi ifFi = l 0 otherwise which, according to Example 7.1.11, has a network realization fi=C<yi,Fi\l,b>a+b,Fi\l,aD° (8.70) (8.71) where a < min{min{t/,} : i £ 1} and b > max{max{y,} : i £ X} —a. Since ?/,• > 0 Vi £ X, we choose a = 0. Consequently, the above network can be simplified to fi=<yuFi\l,b>b (8.72) Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 108 (8.73) (8.74) i cij (8.75) The network realization of Fi is Ft =CC rank(& : $) ID4, C rank(fa : $) ZI5 | 1, - 1 Z>° Because rank(^ >,- : $) < 4, this network can be simplified to F{x{ : X) =LZ rank(& : *) ZI4 The rank function has a network realization as follows. rank(<^ : $) - C c t l , c , - , , - - i , c,-4 | - 1 , - 1 , 1 , 1 D~* LZ | 1,-1 Z)° iii>j < - l , l Z ) ° if * < i The network realization of fa is fa =C< x, y{ | 1, -1 >°, < x, y{ | - 1 , 1 >°D° The network realization of ?/,• is Vi = C x-, x 2 , x f | d, d,d D° where where cZ = 1/iV, and x^ is the j th element of the sequence X , which is the set of pixels in the window W6i. Note that if N = 2M + 1, x = xf+1. Composing all the network realizations together, we get the network model for the directional averaging filter. After simplification, the network model is as follows. V = C / i , / 2 , / 3 , / 4 D° {/,- =< x], x 2 , ...x?, Fi | d, d , d , b >b i e l {Fi =C C i i , . . . , C i ) , - _ i , C i i j + i , . . . , C , - 4 Z) 4 {cfj =LZ u},p2,n),n) ZZ\° (8.76) (8.77) (8.78) A f + l ..,X AT — d,—a", 1 — d,— c? >° Mt2 = <x, 1 , . . ,xf ,x ,x^+ 1 , . . . ,xf |<*, . . . ,d,d-l , . . . ,d>° where i , j G X. This network model has a depth of 5 and a size of 4(37V + 17). For the case N = 3, it is shown in Figure 8.8, where the connection weights are not shown. Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 109 Figure 8.8: A Network Realization of the DA Filter for the Case N = 3 Chapter 9 N E T W O R K R E A L I Z A T I O N S O F I R T E C H N I Q U E S Image restoration (IR) techniques are classified here, as in Chapter Eight, into two classes: the linear and the non-linear filtering techniques. The inverse filter, the Wiener filter, and so on belong to the first class; the Maximum Entropy filter and the Bayesian methods belong to the second. This chapter shows derivation of LQT networks for realizing some linear and non-linear filters, and a comparison with the Hopfield network which is commonly used for solving optimization problems. 9.1 Network Realizations of Linear Filters For most linear filters, the essential problem is to find the best estimate f such that the following cost function £(f) = a(||g-Hf||2-||n||2) + ||Qf||2 (9.1) is minimum (see Section 2.2). The solution to this problem is f = ( H T H + A Q T Q ) x H T g (9.2) where A = 1/a. For function (9.2), the network realization, as shown in Chapter Eight, is f = C g | W D 0 (9.3) where W = ( H T H + A Q T Q ) - 1 H r (9.4) 110 Chapter 9. NETWORK REALIZATIONS OF IR TECHNIQUES 111 and 0 is a row vector of zeros. However, the complexity of image restoration stems from the difficulty of calculating the inversion ( H T H + AQTQ) \ This is the inversion of a large matrix. Suppose we are dealing with a picture of size 64 x 64 (a very small picture), the matrix ( H T H + AQTQ) has the size of 4096 x 4096. To reduce the computation complexity, Discrete Fourier Transform (DFT) techniques [26] or iterative methods are used [40] Here, we derive a network model which implements the steepest decent method, which is the simplest iterative method. The steepest decent method finds the optimal estimate by making an initial guess and approaching the true optimal estimate step by step. Denoting the estimate at time fc as f(fc), then the difference (f(fc + 1) — f(fc)) is proportional to the opposite direction of the gradient of the cost function E at time k. The gradient vector of E is usually denoted as S7E, which at time k is VE(k) = -2(AQ TQ + H T H ) f + 2 H T g (9.5) Therefore, the estimate at time fc -f 1 is f(fc + l) = i(fc).+ £((AQTQ + HTH)f(fc) - H T g ) = (I + aQ TQ + 6 H T H ) f (fc) - 8HTg (9.6) where 8 takes a small positive value. The optimal 8 is a function of f (fc). For the sake of simplicity, it is usually chosen as a constant [93]. The network realization of function (9.6) is f(fc + l )=Cf(fc )|W3 t (9.7) where W = I + £AQ TQ + 8HTH (9.8) Chapter 9. NETWORK REALIZATIONS OF IR TECHNIQUES 112 and t = 6 H r g (9.9) For an n x n, this network has a depth of 1 and a size of n 4 . The task of finding the network realization of linear filters is now reduced to find the weight matrix W and the threshold vector t, which only involve simple matrix operations such as transposition and multiplication. The following are network realizations of two well-known linear filters. Psudo-inverse Filter. For the inverse filter, the weight matrix and the threshold vector are W = I + 6HTH (9.10) t = <5Hrg Wiener Filter. For the Wiener filter, the weight matrix and the threshold vector are W = I + ffif-1^ + 6 H T H ^ t = SWg The matrix Rf is usually assumed diagonal, hence its inverse can be easily calculated. 9.2 Network Realizations of Non-linear Filters Similar, to the network realization of image enhancement techniques, there is no universal network architecture for non-linear image restoration filters. Here, we show the derivation of LQT networks for realizing some of these filters and the limitation of LQT networks. M a x i m u m Entropy Filter The cost function of a maximum entropy filter is J(f) = Flnf - A(||g-Hf|| 2 -||n|| 2 ) (9.12) Chapter 9. NETWORK REALIZATIONS OF IR TECHNIQUES 113 The gradient vector is VJ(f) = lnf + 1 - A(2H r(g - Hf)) (9.13) where 1 = ( 1 , 1 , 1 ) T . Based on the steepest decedent method, the estimate at time (ib + 1) is f(k + l) = f(fc)-^(lnf(fc) + l + A(2H T(g-Hf(fc)))) = (I-2A<5HTH)f(fc) + (2<5AHTg-l)-<51nf(fc) (9.14) The realization of the term In f by using LQT networks requires a huge amount of neurons. One way to reduce the network size is to include a new type of neurons which have the logarithm as their activation function. Since the logarithm function can be easily implemented in analog circuits [64], it is not unreasonable to include such a neuron model. Let us denote the natural logarithm activation function as N. Function (9.14) can be realized by a network model shown in Figure 9.1. Where W = I - 2 A £ H T H (9.15) and t = 2<5AHTg - 1 (9.16) M a x i m u m Likelihood Filter The cost function of a maximum likelihood filter is A*) = " | ( g - *{Hf ^ R ^ g - s{Hi}) + lnp(g) + K (9.17) The gradient vector is VJ(f) = B ^ D R - ^ g - 5{Hf}) (9.18) Chapter 9. NETWORK REALIZATIONS OF IR TECHNIQUES 114 fi(k) f.OO fn(k) where Figure 9.1: A Network Realization of the Maximum Entropy Filter D 4 D i a g { ^ | x = k} (9.19) and bi are the elements of the vector b = Hf". The estimate at time (fc + 1) is i(k + l) = f(k) - 5 H r D R ^ 1 g + (5HTDR~15{Hf(A;)} = Wf(fc) - t + Qs{Hf(fc)} Assume the noise term n is independent, then ( R " 1 ) 7 = R^ 1 , hence we have (9.20) m r D R ~ 1 s { H f } - m ' R ^ D s j H f } (9.21) The term Ds{Hf} = Diag(s'(b1),s'(b2),..,s'(bn))(s(b1),s(b2),..,s(bn))T = (s'(b1)s(h),s'(b2)s(b2),...,s'(bn)s(K))T (9.22) where s'(i,) = x = bi. The term s'(6,-)s(6,-) is a multiplication of two variables, whose network realization by LQT networks requires a huge amount of neurons. One way to reduce the network size is to use the sigma-pi neurons mentioned in Section 2.1. Chapter 9. NETWORK REALIZATIONS OF IR TECHNIQUES 115 In the case when s(.) is linear, function (9.20) is reduced to i(k + l) = i(k)-SRTR?g + 6H.TTL?s{H.i(k)} = (I + H T R~ 1 H)f(fc) - ^ H T R z 1 g (9.23) This filter then can be realized by the network model of (9.7) with W ^ I + H 3 ! ^ 1 ! ! (9.24) and t = 6RTR?E (9.25) M a x i m u m A Posteriori ( M A P ) Fi lter The cost function of a MAP filter is J&) = -\(s-s{Bi})TR?te-8{Ht}) - f ) T R7 1 (? - f) + lnp(g) + K (9.26) The gradient vector is VJ(f) = H ^ D R - ^ g - s{Ui}) - Kfii - f) (9.27) where D is defined as in (9.19). The estimate at time (k + 1) is f( ib+l) = f(k) — ^ H T D R ~ 1 g + 6H T DR^s{Hf(fc)} + ^ R j 1 ^ ^ ) - f) = (I + ^Rj x ) f ( fc) - < 5R7 1 f -<5H T DRZ 1 g + (5HTDR^1s{Hi(fc)}(9.28) Because of the term 6H rDR^ 15{Hf(fc)}, we have the same problem as in the implemen-tation of the maximum likelihood filter. In the case when s(.) is linear, the MAP filter is reduced to i(k + !) = (! + SRJ1 + 6H TR^n)i(k) - ^ R j 1 ? + H T R^ 1 g) (9.29) Chapter 9. NETWORK REALIZATIONS OF IR TECHNIQUES 116 This filter can be realized by the network model of (9.7) with W = I + SR-J1 + 6HTR^H (9.30) and t = ^Ry 1? + H ^ g ) (9.31) 9.3 Comparison with the Hopfield Network Zhou, Chellappa, et al. applied the Hopfield network to solve an image restoration problem [99]. The restoration criterion they chose is that of minimizing the following cost function E = i | | g - H f | | 2 + ^ | | D f ||2 (9.32) where || . || represents the L 2 norm, A is the Langrange multiplier, and f is the restored image. Such a constrained cost function is widely used in the image restoration problems [6] and is also similar to the regularization techniques used in early vision problems [73]. The Hopfield network can minimize the following energy function: EH = - i v T T v - 0Tv, (9.33) where T is the connection matrix, and 0, are the output and the threshold of the i th neuron respectively. By choosing the right connection matrix T and threshold vector 0, the minimization problem of the restoration can then be solved by the Hopfield network. The cost function (9.32) has to be modified to fit the Hopfield network model because an image vector is not binary. Since each pixel takes integer value (usually ranging from 0 to 255), it can be represented by a limited number of neurons, such as f, = J2cjVj (9.34) Chapter 9. NETWORK REALIZATIONS OF IR TECHNIQUES 117 An example is Cj = 2^ x \ and L = 8, which is a binary representation of an integer ranging from 0 to 255. Equation (9.34) can be expressed in the matrix form as f = C v (9.35) where C = ci,..,,ct, 0,...,0, 0,...,0 0, ...,0, c i , . . . , c i , 0, ...,0 0,...,0, 0,...,0, Substitute (9.35) to (9.33), we get Ci, ...,cL (9.36) E = i ( g - H C v ) r ( g - H C v ) + i A v T C T D T D C v = I ( g T g - 2 g T H C v + v T C T ( A D T D + H T H ) C v ) Minimizing E is equivalent to minimizing 1 E' = - ( v r C r ( A D T D + H T H ) C v - 2 g T H C v ) (9.37) Comparing (9.37) with (9.33), it can be seen that a Hopfield network can solve the restoration problem by setting its parameters as follows: (9.38) T = C T ( A D r D + H T H ) C e = g T H c The Hopfield network can solve any optimization problem whose cost function can be modified to the form shown in (9.32), which is the case for most image restoration problems. However, some practical problems arise when the Hopfield network is applied to image restoration. Chapter 9. NETWORK REALIZATIONS OF IR TECHNIQUES 118 The first problem is that an enormously large number of neurons are needed for image restoration even when the size of the picture is small. This is because that several neurons are needed to represent a pixel since each neuron of a Hopfield network has only binary output. Suppose eight neurons are used to represent a pixel, then for an n x n image, 8 x n2 neurons (e.g., if n=64, then 32,768 neurons), are needed, that is, the size of the Hopfield network is 64n4. The second problem is that neurons in the Hopfield network have to be updated sequentially, i.e., one at a time, to ensure the convergence of the network [33]. This is a serious restriction, since it nullifies the advantage of parallelism offered by neural networks. Moreover, the need for a huge number of neurons makes this problem much worse. Our network model has a size of n4 neurons, and, most importantly, is guaranteed to work in parallel. Some detailed comparison between our network model and the Hopfield network in solving optimization problem in general are available in [87]. Chapter 1 0 A P P L I C A T I O N S I N O T H E R A R E A S The purpose of this chapter is two-fold. First of all, we shall show that the applicability of our methodology as well as the LQT networks is not limited to the area of image processing. Secondly, we shall show that by using our methodology, better neural network implementations can be easily developed. To serve such purpose, we have chosen some algorithms which are used in areas other than image processing and have known neural network implementations developed by other researchers. 1 0 . 1 Sorting Sorting is one of the most important operations performed by computers. It is believed that 25%-50% of all the works performed by computers consist of sorting data [1]. In addition, there exist other techniques which are based on sorting, for instance, the order statistic filtering. Therefore, increasing the speed of sorting is of great importance and interest. Since the advent of digital computers, many sorting schemes have been proposed in the literature [1, 51, 61]. Some of the sorting schemes are executed in serial or parallel digital computers, while others are executed in special-purpose machines. For bounded fan-in logic circuits (networks of AND, OR, NOT gates), the minimum sorting time achievable is 0(nlog 2n) for serial sorting schemes, and 0(log2 n) for parallel sorting schemes (n here is the number of items to be sorted) [1]. For unbounded fan-in logic circuits, the minimum sorting time is 0(1); however, the circuit size has to be more than 119 Chapter 10. APPLICATIONS IN OTHER AREAS 120 polynomial [13, 22]. It is implicit in [13] that if threshold-logic gates (neurons) are used, the circuit size can be reduced to polynomial. It is certainly possible to achieve constant sorting time using a polynomial-size LQT network since it employs threshold-logic neurons as one of three kinds of basic elements. Here, we derive such a network for sorting integers, and by doing so, give exact size and depth. This network implements the enumeration sorting algorithm, which is the fastest sorting algorithm when implemented in parallel [1]. Given an sequence S = (xi, x 2 , x n ) where Xi is an integer, it can be sorted in two ways: (1) S is sorted to a sequence (2/1,2/2, •••,2/n), where ?/,• < yj iff i < j; and (2) iS" is sorted to a sequence (y1? y 2 , Vn), where j/; > yj iff i < j. Let us refer to the first sequence as an increasingly sorted sequence and the second an decreasingly sorted sequence. Without loss of generality, we shall only consider the decreasingly sorted sequence and denote it as S~. It can be easily shown that for a sequence S of distinct numbers, it has the following relationship with its decreasingly sorted sequence S~: yk = x; iff there are (k — 1) elements in S which is greater than x,-. To include cases where there are some elements in S which are equal, we introduce the following definition and theorem. D e f i n i t i o n 10 .1 x,- t> Xj means that X{ > XJ if i > j or xt- > Xj if i < j. Define Qi = {XJ : Xj > x,-, VXJ £ 5 - (x,)}, we have the following theorem. T h e o r e m 10 .1 Given a sequence S, if we arrange its elements to form an new sequence S' according to the following rule: p(xi) - k, ifl\Gi\ = k-1 where p(xi) means the position of xt- in S', then S' is a decreasingly sorted sequence of S. Chapter 10. APPLICATIONS IN OTHER AREAS 121 PROOF The validity of this theorem is obvious. I To derive a LQT network for sorting, we need to restate the enumeration sorting algorithm in a more mathematically rigorous way. This requires two special functions introduced in Chapter Eight; they are the comparison function and the rank function. The comparison function is defined as 0 xt- > Xj 0 Xi = Xj i > j 1 Xi = Xj i < j 1 Xi j C^X ^ ^ Xj^j — ^ (10.1) and the rank function is defined as (10.2) (10.3) rank(x,- : S) — k, if x,- is the k largest element in S According to Theorem 10.1, the following equation is true. n rank(Xi : S) = ^c (x , - ,Xj ) + 1 J=I Now the enumeration sorting algorithm can be rephased as follows: 1. Calculate c(x;, Xj) G I = {1,2,.., n}, but i ^ j ; 2. Calculate rank(xj : S) Vi € T', 3. Vfc € 2T, Vk = xi if rank(x8- : S) = k. We first derive network realizations for each step and them compose them together to form the whole network. For the case i > j , C^tCj^ X— * 0 XJ I X j 1 otherwise 1 Xi — Xj > 1 0 otherwise (10.4) Chapter 10. APPLICATIONS IN OTHER AREAS 122 The network realization of this function is C ^ X f ^ Xj ^ — 3C^y Xj | 1 j X ]^ For the case i < j, The network realization of this function is 0 3C j %C j 1 otherwise 1 Xj — X{ > 0 0 otherwise For the second step, the network realization of the rank function is rank(xj : S) = C C(XJ,, x i ) , C ( X J , Xj_x) , C(XJ, X J + I ) , c ( x , - , x n ) D _ 1 Substitute (10.5) and (10.7) to (10.8), we have rank(x,- : S) = C L Z X; , x i | — 1,1 Z I 1 , Z x ; , X j _ i | — 1,1 Z I 1 , LZ X i , x i + 1 | - 1,1 Z I 0 , L Z x,-, x n | - 1,1 Z I 0 ! ) " 1 For the third step, let z(i, k) = < Then it can be shown that Xj if rank(xj : S) — k 0 otherwise n Vk = k) t = i which has the following network realization (10.5) (10.6) (10.7) (10.8) (10.9) (10.10) (10.11) (10.12) yk = C z(l,k), z(2, k),z(n, k) D° (10.13) Chapter 10. APPLICATIONS IN OTHER AREAS 123 Let 1 if rank(x,- : S) = k 0 otherwise (10.14) then, X{ F{ — k 0 otherwise z(i, k) = < According to Example 7.11.11, the network realization of this function is (10.15) z{i,k) = C < Xi,Fi | l,b>c,Ft | l,a D° (10.16) where a < min{a:,- : i £ I and x t 6 X}, b > max{i,- : i 6 I and Xi € X} — a, and c = a + b. X is the set of all the possible values X{ may take. The network realization of Fi, according to Example 7.1.6, is Fi = C C ri Z l f c , C rt- | 1 , - 1 3 ° (10.17) where = rank(a;,- : 5). Substitute this equation to (10.11), after simplifying the resulting network realization, we have z(i,k) = C < x i , r Z r , - Z l A \ r Z r i Z ] f c + 1 \l,b,-b> c,nriZlk,\ZriZ2k+1 (10.lo) \l,a,-aD° Composing all the network realizations together and after some simplification, we have the network implementation of the enumeration sorting algorithm: yk = C fJ-l, /il,fc, Hl,k+l, 1*2, H2,ki M2,fc+1, Hn,Hn,k, Hn,k+\ | 1, «, —«, 1, «, — — , 1, —a Z>° { /Xi = < Xi,fiitk,fii,k+i I 1,6, > c {/*,-,/ = C c{xi,xx),...,c(xi,xi_l),c{xi,Xi+l),...,c(xi,xn) Zi1'1 ( 1 ° r z Xi,xj | - l , 1 z i 1 111,-,^ | - 1 , 1 D ° {c(x,-, Xj) = < Chapter 10. APPLICATIONS IN OTHER AREAS 1 2 4 {c(xi,Xj) = < where i , j,k £ 1 and / G k + 1}. There are two special cases in the above network realization, namely, when k — 1 and k — n. When k — 1, = 1 Vi £ X. Therefore, the network realization of ?/i can be further simplified to V\ = C A»i,2» /*2, A*2,2, A»«, A*n,2 I l , - a , l , - a , 1 , - a D ~ n a {pi =< xt,mt2 | 1,-6 > a {fr,2 =• c(x,-, x i ) , C ( X J , x,_i), c(x,-, x i + i ) , c ( x , - , i n) Z)1 ( 1 0 . 2 0 ) LZ Xi,Xj | - 1,1 ZJ1 LZ_ Xj | 1^ 1 ZD where i,j £ J . When A; = n, / ^ j 7 l + 1 = 0 Vi G X. Therefore, the network realization of t / n can be further simplified to Vn = C / / l , / * l,n, /l2, /t2,n, -, /^n, N , n | 1, a, 1, O, 1, fl D ° {m =< Xi,ui>n | 1,6 > c {Pi,n =LZ c(xi, x j ) , c ( x i , x.-.i), c(x,-, x , - + 1 ) , c ( x j , xn) ( 1 0 . 2 1 ) Z^ ^-15 I I 5 1 3^ CZ 3?t? \ 1^ 1 ZD where i , j G X. For an input array of n elements, our network has a depth of 4 and a size of n(n2 + 6n — 5 ) . The computation time of our network is constant irrespective of the size of the input array. It is always four times the processing time of one single neuron. Our network for n — 3 is shown in Figure 1 0 . 1 , where the weights which are not shown have the value of 1. To our knowledge, the aforementioned network is the first published neural network which employs three kinds of neurons to implement the enumerate sorting algorithm [86]. {c(x,-,xj) = < Chapter 10. APPLICATIONS IN OTHER AREAS Figure 10.1: A Neural Network for Sorting an Input Array of Three Elements Chapter 10. APPLICATIONS IN OTHER AREAS 126 Later, Bruck and Siu propose another neural network model which has an architecture similar to that of ours [88]. It is composed of threshold-logic neurons and has a depth of 5. Note that the size of our sorting network is polynomial only with respect to the number of input elements. This is justifiable in applications where the value range of the input elements is fixed, for instance, in image processing, the value range of a pixel is restricted to [0,255]. 10.2 Communication Here, we chose an algorithm which is used in communication and has a well-known neural network implementation called Hamming network. 10.2.1 Improvement over Hamming Network In Chapter Two, we introduced the Hamming network which implements a minimum er-ror classification algorithm (referred to as Hamming Classification (HC) algorithm here). The Hamming network has some shortcomings: (1) the network output is not the ex-emplar; (2) the network has to iterate several times; (3) the matching scores have to be distinct; and (4) the output dynamic range of neurons has to be very large. Here, we derive a network which overcomes the above shortcomings. Assume there are m exemplars, and the size of the input pattern is n. Denote the input pattern as x, the output pattern as y, the i th exemplar as e,-, and the j th element of ei as e\. The HC algorithm is reformulated as follows: 1. Calculate d,- = \{n + x • e,) Vj G J = { 1 , 2 , m } ; 2. Find dk = max(<f,- : i € J); Chapter 10. APPLICATIONS IN OTHER AREAS 1 2 7 3 . Assign j/, = t\ Vi 6 J = { 1 , 2 , n } . The matching score d,- can be expressed as di = \ S ' * i + \ (10-22) whose network realization is dj=Cx\ Sj D _ T ( 1 0 . 2 3 ) From the definition of the rank function, we know that dk = max(dj : j £ J) iff rank(<4 : D) = 1 Define A , -Zfc = < 1 if rank(4 : D) = 1 0 otherwise then it can be shown that (10.24) K = £ c i - * i (10-25) which has a network realization as Vi =C ^ i , 2 2 , 2 m | e\, e 2 , D ° (10.26) The network realization of zk is (recall the example in the previous section) zk =CC rank (4 : D) Z)\ C rank (4 : D) | 1, -1 D° (10.27) Since rank(cffc : D) > 1, C rank((4 : Z)) • 1= 1. According to Theorem 6.3, the above network can be simplified to zk =CC rank (4 : D) Z? | - 1 D~\ (10.28) Chapter 10. APPLICATIONS IN OTHER AREAS 128 The network realization of the rank function (see Section 10.1) is rank(d* : D) -C c(dk, dx),c(dk, dfc-i), c(dk, dk+i),.-, c(dk, dm) D 1 (10.29) where c(dk,dj) = < LZ dk,dj | - 1,1 D 1 if k > j \Zdk,dj | - 1 , 1 ZI0 if k<j Let ch =LZ dk,dj | - 1,1 ZZ\h Substituting (10.23) to (10.31), we have ch = ZZCx\ek D~n/\ Cx\ej D~n/2 \ - 1,1 = LZ x, x | — ek, ej ZJh = LZ x | (ej - ek) Z3h Substituting the above network realization to (10.29), we have rank(dfc : D) - CLZ x \ ekti Z21,LZ x \ efc,fc-i ZI 1 , LZ x | 4 , f c + 1 Z 0 , L Z x | ekim Z3°D~1 (10.30) (10.31) (10.32) (10.33) where ek>j = ej — ek. Substitute (10.33) to (10.28), and then substitute the resulting network realization to (10.26). After some simplification, we have the network realization of the HC algorithm as follows: {fik —LZ pk<i, ...,ft,it-i,ft,Hi, •••,Hk,m ZJ1 LZ x | ekj ZZ)1 if k > j < LZ x | 4 j ZJ° if k < j (10.34) Chapter 10. APPLICATIONS IN OTHER AREAS 129 Figure 10.2: The Network Model for Implementing the HC Algorithm where i £ {l,2,...,ra}, k,j £ {l,2,...,m}, and a,- = Z^LieJ- This network has a depth of 3 and a size of m2(n + 1) — m. It is shown in Figure 10.2 for the case m — n = 3. Our network overcomes the shortcomings of the Hamming network mentioned at the beginning of this section. In particular, since the processing time of the Hamming network is at least nine units of time, the processing speed of our network is at least three times as fast as that of the Hamming network. Furthermore, since our network is perfectly synchronized, input patterns can b e e f e d to the network in a pipeline fashion. The only disadvantage of our network is that the network size is bigger than that of the Hamming network (its network size is m(m + n)). Whether or not this is negligible or tolerable depends on the relative importance of the cost of hardwares for particular applications. Chapter 10. APPLICATIONS IN OTHER AREAS 130 10.3 Optimization Optimization has been an major field of application for neural networks since the pub-lication of Hopfield's papers [35, 36]. In Chapter Nine, we have derived some network models for solving image restoration problems, which are essentially optimization prob-lems. Here, we show the derivation of some network models for solving other optimization problems, and compare the performance of our networks with the Hopfield network which is a popular neural network used for optimization. 10.3.1 Solving Simultaneous Equations The matrix format of the simultaneous equations is expressed as A x = c (10.35) where A is n x m matrix, x is a m x 1 vector, and c is a n x 1 vector. The objective here is to find the vector x which satisfies equation (10.35). Let us define a computational energy function as follows £ = ( A x - c f ( A x - c ) (10.36) This energy function can be shown to be a convex function, therefore any local minimum is also the global minimum [14]. Since E attains value zero if and only if A x = c, the problem of solving the simultaneous equation is now transformed to the problem of minimizing E. If equation (10.35) has a solution, by minimizing E this solution can always be found. If equation (10.35) has no solution, by minimizing E a best fit solution in the L2 sense is found. To find the minimum of E, we use the method of steepest descend. Here the change of x is along the best searching direction, which is the opposite direction of the gradient Chapter 10. APPLICATIONS IN OTHER AREAS 131 of E. Denoting Vi£ as the gradient vector, we get VE = 2 A T A x - 2 A T c = 2 A T ( A x - c ) (10.37) Therefore, the increment of x is A x = -<5AT(Ax - c) (10.38) where 6 takes a small positive value. The optimum 6 is a function of x. Here for the sake of simplicity we choose 5 as a constant. We know that x(Jb + 1) = x(fc) + Ax(Jb) (10.39) This leads to x(fc + l) = (I — <SATA)x(fc) + 6A.Tc = Wx(Jfc) + t (10.40) The network realization of the above function is x(Jb + 1) =C x(Jb) | W D* (10.41) This network has a depth of 1 and a size of ra2. Simultaneous equations when transferred to a optimization problem can also be solved by the Hopfield network—for a detailed account of how the Hopfield network is used for solving optimization problems, please see Chapter Nine. The Hopfield network, or more precisely the discrete version of Hopfield network, can only take binary values. Assume that / neurons are used to encode one number, then the Hopfield network has a depth of 1 and a size of Z x m 2 . That is, it's network size is / times that of our network. Chapter 10. APPLICATIONS IN OTHER AREAS 132 Another problem with the Hopfield network is its oscillatory behaviour, i.e., wondering around a minimum. This phenomenon is due to the simultaneous updating of its neurons. To avoid such behaviour, one way is to update its neurons sequentially, and as a matter of fact, it is the only way to ensure the convergence of the network. However, by doing so, the advantage of parallelism is lost. Our network uses simultaneous updating and has no oscillatory behaviour. It always converges smoothly to the minimum if the search step is properly chosen. The precision of the solution obtained by using the Hopfield network is limited by the number of neurons used. In our network, the precision is proportional to the number of iterations. Some results of using our networks to solve simultaneous equations and some detailed comparison with the Hopfield network are available in [87]. 10.3.2 Matr ix Inversion In image processing and some other areas, matrix inversions are often required. Matrix inversion is a time-consuming task when performed using digital computers. Here, we derive a L Q T network for performing matrix inversion. The objective of finding an inversion of some matrix A is to find another matrix X such that Equation (10.42) is a collection of n simultaneous equations (assuming A is a n x n matrix) as follows: where X; and i; are the i th column vectors of X and I respectively. We know from the A X = 1 (10.42) Ax,- = U Vi € {l,2,...,n} (10.43) Chapter 10. APPLICATIONS IN OTHER AREAS 1 3 3 previous example that the network model for solving the simultaneous equations is x,.(Jfc + 1 ) = c xt- | W D * ' ( 1 0 . 4 4 ) where W = I — 6ATA, and t, = 6ATi,. Therefore, we can solve the matrix inversion by either using the above network n times, or using n such networks at the same time. Chapter 11 CONCLUSIONS 11.1 Summary In this thesis, a systematic method of neural network design, which we call algebraic derivation methodology, is proposed and developed. This work is motivated by the need for a systematic method with which neural networks can be constructed for computational purposes. The lack of such a method has limited the application of neural networks in many areas, particularly the area of image processing. The algebraic derivation methodology consists of the following stages: 1. Find the minimum set of neuron models for a given class of functions; 2. Devise symbolic representations for these neurons and their networks; 3. Establish theorems for manipulating these symbols based on the computational properties of the neurons and their networks; 4. Establish procedures for deriving neural networks from functions; 5. Use these procedures and theorems to derive and simplify network models for spec-ified functions. This methodology is then applied and developed with an emphasis on deriving networks to realize image processing techniques. It is shown informally in Chapter Four and then formally in Chapter Seven that any image processing technique can be realized by networks of LQT neurons. These neurons 134 Chapter 11. CONCLUSIONS 135 can be easily implemented in hardware, hence the implementation of their networks should be straightforward. Symbolic representations of LQT neurons and their networks (LQT networks) are then devised in Chapter Five. Such representations enable us to algebraically manipulate the process of network design so as to derive and simplify LQT networks. Theorems are established in Chapter Six for manipulating symbols mentioned above. These theorems are like theorems of Boolean algebra and used for network simplification. Procedures of deriving networks to realize both single-input and multiple-input func-tions are given in Chapter Seven. Their usages are demonstrated through some simple examples, which in turn are useful in deriving LQT networks in later chapters. To demonstrate the merits of the algebraic derivation methodology, networks for realizing some image enhancement and image restoration techniques, as well as techniques in some other areas are derived in Chapters Eight, Nine, and Ten respectively. Most of these networks are the first known neural network models; in the case there are other known network models, our networks have the same or better performance in terms of computation time. 11.2 Contributions The main contribution of this thesis is the development of the algebraic derivation methodology. Some important features of this methodology are as follows: 1) both the network parameters and the network architectures are derived; 2) neurons and their networks are represented in symbolic forms; and 3) these symbols are to be manipulated so as to yield proper network models. During the development of the algebraic derivation methodology, some specific con-tributions have been made. They are listed as follows. Chapter 11. CONCLUSIONS 136 Formal definitions of neurons and neural networks have been given. Networks so defined can simulate the computation process of any network composed of linear-sum neurons, such as the back-propagation network and the Hopfield network. Therefore, they provide a unified framework for studying the computational properties of neural networks, and may also serve as a model of general purpose neural computers. Symbols to represent LQT neurons and their networks have been devised. Such symbolic representations enable us to algebraically manipulate the process of network design so as to yield proper network models. A type of neural networks called LQT networks have been established with the ob-jective of realizing image processing techniques. These networks use the simplest neuron models, and thus should encounter minimum difficulty when implemented in hardware. The LQT networks are capable of realizing any image processing technique and some techniques in other areas. To simplify LQT networks, that is, to find networks with fewer layers and/or fewer neurons, the concept of network equivalence has been introduced, and theorems of net-work equivalence have been developed. Procedures for deriving neural networks to realize both single-input and multiple-input functions have been developed. They are effective in deriving LQT networks as shown in this thesis. Moreover, they are general procedures, not just restricted to the LQT networks. These theorems and procedures also provide a tool to analyze the computation com-plexity of neural networks. Consequently, not only we can evaluate different networks which realize the same function, but also we can make some concrete comparison between neural networks and other parallel computing machines. Network models have been derived for implementing some well-known algorithms in image processing and some other areas. For example, we have derived networks for Chapter 11. CONCLUSIONS 137 implementing the median filter and the Wiener filter, and a network for sorting. Most of the networks we have derived are the first known such neural network models; in the case there are other known network models, our networks have the same or better performance in terms of computation time. For example, the computation time of our sorting network is the same as that of the sorting network developed by Siu and Bruck [88]; the computation time of our network which implements the Hamming Classification algorithm is at least two times less than that of the Hamming network developed by Lippmann [58]. Some interesting questions are raised during the development of the algebraic deriva-tion methodology, one of which is as follows: what is the minimum number of weights a network must have in order to realize a given function? This is an important question with both theoretical and practical values. It is evident, (see [30] for example), that any problem solvable by digital computers can be solved by neural networks in a fixed amount of time irrespective of the problem size. However, the number of weights is pro-portional to the problem size. Consequently, we may encounter such a situation that a neural network may solve a given problem in, say, nanoseconds, but it may take millions of years to construct such a network. Therefore, it is important to know the minimum number of weights needed to solve a problem by neural networks. To our knowledge, this important question hasn't been seriously considered in the field of neural networks. 11.3 Future Work The algebraic derivation methodology is developed with an emphasis on deriving networks to realize image processing techniques. It is desirable to expand this methodology to form a unified theory of algebraic derivation. To do so, we may start with identifying basic functions used in many areas such that a minimum set of neuron models can be Chapter 11. CONCLUSIONS 138 constructed, in much the same way as in Boolean algebra where the basic functions are AND, OR, and NOT. Of course, the basic functions we are looking for are at a much higher computation level so as to build more efficient computing machines. As shown in Chapter Nine, some unconventional neuron models are more efficient for realizing certain functions. One of such neuron models is the so-called sigma-pi neuron (see [15]). Since it is not difficult to implement such neurons in hardware [64], there is no reason to exclude them. We may also need to search for other neuron models which are either computationally more efficient or easier to implement in hardware. That is, we look for the basic neuron models from both perspectives of pure computation and hardware implementation. This task is an essential part of the task mentioned in the previous paragraph. Another purpose of developing a theory of network design is to evaluate the efficiency of neural networks as computing machines. A solid theory is needed for us to make concrete comparisons between neural networks and other parallel computing machines. Ultimately, neural networks have to be implemented in hardware for any practical use. Since the neuron models used in our networks already have corresponding hard-ware implementations, it is reasonable to expect easy implementations of these networks. However, hardware problems, such as the effect of inaccuracy in setting up network pa-rameters and thermal noise in analog circuits, have to be taken into consideration. Some theoretical and experimental works are certainly needed. Fault tolerance is one of the characteristics of many existing neural network paradigms, and is considered a remedy to the inherent imperfection of analog implementations [64]. Fault tolerance is believed due to the distributed nature of rule representation in neu-ral networks and the redundancy of neurons. Our networks, however, are derived with an emphasis on minimizing the number of neurons, thus eliminating such redundancy. Hence, it is expected that our networks are less fault tolerant than networks having Chapter 11. CONCLUSIONS 139 component redundancy. This, however, may be solved by using some fault tolerance techniques such as those already developed for digital computers. No matter what the result will be, such an approach of explicitly building in fault tolerance ability may give us some hints to better understanding of the fault tolerance ability of both artificial and biological neural networks. In this thesis, we are only concerned with deriving networks for solving problems which have analytical solutions. There are also many interesting problems (e.g. pattern recognition problems) which only have partial solutions. How to utilize this partial knowledge is thus important. The underlining principle of our methodology may be applied to help create better learning schemes with which not only network parameters but also network architectures are adjusted in the learning process. Or we may derive the network architecture based on the characteristics of the problem, and then let the network adjust its parameters in the learning process. Appendix A T H E G E N E R A L I Z E D D E L T A R U L E The generalized delta rule is an iterative gradient search algorithm. The objective of the algorithm is to minimize the mean square error between the actual output of a back-propagation network and the desired output. Step 1 Initialize Weights and Thresholds: Set all weights and neuron thresholds to small random values. Step 2 Present Input and Desired Outputs: Present a continuous valued input vector Xo,Xi,...,xn_i and specify the desired out-put ao,di,...,dm-i- If the network is used as a classifier then all desired outputs are typically set to zero except for that corresponding to the class the input is from. The input could be new on each trial or samples from a training set could be presented cyclically until weights stabilize Step 3 Calculate Actual Outputs: Calculate outputs yo,yi,...,ym-i-Step 4 Adapt Weights: Use a recursive algorithm starting at the output neurons and working back to the first hidden layer. Adjust weights by Wji(h + 1) = Wji(h) + t]SjXi (A.l) 140 Appendix A. THE GENERALIZED DELTA RULE 141 In this equation Wji(h) is the weight from the i th hidden neuron or from the i th element of the input to the jth neuron at time h, X{ is either the output of the i th neuron or the i th element of the input , rj is a gain term, and 8j is an error term for the j " 1 neuron. If the j th neuron is an output neuron, then = VA1 ~ Vj)(dJ ~ Vi) (A-2) where dj is the desired output of the j th neuron and yj is the actual output. If the j th neuron is an hidden neuron, then Sj = Xj(l - Xj) ^2 SuWkj (A.3) fc where k is over a l l neurons in the layers above neuron j. Thresholds are adapted in a similar manner by assuming they are connection weights on links from auxiliary constant-valued inputs. Convergence is sometime faster if a momentum term is added and weight changes are smoothed by Wji(h + 1) = wji(h) + rjSjXi + \(wji(h) - w{j(h - 1)) (A.4) where 0 < A < 1. Step 5 Repeat by Going to Step 2. Appendix B H O P F I E L D N E T W O R K A L G O R I T H M The Hopfield network algorithm is described as follows. Step 1 Assign Connection Weights Wij - V«,i€{l,2 , . . ,n} (B.l) 0 * =j where m is the total number of exemplars and n is the total number of neurons in the network. In this formula Wij is the connection weight from the jth neuron to the i th neuron; and x* £ {0,1} is the i th element of the exemplar for class s. Step 2 Initialize with the Input Pattern yi(0) = xi V t € { l , 2 , . . . , n } (B.2) In this formula yi(k) is the output of the i th neuron at time k and X{ £ {0,1} is the i th element of the input pattern. Step 3 Iterate Until Convergence + = E - = i U W ; ( * ) V i e { l,2 , . . , n } Vi(k + l) = fT{vi(k+l)) V i€ { l,2 , . . . , n } where function fx is the activation function of the threshold-logic neuron. The process is repeated until the network output remains unchanged with further itera-tions. The network output then represents the exemplar pattern that best matches the input pattern. 142 Bibliography Akl, S.G., Parallel Sorting Algorithms, Vol. 12 in Werner Rheinboldt (ed.), Notes and Reports in Computer Science and Applied Mathematics, Orlando: Academic Press, 1985. Albert, A. Regression and the Moore-Penrose Pseudoinverse, Academic Press: New York, 1972. Aleksander, I. (ed.), Neural Computing Architectures: The Design of Brain-like Machines, The MIT Press: Cambridge, Massachusetts, 1989. Aleksander, I., Thomas, W.V., and Bowden, P.A., "WISARD, A Radical Step Forward in Image Recognition," Sensor Review, Vol. 4, No. 3, pp. 120-124, 1984. Anderson, J.A., "A Simple Neural Network Generating An Interactive Memory," Mathematical Biosciences, Vol. 14, pp. 197-220, 1972. Andrews, H.C. and B.R. Hunt, Digital Image Restoration, Englewood Cliffs, NJ: Prentice-Hall, 1977. Arbib, M.A. , Brains, machines, and mathematics, 1987. Arce, G.R. and M.P. McLoughlin, "Theoretical Analysis of the Max/Median Fil-ter," IEEE Trans, on Acoust., Speech, Signal Processing, Vol. ASSP-34, No. 1, pp. 60-69, Jan. 1987. Barto, A.G. , R.S. Sutton, and C.W. Anderson, "Neuronlike Adaptive Elements That Solve Difficult Learning Control Problems," IEEE Trans, on Systems, Man, and Cybernetics, Vol. SMC-13, No. 5, Sept./Oct., 1983. Baum, E.B., J. Moody, and F. Wilczek, "Internal Representations for Associative Memory," NSF-ITP-86-138 Institute for Theoretical Physics, University of Califor-nia, Santa Barbara, California, 1986. Bovik, A.C. , T.S. Huang, and D.C. Munson, Jr., "A generalization of median fil-tering using linear combinations of order statistics," IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-31, pp. 1342-1350, Dec. 1983. Carpenter, G.A. and S. Grossberg, "The ART of Adaptive Pattern Recognition by A Self-Organizing Neural Network," IEEE Computer, March 1988, pp. 77-88. 143 Bibliography 144 Chandra, A.K. , L. Stockmeyer, and U. Vishkin, "Constant Depth Reducibility," Siam J. Comput., Vol. 13, pp. 423-439, 1984. Cooper, L. and D. Steinberg, Introduction to Methods of Optimization, Saunders: Philadelphia, 1970. Durbin, R. and D.E. Rumelhart, "Product Units: a Computationally Powerful and Biologically Plausible Extension to Back-propagation Networks," Neural Compu-tation, Vol. 1, 1989, pp. 133-142. Eberhart, Russell C , "Standardization of Neural Network Terminology," Neural Networks, Vol. 1, No. 2, June, 1990, pp. 244-245. Elman, J.L., "Learning the Hidden Structure of Speech," Institute for Cognitive Science, University of California at San Diego, ICS Report 8701, Feb., 1987. Feldman J.A. and D.H. Ballard, "Connectionist Models and Their Properties," Cognitive Science, Vol. 6, pp. 205-254, 1982. Fox, G.C. and P.C. Messina, "Advanced Computer Architectures," IEEE Comput-ers Magazine, pp. 67-74, 1989. Frei, W., "Image Enhancement by Histogram Hyperbolization," Computer Graph-ics and Image Processing, Vol. 6, pp. 286-294, 1977. Fukushima, K. and S. Miyake, "Neocognitron: A New Algorithm For Pattern Recognition Tolerant of Deformations And Shifts In Position," Pattern Recogni-tion, Vol. 15, No. 6, pp. 455-469, 1982. Furst M . , J.B. Saxe and M . Sipser, "Parity, Circuits and the Polynomial-time Hierarchy," Proc. 22nd IEEE Symposium on Foundations of Computer Science, 1981, pp. 260-270. Gallager, R.G., Information Theory and Reliable Communication, John Wiley & Sons, New York, 1968. Gevins, A.S. and N.H. Morgan, "Applications of Neural-Network (NN) Signal Pro-cessing in Brain Research," IEEE Trans, on Acoustic Signal Speach Processing, Vol. 36, No. 7, July, 1988. Goles Eric and Servet Martinez, Neural and Automata Networks: Dynamical Be-haviour and Applications, Kluwear Academic Publishers: Boston, 1990. Gonzalez, R.C. and P. Wintz, Digital Image Processing, Reading, MA:Addison-Wesley, 1977. Bibliography 145 Grossberg, S., The Adaptive Brain I: Cognition, Learning, Reinforcement, and Rhythm, and The Adaptive Brain II: Vision, Speech, Language, and Motor Control, Elsevier/North-Holland, Amsterdam, 1986. Hagiwara, M . , "Accelerated Back Propagation Using Unlearning Based on Hebb Rule," in Proc. of IJCNN'90, 1-617-620, Jan., 1990. Hebb, D.O., The Organization of Behavior, John Wiley &; Sons: New York, 1949. Hecht-Nielsen, R., "Theory of the Back-propagation Neural Networks," In Proc. of the Intl. Joint Conf. on Neural Networks, Washington, D.C., New York: IEEE Press, pp. 1593-606, 1989. Hecht-Nielsen, R., "Counter-Propagation Networks," Proc. of IEEE First Interna-tional Conference on Neural Networks, New York, 1987. Helstrom, C.W., "Image Restoration by the Method of Least Squares," J. Opt. Soc. Amer., Vol. 57, Mar. 1967, pp. 297-303. Hopfield, J.J., "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," Proc. Natl. Acad. Sci. USA, Vol. 79, 2554-2558, April 1982. Hopfield, J.J., "Neurons with Graded Response Have Collective Computational Properties Like Those of Two-State Neurons", Proc. Natl. Acad. Sci. USA, Vol. 81, 3088-3092, May 1984 Hopfield, J.J., and D.W. Tank, "Computing with Neural Circuits: A Model," Sci-ence, Vol. 232, 625-633, August 1986. Hopfield, J.J., and D.W. Tank, "Simple 'Neural' Optimization Networks: An A/D Convert, Signal Decision Circuit, and a Linear Programming Circuit," IEEE Trans, on Circuits and Systems, Vol. CAS-33, No. 5, pp. 533-541, May 1986. Hopfield, J.J. and D.W. Tank, "Neural Computation of Decisions in Optimization Problems," Biol. Cybern. Vol. 52, pp. 141-152, 1985. Huang, T.S. (Ed.), Two-Dimensional Digital Signal Processing II: Transforms and Median Filters, New York: Springer-Verlag, 1981. Huang, T.S., W.F. Schreiber, and O.J. Tretiak, "Image Processing," Proc. IEEE Vol. 59, pp. 1586-1609, 1971. Huang, T.S., D.A. Barker and S.P. Berger, "Iterative Image Restoration," Applied Optics, Vol. 14, No. 5, pp. 1165-1168, May, 1975. Bibliography 146 [41] Hummel, R .A . , "Image Enhancement By Histogram Transformation," Comput. Graph. Image Pore. Vol. 6, pp. 184-195, 1977. [42] Hunt, B.R. , "Digital Image Processing," I E E E Proc , Vol. 63, pp. 693-708, Apri l , 1975. [43] Irie, B. and S. Miyake, "Capabilities of Three-layered Perceptron," in Proc. of IJCNN'89, pp. 1-641-648, June, 1989. [44] Izui, Y . and A . Pentland, "Speed Up Back Propagation," in Proc. of IJCNN'90, pp. 1-639-642, Jan. 1990. [45] Jain, A . K . , Fundamentals of Digital Image Processing, Prentice-Hall: New Jersey, 1988. [46] Jayant, N.S., "Average and median-based smoothing techniques for improving speech quality in the presence of transmission errors," I E E E Trans. Commun., Vol. Com-24, pp. 1043-1045, Sept. 1976. [47] Judd, J.S., "Learning in Networks Is Hard," Proc. of the First Intl. Conf. on Neural Networks, pp. 685-692, IEEE, San Diego, California, June, 1987. [48] Judd, J.S., Neural Network Design and the Complexity of Learning, The MIT Press, Cambridge, Massachussetts, 1990. [49] Kandel, E.R., and J . H . Schwartz, Principles of Neural Science, Elsevier, New York, 1985. [50] Klassen, M.S. and Y . H . Pao, "Characteristics of the Functional-link Net: A Higher Order Delta Rule Net," Proc. of IEEE Second Annual International Conference on Neural Networks, June San Diago, C A , 1988. [51] Knuth, D.E . , The Art of Computer Programming: Sorting and Searching, Vol. 3, Reading, M A : Addison-Wesley, 1973. [52] Kohonen, T., "Self-organized Formation of Topologically Correct Feature Maps," Biological Cybernetics, Vol. 43, pp. 59-69, 1982. [53] Kohonen, T., "Correlation Matrix Memories," IEEE Trans, on Computers, C-21, pp. 353-359, 1972. [54] Kosko, B. , "Adaptive Bidirectional Associative Momories," Applied Optics, Vol. 26, pp. 4947-4960, 1987. [55] Kreins, E.R. and L . J . Allison, "Color Enhancement of Nimbus High Resolution Infrared Radiometer Data," Appl . Opt. Vol . 9, pp. 681, March 1970. Bibliography 147 [56] Lee, Y . H . and A.T. Fam, "An edge gradient enhancing adaptive order statistic filter," IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-35, No. 5, pp. 680-695, May 1987. [57] Lewis, P.M. II, and C.L. Coates, Threshold Logic, John Wiley & Sons: New York, 1967. [58] Lippmann, R.P., "An Introduction to Computing with Neural nets," IEEE ASSP Magazine pp. 4-22, April 1987. [59] Lippmann, R.P., B. Gold, and M.L. Malpass, "A Comparison of Hamming and Hopfield Neural Nets for Pattern Classification," MIT Lincoln Laboratory Technical Report, TR-769, 1987. [60] Lippmann, R.P., "Pattern Classification Using Neural Networks," IEEE Commu-nications Magazine, Vol. 27, No. 11, pp. 47-64, Nov., 1989. [61] Lorin, H. , Sorting and Sort System, Reading, MA: Addison-Wesley, 1975. [62] Matthews, M.B., E.S. Moschytz, "Neural Network Nonlinear Adaptive Filtering Using the Exteneded Kalman Filter Algorithm," in Proc. of the Intl. Neural Net-work Conf., July 9-13, 1990, Paris, pp. 115-118. [63] McCulloch, W.S. and W. Pitts, "A Logical Calculus of the Ideas Immanent in Nervous Activity," Bulletin of Mathematical Biophysics, 5, pp. 115-133, 1943. [64] Mead, C.A., Analog VLSI and Neural Systems, Addison-Wesley Publishing Com-pany, 1989. [65] Minsky, M . and S. Papert, Perceptron: An Introduction to Computational Geome-try, Expanded Edition, The MIT Press, 1988. [66] Mota-oka, T. and M . Kitsuregawa, The Fifth Generation Computer: The Japanese Challenge, John Wiley & Sons, 1985. [67] Nakagawa, Y. and A. Rosenfeld, "A note on the use of local min and max operations in digital picture processing," IEEE Trans. Sys., Man, and Cybern. Vol. SMC-8, No. 8, pp. 632-635, Aug. 1978. [68] Narendra, P.M. "A separable median filter for image noise smoothing," IEEE Trans. Pattern Anal. Mech. Intell., Vol. PAMI-3, No. 1, pp. 20-29, Jan. 1981. [69] Owens, A.J . and D.L. Filkin, "Efficient Training of the Back-propagation Network by Solving a system of Stiff Ordinary Differential Equations," in Proc. of IJCNN'89, pp. 77-381-386, June, 1989. Bibliography 148 Pao, Y . H . , Adaptive Pattern Recognition and Neural Networks, Addison-Wesley Publishing Company, 1989. Parten, C.R., R .M. Pap, and C. Thomas, "Neurocontrol Applied to Telerobotics for the Space Shuttle," in Proc. of the Intl. Neural Network Conf., July 9-13, 1990, Paris,pp. 229-236. Peeling, S.M., R.K. Moor, and M.J. Tomlinson, "The Multi-Layer Perceptron as a Tool for Speech Pattern Processing Research," in Proc. IoA Autumn Conf. on Speech and Hearing, 1986. Poggio, T., V. Torre, and C. Koch, "Computational vision and regularization the-ory," Nature, Vol. 317, pp. 314-319, Sept. 1985. Posch, T.E., "Models of the Generation and Processing of Signals by Nerve Cells: A Categorically Indexed Abridged Bibliography," USCEE Report 290, August 1968. Powell, P.G. and B.E. Bayer, "A Method for the Digital Enhancement of Unsharp Grainy Photographic Images," Proc. Int. Conf. Electronic Image Processing, IEEE, UK, pp. 197-183, July 1982. Pratt, W.K., "Median Filtering," in Semi-annual Report, Image Processing Insti-tute, Univ. of Southern California, Sept. 1975, pp. 116-123. Rabiner, L.R., M.R. Sambur, and C.E. Schmidt, "Applications of a nonlinear smoothing algorithm to speech processing," IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-23, No. 6, pp. 55-557, Dec. 1975. Rosenblatt, R., Principles of Neurodynamics, New York, Spartan Books, 1959. Rosenblatt, R., "The Perceptron: A Probabilistic Model for Information Storage and Organization In The Brain," Psychological Review 65, pp. 386-408, 1957. Rumelhart, D.E., G.E. Hinton, and R.J. Williams, "Learning Internal Representa-tions by Error Propagation," in D.E. Rumelhart &; J.L. McClelland (Eds.), Paral-lel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1: Fundations, MIT Press, 1986. Sawai, H. , A. Waibel, P. Haffner, M . Miyatake, and K. Shikano, "Paral-lelism, Hierarchy, Scaling in Time-delay Neural Networks for Spotting Japanese Phonemes/CV-syllables," in Proc. of IJCNN'89, pp. 77-81-88, June, 1989. Sejnowski, T., and C R . Rosenberg, "NETtalk: A Parallel Network That Learns to Read Aloud," Johns Hopkins Univ. Technical Report JHU/EECS-86/01, 1986. Bibliography 149 [83] Shi, Pingnan and Rabab K. Ward, "Using the Hopfield neural network to enhance binary noisy images," Proceedings of the Canadian Conference on Electrical and Computer Engineering, Vancouver, B.C. Canada, Nov. 3-4, 1988, pp. 760-763. [84] Shi, Pingnan and Rabab K. Ward, "Using the perceptron to enhance binary noisy images," presented at the SCS Western Multiconference, San Diego, California, January 4-6, 1989. [85] Shi, Pingnan and Rabab K. Ward, "A neural network implementation of the median filter," Proceedings of 1989 IEEE Pacific Rim Conference on Communications, Victoria, B.C., Canada, June 1-2, 1989, pp. 513-516. [86] Shi, Pingnan and Rabab K. Ward, "A neural network structure for sorting non-negative integers in fixed time," Proceedings of 1989 Canadian Conference on Elec-trical and Computer Engineering, Montreal, Canada, Sept. 17-20, 1989, pp. 420-423. [87] Shi, Pingnan and Rabab K. Ward, "The case for abandoning the biological resem-blance restriction: An example of neural network solution of simultaneous equa-tions," Proceedings of 1990 International Joint Conference on Neural Networks, San. Diego, California, June 17-21, 1990, pp. I l l 875-882. [88] Siu, Kai-Yeung, and Jehoshua Bruck, "Neural Computation of Arithmetic Func-tions," IEEE Proceedings, Vol. 78, No. 10, Oct. 1990, pp. 1669-1675. [89] Takeda M . and J.W. Goodman, "Neural Networks for Computation: Number Representations and Programming Complexity," Applied Optics, Vol. 25, No. 18, pp. 3033-3046, 15 Sept. 1986. [90] Tukey, J.W., Exploratory Data Analysis, Addison-Wesley: Reading, Massachusetts, 1971. [91] Van Trees, H.L., Detection, Estimation, and Modulation Theory, Vol. I, John Wiley & Sons: New York, 1967. [92] Waibel, A., T. Hanazawa, G. Hinton, K. Shikano, and K. Larg, "Phoneme Recogni-tion Using Time-delay Neural Networks," ATR Internal Report TR-1-0006, Oct.30, 1987. [93] Walsh, G.R., Methods of Optimization, John Wiley & Sons: New York, 1975. [94] Wasserman, P.D., Neural Computing: Theory and Practice, Van Nostrand Rein-hold: New York, 1989. Bibliography 150 [95] Widrow, B. and M.E. Hoff, "Adaptive Switching Circuits", 1960 IRE WESCON Conv. Record, Part 4, 96-104, August 1960. [96] Widrow, B. and R. Winter, "Neural Nets for Adaptive Filtering and Adaptive Pattern Recognition," IEEE Computer Magazine, pp. 25-39, March, 1988. [97] Widrow, B. S.D. Stearns, Adaptive Singal Processing, Prentice-Hall: New Jersey, 1985. [98] Woods, R.E. and R.C. Gonzalez, "Real Time Digital Image Enhancement," Proc. IEEE 69, pp. 643-644, 1981. [99] Zhou, Y.T., R. Chellappa, and et al., "Image Restoration Using a Neural Network," IEEE Trans, on Acoust., Speech, and Sig. Proc, pp. 1141-1151, Vol. 36, No. 7,1988.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Algebraic derivation of neural networks and its applications...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Algebraic derivation of neural networks and its applications in image processing Shi, Pingnan 1991
pdf
Page Metadata
Item Metadata
Title | Algebraic derivation of neural networks and its applications in image processing |
Creator |
Shi, Pingnan |
Publisher | University of British Columbia |
Date Issued | 1991 |
Description | Artificial neural networks are systems composed of interconnected simple computing units known as artificial neurons which simulate some properties of their biological counterparts. They have been developed and studied for understanding how brains function, and for computational purposes. In order to use a neural network for computation, the network has to be designed in such a way that it performs a useful function. Currently, the most popular method of designing a network to perform a function is to adjust the parameters of a specified network until the network approximates the input-output behaviour of the function. Although some analytical knowledge about the function is sometimes available or obtainable, it is usually not used. Some neural network paradigms exist where such knowledge is utilized; however, there is no systematical method to do so. The objective of this research is to develop such a method. A systematic method of neural network design, which we call algebraic derivation methodology, is proposed and developed in this thesis. It is developed with an emphasis on designing neural networks to implement image processing algorithms. A key feature of this methodology is that neurons and neural networks are represented symbolically such that a network can be algebraically derived from a given function and the resulting network can be simplified. By simplification we mean finding an equivalent network (i.e., performing the same function) with fewer layers and fewer neurons. A type of neural networks, which we call LQT networks, are chosen for implementing image processing algorithms. Theorems for simplifying such networks are developed. Procedures for deriving such networks to realize both single-input and multiple-input functions are given. To show the merits of the algebraic derivation methodology, LQT networks for implementing some well-known algorithms in image processing and some other areas are developed by using the above mentioned theorems and procedures. Most of these networks are the first known such neural network models; in the case there are other known network models, our networks have the same or better performance in terms of computation time. |
Subject |
Neural networks (Computer science) -- Design and construction Image processing -- Mathematical models |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2011-02-18 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0100784 |
URI | http://hdl.handle.net/2429/31511 |
Degree |
Doctor of Philosophy - PhD |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-UBC_1991_A1 S44.pdf [ 6.9MB ]
- Metadata
- JSON: 831-1.0100784.json
- JSON-LD: 831-1.0100784-ld.json
- RDF/XML (Pretty): 831-1.0100784-rdf.xml
- RDF/JSON: 831-1.0100784-rdf.json
- Turtle: 831-1.0100784-turtle.txt
- N-Triples: 831-1.0100784-rdf-ntriples.txt
- Original Record: 831-1.0100784-source.json
- Full Text
- 831-1.0100784-fulltext.txt
- Citation
- 831-1.0100784.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0100784/manifest