"Applied Science, Faculty of"@en . "Electrical and Computer Engineering, Department of"@en . "DSpace"@en . "UBCV"@en . "Shi, Pingnan"@en . "2011-02-18T18:55:42Z"@en . "1991"@en . "Doctor of Philosophy - PhD"@en . "University of British Columbia"@en . "Artificial neural networks are systems composed of interconnected simple computing units known as artificial neurons which simulate some properties of their biological counterparts.\r\nThey have been developed and studied for understanding how brains function, and for computational purposes.\r\nIn order to use a neural network for computation, the network has to be designed in such a way that it performs a useful function. Currently, the most popular method of designing\r\na network to perform a function is to adjust the parameters of a specified network until the network approximates the input-output behaviour of the function. Although some analytical knowledge about the function is sometimes available or obtainable, it is usually not used. Some neural network paradigms exist where such knowledge is utilized; however, there is no systematical method to do so. The objective of this research is to develop such a method.\r\nA systematic method of neural network design, which we call algebraic derivation methodology, is proposed and developed in this thesis. It is developed with an emphasis on designing neural networks to implement image processing algorithms. A key feature of this methodology is that neurons and neural networks are represented symbolically such that a network can be algebraically derived from a given function and the resulting network can be simplified. By simplification we mean finding an equivalent network (i.e., performing the same function) with fewer layers and fewer neurons. A type of neural networks, which we call LQT networks, are chosen for implementing image processing algorithms.\r\nTheorems for simplifying such networks are developed. Procedures for deriving such networks to realize both single-input and multiple-input functions are given.\r\nTo show the merits of the algebraic derivation methodology, LQT networks for implementing\r\nsome well-known algorithms in image processing and some other areas are developed by using the above mentioned theorems and procedures. Most of these networks\r\nare the first known such neural network models; in the case there are other known network models, our networks have the same or better performance in terms of computation\r\ntime."@en . "https://circle.library.ubc.ca/rest/handle/2429/31511?expand=metadata"@en . "A L G E B R A I C D E R I V A T I O N O F N E U R A L N E T W O R K S A N D ITS A P P L I C A T I O N S I N I M A G E P R O C E S S I N G by PINGNAN SHI B. A. Sc. (Electrical Engineering), Chongqing University, 1982 A. Sc. (Electrical Engineering), The University of British Columbia, 1987 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES THE DEPARTMENT OF ELECTRICAL ENGINEERING We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA April 1991 \u00C2\u00A9 Pingnan Shi, 1991 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. 
I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department
The University of British Columbia
Vancouver, Canada

Abstract

Artificial neural networks are systems composed of interconnected simple computing units known as artificial neurons, which simulate some properties of their biological counterparts. They have been developed and studied both for understanding how brains function and for computational purposes.

In order to use a neural network for computation, the network has to be designed in such a way that it performs a useful function. Currently, the most popular method of designing a network to perform a function is to adjust the parameters of a specified network until the network approximates the input-output behaviour of the function. Although some analytical knowledge about the function is sometimes available or obtainable, it is usually not used. Some neural network paradigms exist where such knowledge is utilized; however, there is no systematic method for doing so. The objective of this research is to develop such a method.

A systematic method of neural network design, which we call the algebraic derivation methodology, is proposed and developed in this thesis. It is developed with an emphasis on designing neural networks to implement image processing algorithms. A key feature of this methodology is that neurons and neural networks are represented symbolically, such that a network can be algebraically derived from a given function and the resulting network can be simplified. By simplification we mean finding an equivalent network (i.e., one performing the same function) with fewer layers and fewer neurons. A class of neural networks, which we call LQT networks, is chosen for implementing image processing algorithms. Theorems for simplifying such networks are developed. Procedures for deriving such networks to realize both single-input and multiple-input functions are given.

To show the merits of the algebraic derivation methodology, LQT networks for implementing some well-known algorithms in image processing and some other areas are developed by using the above-mentioned theorems and procedures. Most of these networks are the first known neural network models of their kind; in the cases where other network models are known, our networks have the same or better performance in terms of computation time.
Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgement
A Glossary of Symbols
1 INTRODUCTION
2 BACKGROUND
  2.1 Artificial Neural Networks
    2.1.1 Neuron Models
    2.1.2 Network Models
  2.2 Image Processing
    2.2.1 Image Enhancement
    2.2.2 Image Restoration
3 DEFINITIONS AND NOTATIONS
4 REASONS FOR ALGEBRAIC DERIVATION
  4.1 Drawbacks of the Learning Approach
  4.2 The Advantages of the Analytical Approach
    4.2.1 Designing the Hamming Network
    4.2.2 Designing the Parity Network
  4.3 The Algebraic Derivation Methodology
5 SYMBOLIC REPRESENTATION OF NEURONS AND THEIR NETWORKS
  5.1 Neuron Models
  5.2 LQT Networks
6 COMPUTATIONAL PROPERTIES OF LQT NETWORKS
  6.1 Network Equivalence
  6.2 Network Quality
    6.2.1 Network Depth
    6.2.2 Network Size
  6.3 Criterion for Network Simplification
7 DERIVATION PROCEDURES
  7.1 Realization of SISO Functions
  7.2 Realization of MISO Functions
8 NETWORK REALIZATIONS OF IE TECHNIQUES
  8.1 Network Realizations of Linear Techniques
  8.2 Network Realizations of Non-linear Filters
    8.2.1 Dynamic Range Modification
    8.2.2 Order Statistic Filtering
    8.2.3 Directional Filtering
9 NETWORK REALIZATIONS OF IR TECHNIQUES
  9.1 Network Realizations of Linear Filters
  9.2 Network Realizations of Non-linear Filters
  9.3 Comparison with the Hopfield Network
10 APPLICATIONS IN OTHER AREAS
  10.1 Sorting
  10.2 Communication
    10.2.1 Improvement over Hamming Network
  10.3 Optimization
    10.3.1 Solving Simultaneous Equations
    10.3.2 Matrix Inversion
11 CONCLUSIONS
  11.1 Summary
  11.2 Contributions
  11.3 Future Work
Appendices
  A THE GENERALIZED DELTA RULE
  B HOPFIELD NETWORK ALGORITHM
Bibliography

List of Tables

4.1 The Exclusive OR
4.2 Network Parameters for Solving the XOR Problem
4.3 Network Parameters for Solving the Parity-3 Problem
4.4 Network Parameters for Solving the Parity-4 Problem
4.5 Number of Learning Steps

List of Figures

2.1 A Biological Neuron
2.2 An Artificial Neuron
2.3 Typical Activation Functions: (a) Linear; (b) Quasi-linear; (c) Threshold-logic; and (d) Sigmoid
2.4 A Three-layer Back-propagation Network
2.5 The Hopfield Network
2.6 The Hamming Network
2.7 Directional Smoothing Filter
2.8 Color Image Enhancement
3.1 The Schematical Representation of a Neuron
3.2 The Schematical Representation of the Input Neuron
3.3 Three Ways of Setting Up an Input Neuron
3.4 The Back-propagation Network
3.5 The Hopfield Network
3.6 The Hamming Network
4.1 Networks for Solving: (a) the XOR Problem; (b) the Parity-3 Problem; and (c) the Parity-4 Problem
4.2 Realizing Function (4.4)
4.3 Realizing Function (4.5)
4.4 The Network for Solving the Parity Problem
5.1 The Contrast Stretching Function
5.2 Typical LQT Network Topologies
5.3 Representing Part of a Network
5.4 An LQT Network
8.1 Network Architecture for Linear Filtering
8.2 Network Realization of the Order Statistic Function for n = 3
8.3 The Schematical Representation of OSnet
8.4 The Schematical Representation of Adaptive OSnet
8.5 A Max/Median Filter Realized by Using OSNets
8.6 A Max/Median Filter Realized by Using OSNets
8.7 A Network Model of the CS Filter
8.8 A Network Realization of the DA Filter for the Case N = 3
9.1 A Network Realization of the Maximum Entropy Filter
10.1 A Neural Network for Sorting an Input Array of Three Elements
10.2 The Network Model for Implementing the HC Algorithm

Acknowledgement

First of all, I would like to acknowledge the indispensable support of my wife, Angela Hui Peng. She has maintained an environment under which I have been able to devote all my energy to this thesis. Moreover, she did all the drawings in this thesis, which are far better than I could have done.

I would like to thank my supervisor, Professor Rabab K. Ward, for her guidance and support. I am grateful to her for introducing me to this challenging and fascinating world of artificial neural networks, and for providing me the opportunity to explore it.

Special thanks are due to Professor Peter Lawrence for his kindness and recommendation of a book written by P.M. Lewis, which is invaluable for my research.

Thanks are also due to Professor Geoffery Hoffmann, whose courses on neural networks and immune systems gave me fundamental understandings of the two related fields, and inspiration to continue my research on the former one.

Finally, I would like to thank John Ip, Doris Metcalf, Kevin O'Donnell, Robert Ross, Qiaobing Xie, Yanan Yin, and many others in this and other departments who have made my study here pleasant and productive.

A Glossary of Symbols

∈, ∉ — Membership relation "belongs to" and its negation "does not belong to"
⊂, ⊃ — Strict containment relations
x1, ..., xn | w1, ..., wn (enclosed in type-specific brackets) — Linear neuron, quasi-linear neuron, threshold-logic neuron, or any one of the three neuron types, distinguished by the style of the enclosing brackets
∀ — For all
∃ — There exists one element
∪, ∩ — Union, intersection
∅ — Empty set
− — Besides the usual "minus" operator, it also denotes set difference
A × B — Cartesian product of sets A and B
f : X → Y — f is a function from set X to set Y
∘ — Composition of two functions
N, Z, R, R+, B — Sets of natural numbers, integers, real numbers, positive real numbers including zero, and binary numbers
|X|, |S|, |x| — Cardinality of the set X, length of the sequence S, absolute value of the relative number x
End-of-proof and end-of-solution markers — placed at the end of a proof or a solution, respectively
⇒, ⇔ — Implication operators
∧, ∨, ⊕ — Logic AND, OR, XOR
x — Vector
||x|| — L2 norm of vector x
X — Matrix
1 — Vector of ones
0 — Vector of zeros
0, I — Zero or unit matrices
{} — Denotes a set; various forms are {x_i}, {x | P(x)}, and {x_1, ..., x_n}, etc.
f, g — Original and distorted one-dimensional images
U, V — Original and distorted two-dimensional images
W — Window for processing images

Chapter 1

INTRODUCTION

In recent years artificial neural networks have been successfully applied to solve problems in many areas, such as pattern recognition/classification [60, 70], signal processing [24, 62], control [71, 9], and optimization [35, 37]. The essence of neural network problem solving is that the network performs a function which is the solution to the problem. Currently, the most popular method of designing a network to perform a function is through a learning process, in which parameters of the network are adjusted until the network approximates the input-output behaviour of the function. Although some analytical knowledge about the function is sometimes available or obtainable, it is usually not used in the learning process.
There are some neural network paradigms where such knowledge is utilized; however, there is no systematic method to do so. It is the objective of this thesis to develop such a method. Many tasks, such as those in image processing, require high computational speed. Examples are real-time TV signal processing and target recognition. Conventional von-Neumann computing machines are not adequate to meet the ever increasing demand for high computational speed [66]. Among non-conventional computing machines, artificial neural networks provide an alternative means to satisfy this demand [19]. The study of artificial neural networks has been inspired by the capability of the nervous system and motivated by the intellectual curiosity of human beings to understand themselves. The nervous system contains billions of neurons which are organized in networks. 1 Chapter 1. INTRODUCTION 2 A neuron receives stimuli from its neighboring neurons and sends out signals if the cu-mulative effect of the stimuli is strong enough. Although a great deal about individual neurons is known, little is known about the networks they form. In order to under-stand the nervous system, various network models, known as artificial neural networks, have been proposed and studied [78, 52, 33, 12, 21]. These networks are composed of simple computing units, referred to as artificial neurons since they approximate their biological counterparts. Although these networks are extremely simple compared with the complexity of the nervous system, some exhibit similar properties. In addition to understanding the nervous system, artificial neural networks (referred to as neural networks hereafter) have been suggested for computational purposes. Mc-Culloch and Pitts in 1943 [63] showed that the McCulloch-Pitts neurons could be used to realize arbitrary logic functions. The Perceptron, proposed by Rosenblatt in 1957 [79], was designed with the intention for computational purposes. Widrow proposed in 1960 [95] a network model called Madaline, which has been successfully used in adaptive signal processing and pattern recognition [96]. In 1984, Hopfield demonstrated that his network could be used to solve optimization problems [35, 36]. In 1986, Rumelhart and his colleagues rediscovered the generalized delta rule [80], which enables a type of net-works known as the back-propagation networks, or multi-layer perceptrons, to implement any arbitrary function up to any specified accuracy [30]. These contributions have laid down the foundation for applying neural networks to solve a wide range of computational problems, especially those digital computers take a long time to solve. From the computational point of view, a neural network is a system composed of interconnected simple computing units, each of which is identical to the others to some extent. Here follows an important question: How can these simple computing units be used collectively to perform complex computations which they can not perform individ-ually? Chapter 1. INTRODUCTION 3 This problem can be attacked from two directions: (1) assuming a network model (usually inspired by certain parts of the nervous system), adjusting its parameters until it performs a given function; (2) assuming a set of simple computing units, constructing a network in such a way that it performs a given function. The application of the back-propagation networks is an example of the former approach; the designing of the Hopfield network is an example of the latter approach. 
Due to reasons given in Chapter Four, we have chosen the second approach. Most existing network models were designed for the purpose of understanding how the brain works. As a result, these models are restricted by the characteristics of the nervous system. We, however, are mainly concerned with applying some computing principles believed to be employed in the nervous system, such as collective computing, massive parallelism and distributed representation, to solve computational problems. Consequently, the network models we are going to develop shall be restricted only by today's hardware technology. Therefore, a more precise definition of the task we are undertaking is due here. The problem we are concerned with is, assuming a finite supply of simple computing units, how do we organize them in a computationally efficient way such that a network so formed can perform a given function. By simple, we mean that these units can be mass produced with today's hardware technology. By computationally efficient, we mean that the network so formed can perform the function in a reasonably short period of time. The latter restriction is necessary not only because computational speed is our main concern but also because there exist an infinite number of networks which can perform the same function. Although a lot can be learnt from the works of researchers such as Kohonen, Hop-field, and Lippmann on the constructing processes of their network models, a systemat-ical method of network design has not yet been developed. They relied mainly on their Chapter 1. INTRODUCTION 4 imagination, intuition, as well as inspiration from their knowledge of the nervous sys-tem. However, to effectively use neural networks as computing machines, a systematical method of network design is very much desired. It is our objective to develop such a method. In this thesis, we propose a methodology of network design, which consists of the following five stages: 1. Find the minimum set of neuron models for a given class of functions; 2. Devise symbolic representations for these neurons and their networks; 3. Establish theorems for manipulating these symbols based on the computational properties of the neurons and their networks; 4. Establish procedures for deriving neural networks from functions; 5. Use these procedures and theorems to derive and simplify network models for spec-ified functions. We call this methodology algebraic derivation of neural networks. This methodology is developed with an emphasis on deriving neural networks to realize image processing techniques. Procedures and theorems for network derivation are developed. They are used to construct network models for realizing some image processing techniques and techniques in other areas. The results we have obtained show that our methodology is effective. This thesis consists of eleven chapters. Chapter Two provides some background in-formation on neural networks and image processing. The materials in the rest of the thesis are believed to be new and are organized as follows. Chapter Three gives formal definitions of neurons, neural networks, and related concepts. Chapter Four elaborates Chapter 1. INTRODUCTION 5 on the reasons for our approach; we shall show the limitations of learning as used in back-propagation networks and the limitations of conventional network design method as exemplified by the Hamming network. 
Chapter Five finds a set of neuron models which are used to form networks for realizing image processing techniques, and provides symbolic representations for these neurons and their networks. Chapter Six develops theorems for network simplification based on some basic computational properties of the neurons and their networks. Chapter Seven gives procedures of algebraic derivation and explains them through some examples, which in turn are useful in network derivations in later chapters. Chapters Eight and Nine show examples of deriving neural networks to realize image enhancement and restoration techniques respectively. Chapter Ten con-tains examples of deriving neural networks for realizing some techniques in other areas. Chapter Eleven concludes our work and speculates on the future work. Chapter 2 B A C K G R O U N D This chapter gives necessary background information on neural networks and image pro-cessing. Section 2.1 gives an overview of neural networks with an emphasis on their appli-cations for computing. Concepts of neurons and neural networks are introduced. Some neuron and neural network models are described. Among the network models proposed in literature, the back-propagation network, the Hopfield network, and the Hamming network are chosen to be described in more detail since they are good representatives of their classes and also they will be referred to further in later chapters. Section 2.2 gives an overview of problems in image enhancement and restoration and techniques used to solve them. 2.1 Artificial Neural Networks People have long been curious about how the brain works. The capabilities of the nervous system in performing certain tasks such as pattern recognition are far more powerful than today's most advanced digital computers. In addition to satisfying intellectual curiosity, it is hoped that by understanding how the brain works we may be able to create machines as powerful as, if not more powerful than, the brain. The nervous system contains billions of neurons which are organized in networks. A neuron receives stimuli from its neighboring neurons and sends out signals if the cu-mulative effect of the stimuli is strong enough. Although a great deal about individual neurons is known, little is known about the networks they form. In order to understand 6 Chapter 2. BACKGROUND 7 the nervous system, various network models, known as artificial neural networks, have been proposed and studied. These networks are composed of simple computing units, referred to as artificial neurons since they approximate their biological counterparts. Work on neural networks has a long history. Development of detailed mathematical models began more than 40 years ago with the works of McCulloch and Pitts [63], Hebb [29], Rosenblatt [78], Widrow [95] and others [74]. More recent works by Hopfield [33, 34, 35], Rumelhart and McClelland [80], Sejnowski [82], Feldman [18], Grossberg [27], and others have led to a new resurgence of the field in the 80s. This new interest is due to the development of new network topologies and algorithms [33, 34, 35, 80, 18], new analog VLSI implementation techniques [64], and some interesting demonstrations [82, 35], as well as by a growing fascination with the functioning of the human brain. Recent interest is also driven by the realization that human-like performances in areas such as pattern recognition will require enormous amounts of computations. 
Neural networks provide an alternative means for obtaining the required computing capacity, along with other non-conventional parallel computing machines.

In addition to understanding the nervous system, neural networks have been suggested for computational purposes ever since the beginning of their study. McCulloch and Pitts in 1943 [63] showed that the McCulloch-Pitts neurons could be used to realize arbitrary logic functions. The Perceptron, proposed by Rosenblatt in 1957 [79], was designed with the intention for computational purposes. Widrow proposed in 1960 [95] a network model called Madaline, which has been successfully used in adaptive signal processing and pattern recognition [96]. In 1984, Hopfield demonstrated that his network could be used to solve optimization problems [35, 36]. In 1986, Rumelhart and his colleagues rediscovered the generalized delta rule [80], which enables a type of networks known as the back-propagation networks to implement any arbitrary function up to any specified accuracy. In Europe, researchers such as Aleksander, Caianiello, and Kohonen have made conscious applications of computational principles believed to be employed in the nervous system [3]. Some commercial successes have been achieved [4]. These contributions have laid down the foundation for applying neural networks to solve a wide range of computational problems, especially those that digital computers take a long time to solve.

2.1.1 Neuron Models

The human nervous system consists of 10^10 to 10^11 neurons, each of which has 10^3 to 10^4 connections with its neighboring neurons. A neuron cell shares many characteristics with the other cells in the human body, but has unique capabilities to receive, process, and transmit electro-chemical signals over the neural pathways that comprise the brain's communication system. Figure 2.1 shows the structure of a typical biological neuron. Dendrites extend from the cell body to other neurons where they receive signals at a connection point called a synapse. On the receiving side of the synapse, these inputs are conducted to the cell body. There they are summed, some inputs tending to excite the cell, others tending to inhibit it. When the cumulative excitation in the cell body exceeds a threshold, the cell "fires", sending a signal down the axon to other neurons. This basic functional outline has many complexities and exceptions; nevertheless, most artificial neurons model the above mentioned simple characteristics.

Figure 2.1: A Biological Neuron

An artificial neuron is designed to mimic some characteristics of its biological counterpart. Figure 2.2 shows a typical artificial neuron model. Despite the diversity of network paradigms, nearly all neuron models, except for a few models such as the sigma-pi neuron (see [15]), are based upon this configuration.

Figure 2.2: An Artificial Neuron

In Figure 2.2, a set of inputs labeled (x_1, x_2, ..., x_N) is applied to the neuron. These inputs correspond to the signals into the synapses of a biological neuron. Each signal x_i is multiplied by an associated weight w_i ∈ {w_1, w_2, ..., w_N} before it is applied to the summation block, labeled Σ. Each weight corresponds to the "strength" of a single biological synapse. The summation block, corresponding roughly to the biological cell body, adds up all the weighted inputs, producing an output e = Σ_{i=1}^{N} w_i x_i. e is then compared with a threshold t. The difference α = e − t is usually further processed by an activation function to produce the neuron's output. The activation function may be a simple linear function, a threshold-logic function, or a function which more accurately simulates the nonlinear transfer characteristic of the biological neuron and permits more general network functions.

Most neuron models vary only in the forms of their activation functions. Some examples are the linear, the quasi-linear, the threshold-logic, and the sigmoid activation functions. The linear activation function is

f_L(α) = α

The quasi-linear activation function is

f_Q(α) = α if α > 0, and 0 otherwise

The threshold-logic activation function is

f_T(α) = 1 if α > 0, and 0 otherwise
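To make the neuron model above concrete, the following short sketch computes the weighted sum, compares it with the threshold, and applies one of the activation functions just defined (a sigmoid is included for completeness). It is an illustrative implementation only; the input vector, weights, and threshold are hypothetical values, not parameters taken from this thesis.

```python
import numpy as np

def neuron_output(x, w, t, activation="threshold"):
    """Compute one artificial neuron's output.

    e = sum_i w_i * x_i is the weighted sum, alpha = e - t is its
    difference from the threshold t, and the chosen activation
    function maps alpha to the neuron's output.
    """
    alpha = np.dot(w, x) - t
    if activation == "linear":          # f_L(alpha) = alpha
        return alpha
    if activation == "quasi-linear":    # f_Q(alpha) = alpha if alpha > 0 else 0
        return max(alpha, 0.0)
    if activation == "threshold":       # f_T(alpha) = 1 if alpha > 0 else 0
        return 1.0 if alpha > 0 else 0.0
    if activation == "sigmoid":         # smooth approximation of f_T
        return 1.0 / (1.0 + np.exp(-alpha))
    raise ValueError("unknown activation: " + activation)

# Hypothetical example: three inputs, unit weights, threshold 1.5.
x = np.array([1.0, 0.0, 1.0])
w = np.array([1.0, 1.0, 1.0])
print(neuron_output(x, w, 1.5, "threshold"))     # -> 1.0  (2 - 1.5 > 0)
print(neuron_output(x, w, 1.5, "quasi-linear"))  # -> 0.5
```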
v(m, n) = u(m, n) + λ g(m, n)   (2.15)

where λ > 0 and g(m, n) is a suitably defined gradient at (m, n). A commonly used gradient function is the discrete Laplacian

g(m, n) = u(m, n) − (1/4)[u(m − 1, n) + u(m, n − 1) + u(m + 1, n) + u(m, n + 1)]   (2.16)

Low-pass, Band-pass, and High-pass Filtering

Low-pass filters are useful for noise smoothing and interpolation. High-pass filters are useful in extracting edges and in sharpening images. Band-pass filters are useful in the enhancement of edges and other high-pass image characteristics in the presence of noise. If h_LP(m, n) denotes a FIR low-pass filter, then a FIR high-pass filter, h_HP(m, n), can be defined as

h_HP(m, n) = δ(m, n) − h_LP(m, n)   (2.17)

θ(m, n) is also called the generalized homomorphic transform, H, of the image u(m, n). The generalized homomorphic linear filter performs zero-memory operations on the transform of the image followed by the inverse H-transform. The homomorphic transformation reduces the dynamic range of the image in the transform domain and increases it in the cepstral domain [39].

Pseudocoloring

In addition to the requirements of monochrome image enhancement, color image enhancement may require improvement of color balance or color contrast in a color image. Enhancement of color images becomes a more difficult task not only because of the added dimension of the data but also due to the added complexity of color perception [55]. A practical approach to developing color image enhancement algorithms is shown in Figure 2.8. The input color coordinates of each pixel are independently transformed into another set of color coordinates, where the image in each coordinate is enhanced by its own (monochrome) image enhancement algorithm, which could be chosen suitably from the foregoing set of algorithms. The enhanced image coordinates T'_1, T'_2, T'_3 are inverse transformed to R', G', B' for display.

Figure 2.8: Color Image Enhancement

Since each image plane T_k(m, n), k ∈ {1, 2, 3}, is enhanced independently, care has to be taken so that the enhanced coordinates T'_k are within the color gamut of the R-G-B system. The choice of color coordinate system T_k, k ∈ {1, 2, 3}, in which enhancement algorithms are implemented may be problem-dependent.
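Before moving on to restoration, here is a concrete illustration of the enhancement filters of equations (2.15)-(2.17): the sketch builds a high-pass kernel from a 3x3 box low-pass kernel and applies Laplacian-based unsharp masking to a small test image. The kernels, the test image, and the value of λ are arbitrary choices for illustration, not quantities prescribed by the thesis.

```python
import numpy as np
from scipy.signal import convolve2d

u = np.array([[10, 10, 10, 10],
              [10, 80, 80, 10],
              [10, 80, 80, 10],
              [10, 10, 10, 10]], dtype=float)   # toy image

# FIR high-pass from a low-pass kernel: h_HP = delta - h_LP  (eq. 2.17)
h_lp = np.full((3, 3), 1.0 / 9.0)
delta = np.zeros((3, 3))
delta[1, 1] = 1.0
h_hp = delta - h_lp

# Discrete Laplacian gradient g(m, n) of eq. (2.16) as a convolution kernel
lap = np.array([[0.00, -0.25, 0.00],
                [-0.25, 1.00, -0.25],
                [0.00, -0.25, 0.00]])

g = convolve2d(u, lap, mode="same", boundary="symm")
lam = 0.7                                  # illustrative lambda > 0
v = u + lam * g                            # unsharp masking, eq. (2.15)
edges = convolve2d(u, h_hp, mode="same", boundary="symm")
print(np.round(v, 1))
print(np.round(edges, 1))
```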
2.2.2 Image Restoration

Any image acquired by optical, electro-optical or electronic means is likely to be degraded by the sensing environment. The degradation may be in the form of sensor noise, blur due to camera misfocus, relative object-camera motion, random atmospheric turbulence, and so on. Image restoration is concerned with filtering the observed image to minimize the effect of degradations. The effectiveness of image restoration filters depends on the extent and the accuracy of the knowledge of the degradation process as well as on the filter design criterion. Image restoration techniques are classified according to the type of criterion used.

Image restoration differs from image enhancement in that the latter is concerned more with accentuation or extraction of image features rather than restoration of degradations. Image restoration problems can be quantified precisely, whereas enhancement criteria are difficult to represent mathematically. Consequently, restoration techniques often depend only on the class or ensemble properties of a data set, whereas image enhancement techniques are much more image dependent.

The most common degradation or observation model is

g = Hf + n   (2.33)

where g is the observed or degraded image, f is the original image and n is the noise term. The objective of image restoration is to find the best estimate f̂ of the original image f based on some criterion.

Unconstrained Least Squares Filters

From Equation (2.33), the noise term in the degradation model is given by

n = g − Hf   (2.34)

In the absence of any knowledge about n, a meaningful criterion is to seek an f̂ such that Hf̂ approximates g in a least-squares sense, that is,

J(f) = ||g − Hf||^2   (2.35)

is minimum, where ||x|| is the L2 norm of vector x.

Inverse Filter. Solving Equation (2.35) for f yields

f̂ = (H^T H)^{-1} H^T g   (2.36)

If H is a square matrix and assuming that H^{-1} exists, Equation (2.36) reduces to

f̂ = H^{-1} g   (2.37)

This filter is called the inverse filter [26].

Constrained Least-Squares Filters

In order that the restoration filters have more effect than simply inversions, a constrained least-squares filter might be developed in which the constraints allow the designer additional control over the restoration process. Assuming the norm of the noise signal ||n||^2 is known or measurable a posteriori from the image, the restoration problem can be reformulated as minimizing ||Qf||^2 subject to ||g − Hf||^2 = ||n||^2. By using the method of Lagrange multipliers, the restoration problem becomes finding f̂ such that

J(f) = ||Qf||^2 + α(||g − Hf||^2 − ||n||^2)   (2.38)

is minimum. The solution to Equation (2.38) is

f̂ = (H^T H + γ Q^T Q)^{-1} H^T g   (2.39)

where γ = 1/α.

Pseudo-inverse Filter. If it is desired to minimize the norm of f, that is, Q = I, then the estimate f̂ is given by

f̂ = (H^T H + γ I)^{-1} H^T g   (2.40)

In the limit as γ → 0, the resulting filter is known as the pseudo-inverse filter [2].
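The least-squares and constrained least-squares estimates above are simply linear solves, so they are easy to check numerically. The sketch below forms the estimates of equations (2.36) and (2.40) for a small, randomly generated degradation matrix; H, the noise level, and γ are made-up test values, not quantities taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pix = 8
f = rng.uniform(0.0, 1.0, n_pix)                                # "original image" as a vector
H = np.eye(n_pix) + 0.3 * rng.standard_normal((n_pix, n_pix))   # toy blur matrix
noise = 0.01 * rng.standard_normal(n_pix)
g = H @ f + noise                                               # observation model, eq. (2.33)

# Unconstrained least squares / inverse filter, eq. (2.36)
f_ls = np.linalg.solve(H.T @ H, H.T @ g)

# Constrained least squares with Q = I (regularized, eq. 2.40)
gamma = 1e-2
f_cls = np.linalg.solve(H.T @ H + gamma * np.eye(n_pix), H.T @ g)

print("restoration error, inverse filter:     ", np.linalg.norm(f_ls - f))
print("restoration error, regularized (gamma):", np.linalg.norm(f_cls - f))
```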
Wiener Filter. Let R_f and R_n be the correlation matrices of f and n, defined respectively as

R_f = E{f f^T}   (2.41)

and

R_n = E{n n^T}   (2.42)

where E{.} denotes the expected value operation. By defining

Q^T Q ≜ R_f^{-1} R_n   (2.43)

and substituting this expression into Equation (2.39), we obtain

f̂ = (H^T H + γ R_f^{-1} R_n)^{-1} H^T g   (2.44)

which is known as the Wiener filter [32].

Maximum Entropy Filter. If the image f is normalized to unit energy, then each scalar value f_i can be interpreted as a probability. Then the entropy of the image f would be given by

Entropy = −f^T ln f   (2.45)

where ln f refers to componentwise natural logarithms, that is

ln f = (ln f_1, ln f_2, ..., ln f_N)^T   (2.46)

If the constrained least-squares approach is applied as before, then the negative of the entropy can be minimized subject to the constraint that ||g − Hf||^2 = ||n||^2. Thus the objective function becomes

J(f) = f^T ln f − α(||g − Hf||^2 − ||n||^2)   (2.47)

The solution to this equation is

f̂ = exp{−1 − 2α H^T (g − Hf̂)}   (2.48)

Bayesian Methods

In many imaging situations, for instance, image recording by film, the observation model is non-linear, as in

g = s{Hf} + n   (2.49)

where s{x} is a componentwise non-linear function of vector x. The Bayes estimation problem associated with Equation (2.49) is to find an estimate f̂ such that

max_f p(f | g)   (2.50)

where p(f | g) is the density function of f given g. From Bayes' law, we have

p(f | g) = p(g | f) p(f) / p(g)   (2.51)

Maximizing the above equation requires that the a priori probability density on the right-hand side be defined.

Maximum A Posteriori Estimate. Under the assumption of Gaussian statistics for f and n, with covariances R_f and R_n, the MAP estimate is the solution of maximizing the following function [6]

ln p(f | g) = −(1/2)(g − s{Hf})^T R_n^{-1} (g − s{Hf}) − (1/2)(f − f̄)^T R_f^{-1} (f − f̄) + ln p(g) + K   (2.52)

where K is a constant factor and f̄ is the mean of f. The solution is

f̂ = f̄ + R_f H^T D R_n^{-1} (g − s{Hf̂})   (2.53)

where D is a diagonal matrix defined as

D ≜ Diag{ ds(x)/dx |_{x = b_i} }   (2.54)

and b_i are the elements of the vector b = Hf̂. Equation (2.53) is a nonlinear matrix equation for f̂, and since f̂ appears on both sides, there is a feedback structure as well [42]. Analogies in the estimation of continuous waveforms have been derived for communication theory problems [91].

Maximum Likelihood Estimate. Associated with the MAP estimate is the maximum likelihood (ML) estimate, which is derived by assuming that p(f | g) = p(g | f); that is, the vector f is a nonrandom quantity. Accordingly, Function (2.52) reduces to

ln p(f | g) = −(1/2)(g − s{Hf})^T R_n^{-1} (g − s{Hf}) + ln p(g) + K   (2.55)

The solution of maximizing the above function satisfies

H^T D R_n^{-1} (g − s{Hf̂}) = 0   (2.56)

That is,

g = s{Hf̂}   (2.57)

Chapter 3

DEFINITIONS AND NOTATIONS

The field of neural networks has attracted many people from different disciplines. The diversity of their backgrounds is reflected in the variety of terminologies. Although some efforts are being made to address the terminology problem, a standard terminology is still yet to come [16]. For the sake of clarity and some other reasons stated later in this chapter, we give our definitions of neurons, neural networks, and other related concepts here.

Definition 3.1 A neuron is a simple computing element with n inputs x_1, x_2, ..., x_n (n ≥ 1) and one output y. It is characterized by n + 1 numbers, namely, its threshold t and the weights w_1, w_2, ..., w_n, where w_i is associated with x_i.
A neuron operates on a discrete time k = 1,2,3,4,..., its output at time k + 1 being determined by its inputs at time k according to the following rule The function of a neuron is to map points in a multi-dimensional space Xn to points in a one-dimensional space y, that is, n y{k + l) = fa(EwMk)-t) (3.1) where fa(.) is a monotonic function. fa-xn->y (3.2) From the definition, fa is a composite function fa = fi o f2 (3.3) 32 Chapter 3. DEFINITIONS AND NOTATIONS 33 Xi x,\u00E2\u0080\u0094 xN Figure 3.1: The Schematical Representation of a Neuron where h-.n^y (3.4) and f2:Xn-*K (3.5) Function f2 is a linear function in the sense that /2(Ai\u00C2\u00A3i + X2x2) = A1/2(x1) + A2/2(x2) VAx, A2 \u00E2\u0082\u00AC K (3.6) A neuron is defined on the triple (Xn, y, f) where / is an element of the set J- of activation functions. Let's denote the linear activation function as L, the threshold-logic activation function as T, and the quasi-linear activation function as Q. A neuron is schematically represented as that shown in Figure 3.1, where A \u00C2\u00A3 T. Definition 3.2 A neural network is a collection of neurons, each with the same time scale. Neurons are interconnected by splitting the output of any neuron into a number of lines and connecting some or all of these to the inputs of other neurons. A output may thus lead to any number of inputs, but an input may come from at most one output. A neuron is denoted as p. The i th neuron in a network is denoted as pi. The set of all the indices which specify the sequence of neurons is denoted as X. For a network of n neurons, T = { 1 , 2 , n } . Chapter 3. DEFINITIONS AND NOTATIONS 34 The output of a neuron is called the state of the neuron. The state of pi at time k is denoted as Si(k). In this thesis, we assume that the states of all the neurons in a network are updated at the same time instance. Such updating is known as synchronous updating [25]. At k = 0, the network is initialized with appropriate values, and the network then updates itself until it halts when certain criterion is met. The function of a neural network with N inputs and M outputs is to map points in a multi-dimensional space to points in another multi-dimensional space, that is, it performs the following function / : XN -\u00E2\u0080\u00A2 yM (3.7) The input of a network is denoted as x or x , and the output of a network is denoted as V or y . Neurons in a network are classified into three types: input neurons, output neurons, and hidden neurons. An input neuron is a neuron whose initial state is an element of the network input x; an output neuron is a neuron whose state at the time when the network halts is an element of the network output y; a hidden neuron is a neuron which does not belong to either one of the first two classes. A neuron can be both input and output neuron. The input neuron is somewhat special. For example, in a feedforward network, an input neuron has only one input line. If the input x \u00E2\u0082\u00AC B, then a threshold-logic neuron can be used as the input neuron; if x \u00E2\u0082\u00AC V, then a quasi-linear neuron can be used as the input neuron; if x \u00E2\u0082\u00AC 72., then a linear neuron has to be used as the input neuron. In all the cases, a linear neuron can always be used as the input neuron. Therefore, for the sake of simplicity, input neurons are always linear neurons unless specified otherwise. An input neuron is schematically shown in Figure 3.2, which represents an linear neuron with t = 0 and w \u00E2\u0080\u0094 1. Chapter 3. 
DEFINITIONS AND NOTATIONS 35 0 Figure 3.2: The Schematical Representation of the Input Neuron 0 0 (a) (b) (c) (d) Figure 3.3: Three Ways of Setting Up an Input Neuron There are three ways to set up an input neuron: (1) 5(0) = x, s(k > 0) = 0; (2) s(k > 0) = x; and (3) s(k) = x(k). These three cases are shown in Figure 3.3. To represent either one of the three settings, an input neuron without the input line is used (see Figure 3.3.d). At time k = 0, the input neurons are loaded with the network input, and the rest are loaded usually with zeros. The computation time Tp of a network is the time between when the network is initialized and when the solution appears on the output neurons. Our definitions of neuron and neural network are not arbitrarily chosen. They provide a unified framework to describe many kinds of neural network models. For instance, an 2 x 2 x 1 back-propagation network can be implemented by our network as that shown in Figure 3.4, where S denotes the sigmoid activation function. At time k = 0, the input neurons are loaded with network input x, and the rest of the neurons with zeros. At k = 2, the state of the output neuron is the true output of the network\u00E2\u0080\u0094hence the computation time of this network is 2. If the input neurons are set up as in Figure 3.3.b, then s5(k > 2) = s5(k = 2). Feedback network can also be implemented by our networks. For instance, a four-neuron Hopfield network is implemented as shown in Figure 3.5. Note that here every Chapter 3. DEFINITIONS AND NOTATIONS 36 Figure 3.5: The Hopfield Network neuron is both the input and output neuron. After the network is initialized, it updates itself until s{(k = h + 1) = Si(h) ^ Si(h - 1) Vi h are the true network output\u00E2\u0080\u0094hence, h is considered the computation time of this network. Another example is the implementation of an 2 x 2 x 2 Hamming network as shown in Figure 3.6. In Lippmann's original definition of Hamming network, there is a control problem that the lower subnet has to be removed after the upper subnet is initialized with the output of the lower subnet (see [58]). Since the network itself has no means to do so, an external mechanism has to be employed, for example, a set of synchronized switches between the lower subnet and upper subnet. In our networks, such a control problem is easily solved by setting the input neurons as shown in Figure 3.3.a. Chapter 3. DEFINITIONS AND NOTATIONS 37 Y i Y 2 Figure 3.6: The Hamming Network Neurons in our networks are usually numbered in an one-dimensional manner. Never-theless, they can be spatially arranged in a two or greater dimensional manner as shown in previous examples. In those cases, the neurons are numbered either from top-to-bottom and left-to-right (see Figure 3.4), or from left-to-right and bottom-to-top (see Figure 3.5 and Figure 3.6). There are two kinds of connections from pi to pj. One is from pi directly to pj, and the other is from pi to some other neurons and then to pj. The former is called direct connection, and the latter indirect connection. There are also two distinctive direct connections between two neurons, say, pi and pj, in a network. One is from the output of />,\u00E2\u0080\u00A2 to the input of pj, whose strength or weight is denoted as Wji. The other is from the output of pj to the input of pi, whose weight is denoted as Wij. The matrix W whose ijth element is is called the weight matrix of the network. 
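The definitions above suggest a direct way to simulate such a network: store the weights in the matrix W, keep the per-neuron thresholds and activation types in vectors (defined formally just below as the threshold vector t and the activation vector a), and apply the update rule of Equation (3.1) to all neurons synchronously at each time step. The sketch below is only an illustrative simulation under these definitions; the small feedforward network it builds and all of its numbers are made up.

```python
import numpy as np

def step_network(W, t, act, s):
    """One synchronous update: every neuron i computes
    s_i(k+1) = f_i( sum_j W[i, j] * s_j(k) - t[i] )   (cf. eq. 3.1)."""
    alpha = W @ s - t
    new_s = np.empty_like(s)
    for i, a in enumerate(act):
        if a == "L":                       # linear neuron
            new_s[i] = alpha[i]
        elif a == "Q":                     # quasi-linear neuron
            new_s[i] = max(alpha[i], 0.0)
        else:                              # "T": threshold-logic neuron
            new_s[i] = 1.0 if alpha[i] > 0 else 0.0
    return new_s

# Hypothetical 4-neuron feedforward net: neurons 0 and 1 are linear input
# neurons, neurons 2 and 3 are threshold-logic neurons reading the inputs.
W = np.array([[0, 0, 0, 0],
              [0, 0, 0, 0],
              [1, 1, 0, 0],      # w_20 = w_21 = 1
              [1, -1, 0, 0]], dtype=float)
t = np.array([0.0, 0.0, 1.5, 0.5])
act = ["L", "L", "T", "T"]

s = np.array([1.0, 1.0, 0.0, 0.0])   # k = 0: input neurons hold x, the rest are zero
s = step_network(W, t, act, s)        # k = 1: neurons 2 and 3 hold the result
print(s)                              # -> [0. 0. 1. 0.]  (computation time T_p = 1)
```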
A vector whose ith element is the threshold of pi is called threshold vector and is denoted as t. A vector whose ith element is the type of activation function of pi is called activation vector and is denoted as a. A neural network is fully defined by the triple (W,t ,a) , Chapter 3. DEFINITIONS AND NOTATIONS 38 which specifies what function the network performs. Neurons in a network are usually grouped in layers. A layer of neurons is a group of neurons each of which has no direct connections with others in the same group. Symbol Ch denotes the set of all neurons belong to layer h, and C denotes the set of all the possible Ch- From the definitions, we know that the following is true E\Ch\ = \I\ (3.8) h The communication time tc(i,j) from pi to pj is defined as the shortest period of time it takes for data in pi to reach pj. For example, in the Hamming network shown in Figure 3.6, tc(2, 3) = 1 and ic(2,5) = 2. If there is no connection between two neurons, then tc(i,j) = tc(j,i) = oo. The communication time ti(k,h) from layer k to layer h is defined as tt(k, h) = ma,x{tc(i,j) : i \u00E2\u0082\u00AC Ck, j \u00E2\u0082\u00AC Ch] (3.9) A matrix C whose ij th element is defined as 1 if wu ^ 0 3 T (3.10) 0 otherwise is called connectivity matrix. The architecture Ap of a network is defined by the connec-tivity matrix of the network. The configuration Cp of a network is defined by the triple (C,t,a). The architecture of a network specifies the structure of the network; there are two major classes of network architectures, namely, feedforward and feedback. The configuration of a network specifies what class of functions this network can perform. The connectivity matrix C of a network contains a great deal of information about the properties of the network. Fact 3.1 A neural network has a feedforward architecture iffcij A Cji = 0 Vz, j \u00E2\u0082\u00AC X. Chapter 3. DEFINITIONS AND NOTATIONS 39 Fact 3.2 A neural network has a feedback architecture if3i,j 6 T such that c t J Ac,,- = 1. Fact 3.3 There is a direct connection between pi and pj iff c,j V Cji = 1 Fact 3.4 There is an indirect connection between pi and pj iff there exists a non-empty sequence (kt, k2,A;n) such that either cklj A ck2kl A. . . A cfcnfcn_1 A cikn = 1 or ckli A ckikl A ...Acknkn_1Acjkn = l. The neural networks of our definition are, mathematically speaking, a class of au-tomata networks [25]. Yet, our networks differ from the automata networks of the usual sense in that the state of our networks is inherently infinite. Moreover, our networks are different from that defined in the field of automata networks where neural networks are defined as systems of McCulloch-Pitts (threshold-logic) neurons (see [7, 25]). Note that such networks are only a subset of our networks. Nevertheless, analytical tools used in the field of automata networks are useful in the analysis of our networks. The analysis of our networks is certainly an interesting and challenging subject. How-ever, the main interest of this thesis is on the synthesis part, that is, how to map an function to a network such that the network can perform the function. In other words, we are interested in finding procedure(s) P such that P:A^JV (3.11) where A is the set of functions and J\f is the set of the triple (W,t, a). Chapter 4 REASONS FOR ALGEBRAIC DERIVATION The essence of neural network problem solving is that the network performs a function which is a solution to the problem. 
The function can be viewed as a computing rule which states how to compute a given input to get the correct output. This rule is coded in the network architecture, the weights, the thresholds, as well as the activation functions. A neural network is a collection of interconnected simple computing units, each of which can be viewed as a computational instruction. Thus constructing a network to solve a problem is equivalent to arranging these instructions such that a computing rule is formed which solves the problem. This is similar to programming in digital computers. There are basically two ways of \"programming\" a neural network: (1) analytically de-riving the network architecture and the network parameters based on a given computing rule; or (2) assuming a network architecture and adjusting network parameters until a proper computing rule is formed. Adjusting network parameters in such a meaningful way is referred to as learning or training in the field of neural networks [94]. Let us refer to the first approach as analytical approach, and the second as learning approach. There are two typical network models which are representatives of the two approachs respectively. One is the Hamming network [59], which represents the former approach. The other is the back-propagation network [80], which represents the latter approach. The learning approach is necessary when no computing rule is available or at least not completely available. The role of learning here is to find a computing rule, which solves 40 Chapter 4. REASONS FOR ALGEBRAIC DERIVATION 41 the given problem, based upon partial solution of the problem. When the computing rule is completely known, the learning approach is still useful in the sense that here the role of learning is to automate the process of coding the computing rule. However, there are some practical problems associated with the learning approach, one of which is that it usually takes a very long period of time to train a network. This problem is caused by several factors which shall be discussed in the next section. When the computing rule is known, it is sensible to utilize it in constructing neural networks. The analytical approach ensures proper network architectures and proper network parameters. However, the existing network models designed using the analytical approach are only capable of solving a narrow range of problems. Moreover, these network models are more or less hand crafted, heavily based upon the imagination and intuition of those who developed these models. There is still no systematic method of network design. The objective of this thesis is to develop such a method. In the following sections, we shall demonstrate the drawbacks of the learning approach with the example of training the back-propagation network to solve the parity problem; we shall also describe the analytical approach in more details with examples of designing the Hamming network, as well as designing a network model to solve the parity problem. We then discuss the current situation of the analytical approach, and propose our method. 4.1 Drawbacks of the Learning Approach The back-propagation network is the most popular network model used in neural network applications because it can be trained to implement any function to any specified degree of accuracy [30]. However, there are some practical problems with its learning process. The first problem is that learning usually takes a long period of time even for a small Chapter 4. REASONS FOR ALGEBRAIC DERIVATION 42 scale problem. 
One cause of this problem is due to the back-propagation algorithm itself. Although many improved versions have been developed [69, 28, 44], the fastest version is still time consuming. Another cause is that it is difficult to know in advance how many neurons in each layer are required. These \"magic\" numbers have to be found, at least for the time being, through trial-and-error. This means that after days, even weeks of training, one has to start the training process all over again if the number of neurons in the network are not enough. The second problem is that after training a network to implement a certain function, one rarely gains any insight that would help in training another network to implement a similar function, e.g., a function which is in the same family as the previous function. The nature of image enhancement and restoration problems makes the first problem even more severe. Taking the example of implementing a median filter with the window size of three (the simplest case), assuming that each pixel takes integer values ranging from 0 to 255, then there are 2563 = 16,777,216 possible input patterns. In order to achieve zero error rate, the training set may have to consist of all the input patterns and their corresponding correct outputs. Such a training set is astronomically enormous, needless to say for the cases where the window size is bigger. To underscore the com-plexity of learning, suppose we have the means to have neural networks learn one pattern every 1 ms, then to implement the above mentioned median filter, it would take a net-work about 5 hours to learn all the patterns. However, with the same learning rate, to implement a median filter with a window size of five, it would take a network about 35 years to learning all the possible patterns. The first problem is well documented in the literature [92, 43, 81], hence we shall only illustrate the second problem here through the example of training the back-propagation network to solve the parity problem. This problem is impossible for the Perceptron to solve [65], but is solvable by the back-propagation network [80], which is one of the Chapter 4. REASONS FOR ALGEBRAIC DERIVATION 43 (a) (b) (c) Figure 4.1: Networks for Solving : (a) XOR problem; (b) Parity-3 problem; and (c) Parity-4 problem. reasons for the fame of this network model. We shall start with the XOR problem, which is a special case of the parity problem when the problem size n = 2, and continue with cases when n \u00E2\u0080\u0094 3 and n \u00E2\u0080\u0094 4. Example 4.1: The Parity Problem. The truth table of the Exclusive OR (XOR) is shown in Table 4.1. The network architecture we have chosen is shown in Figure 4.1. The generalized delta rule (see Appendix A) is used to train the network. All the patterns in Table 4.1 are included in the training set. The result after the network has learnt how to solve the XOR problem, i.e., for every input pattern its output is the same as that in Table 4.1, is shown in Table 4.2. Similarly, we carried out the training for n = 3 and n = 4. The network architectures for solving these problems are also shown in Figure 4.1. The network parameters after training are shown in Table 4.3 and Table 4.4 respectively. The number which has pj to Chapter 4. 
REASONS FOR ALGEBRAIC DERIVATION 44 Table 4.1: The Exclusive OR input output Xi x2 z 0 0 0 0 1 1 1 0 1 1 1 0 Table 4.2: Network Parameters For Solving XOR Problem Pi Pi P3 4.146451 -2.968075 P4 1.379474 -2.464423 h U -1.889660 0.380817 P3 P4 Ps -3.351054 2.622306 h -1.845504 the top and pi to the left is the value of the weight of the connection from pj to />,-. The number which has to the top is the value of the threshold of neuron pi. By looking at the three tables of network parameters, it is difficult to generalize what the parameters of the network will be for cases where the problem size is five or larger. Consequently, one has to start from scratch to train networks for these cases. The number of learning steps required for the network to have learnt how to solve the parity problem is shown on Table 4.5 for n = 2,3,4,5 and rj = 0.10,0.15,0.20,0.25, where r] is the gain term of the generalized delta rule (see Appendix A). Here, the learning step is defined as the interval during which all the network parameters are being updated Chapter 4. REASONS FOR ALGEBRAIC DERIVATION 45 Table 4.3: Network Parameters For Solving Parity-3 Problem Pi P2 P3 P4 -4.396586 5.399827 -3.631473 P5 -1.965704 2.695202 -0.726498 Pe 1.025570 -0.856012 2.323546 U h U -1.486861 -1.788286 -0.119762 P5 Pe Pi -4.192873 4.991574 -2.708791 t? 0.151018 Table 4.4: Network Parameters For Solving Parity-4 Problem Pi P2 P3 P4 Ps -4.865602 -6.202526 5.522527 -6.278468 Pe -3.045859 4.738367 -4.957426 4.048179 Pr -6.845311 4.469164 -2.124128 4.499167 Ps -3.836795 -2.465528 2.714290 -2.221879 te -3.129540 3.778817 2.953945 -3.396108 Ps Pe P7 P9 -4.868719 5.105030 -5.718770 6.778777 1.817817 Chapter 4. REASONS FOR ALGEBRAIC DERIVATION Table 4.5: Number of Learning Steps 46 n 7/ = 0.10 n = .15 n = 0.20 7/ = 0.25 2 9830 6485 4885 4003 3 30652 20581 15490 12527 4 94317 80384 69852 75428 5 170604 120640 89585 67585 once. The learning step is assumed constant, which implies that the updating is done in parallel. Note that the number of learning steps increases in a manner that it soon gets very large as the problem size increases. As a matter of fact, the so-called loading problem (see [48]), which concerns how long it takes a neural network to have learnt every pattern in the training set with the respect to the set size, turns out to be NP-complete [47, 48]. Moreover, for the back-propagation network to solve the parity problem, the training set may have to include every possible pattern, which means that the size of the training set is two to the power of the problem size, i.e., the size of the training set increases exponentially as the problem size increases. Therefore, training a network for the cases of larger problem size not only means starting all over again, but may be infeasible when the problem size is fairly large. Finally, the network architectures shown in Figure 4.1 were chosen because we knew they work from experiments done by other researchers [80]. Otherwise, extra amount of work would have had to be done on finding the right network architectures. 4.2 The Advantages of The Analytical Approach The analytical approach is demonstrated with examples of designing the Hamming net-work, as well as designing a network model to solve the parity problem. Chapter 4. 
REASONS FOR ALGEBRAIC DERIVATION 47 4.2.1 Designing the Hamming Network A classic problem in communication theory is that given a set of binary patterns, called exemplars, for an given binary pattern, how to determine which exemplar is the closest to the input pattern. This problem occurs when binary fixed-length signals are sent through a memoryless binary symmetric channel. The optimum minimum error classifier in this case calculates the Hamming distance to the exemplar for each class and selects that class with the minimum Hamming distance [23]. The Hamming distance is the number of bits in the input which do not match the corresponding exemplar bits. The Hamming network developed by Lippmann [58] implements this algorithm. Let us denote the input pattern and the exemplar as v and e respectively. Each element of both vectors are binary and belong to the set {+1,-1}. Suppose there are M exemplars {ei, e2,e^}, then the Hamming distance between an input pattern and the i th exemplar is d^^N-v-Z) (4.1) where N is the number of elements in the vector and v \u00E2\u0080\u00A2 e,- is the inner product of two vectors. Since minimizing d{ is equivalent to maximizing 6, = N \u00E2\u0080\u0094 di = |(A/ + v \u00E2\u0080\u00A2 e;), the optimum minimum error algorithm can be restated as follows: 1. Calculate 6,- Vt \u00E2\u0082\u00AC { 1 , 2 , A f } ; 2. Find the index k such that bk = max{6j : i \u00C2\u00A3 { 1 , 2 , M } } ; 3. Output the exemplar vector eV To implement this algorithm, the Hamming network is structured in two subnets: the lower subnet calculates bi (the i th matching score) and the upper subnet finds the index k. Note that the Hamming network does not implement the third step. Chapter 4. REASONS FOR ALGEBRAIC DERIVATION 48 In the lower subnet, one neuron is enough to calculate each fe, by setting the weights to vo'- = ^e\, the threshold to 0j \u00E2\u0080\u0094 \u00E2\u0080\u0094 y , and the activation function to u if a > u a if u> a > 0 (4.2) 0 otherwise It is assumed that this neuron operates in its linear range, that is, it is essentially a linear neuron. The lower subnet consists of M such neurons. For the upper subnet, Lippmann used a feedback network known as lateral inhibitive network or winner-take-all network, which mimics the heavy use of lateral inhibition evident in the biological neural networks of the brain [49]. The weights in this subnet are set as follows w kl = < 1 if k = I (4.3) -e iik^l where 0 < e < All thresholds are set to zero. Neurons in this subnet have the same activation functions as in (4.2) and also operate in the linear range. After this subnet converges, only one neuron's output is positive and the rest are zeros. The one with none-zero output corresponds to the exemplar which has the minimum Hamming distance to the input pattern. 4.2.2 Designing the Parity Network Now let us use the analytical approach to construct a neural network which solves the parity problem. A function which solves this problem is such that it outputs 1 if there are odd number of l's in the input pattern; 0 otherwise. This function is the so called parity function. Chapter 4. REASONS FOR ALGEBRAIC DERIVATION 49 Let A 1 if E ? = i * ; > \u00C2\u00AB V i e { i , 2 , . MJV} (4.4) 0 otherwise Then the following function N N y = E(-i) ,'~ 1* = Efc* (4.5) i=i t=i is equivalent to the parity function in the sense that the input-output behaviours are the same. 
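Before realizing z_i and y with individual neurons, a quick numerical check of this construction may be useful. The sketch below is our own illustration; it evaluates (4.4) and (4.5) exhaustively for a few problem sizes and confirms that they reproduce the parity function.

```python
from itertools import product

def parity_network(x):
    """Analytically constructed parity network of (4.4)-(4.5):
    z_i fires when at least i inputs are 1; the output neuron sums
    the z_i with alternating weights +1, -1, +1, ..."""
    n = len(x)
    s = sum(x)                                           # every z_i sees the same unweighted sum
    z = [1 if s >= i else 0 for i in range(1, n + 1)]    # threshold-logic units, thresholds 1..n
    return sum((-1) ** i * z_i for i, z_i in enumerate(z))

# exhaustive check against the parity (odd number of 1's) function
for n in (2, 3, 4, 5):
    ok = all(parity_network(x) == sum(x) % 2 for x in product((0, 1), repeat=n))
    print(f"n = {n}: construction matches parity -> {ok}")
```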
Zi can be realized by a threshold-logic neuron shown in Figure 4.2, and y can also be realised by a threshold-logic neuron shown in Figure 4.3. Therefore, the whole network can be built as that shown in Figure 4.4. Xi \u00E2\u0080\u00A2N Figure 4.2: Realizing Function (4.4) Zi y N Figure 4.3: Realizing Function (4.5) Chapter 4. REASONS FOR ALGEBRAIC DERIVATION 50 y Figure 4.4: The Network for Solving the Parity Problem As shown by this and previous examples, one can easily derive a network using the an-alytical approach. Note that both the network architectures and the network parameters are simultaneously derived. The advantages offered by the analytical approach are as follows: (1) no training is needed; (2) the network architecture derived is proper if not optimal; and (3) complete understanding of the role of every individual neuron. Although a lot can be learnt from the work of researchers such as Hopfield, Koho-nen and Lippmann on the constructing processes of their network models, they have not yet developed a systematical method of network design. They relied mainly on their imagination, intuition, as well as inspiration from their knowledge of the nervous sys-tem. However, to effectively use neural networks as computing machines, a systematical method of network design is very much needed. 4.3 The Algebraic Derivation Methodology Here we propose a methodology of network design, which consists of the following five stages: 1. Find the minimum set of neuron models for a given class of functions; Chapter 4. REASONS FOR ALGEBRAIC DERIVATION 51 2. Devise symbolic representations for these neurons and their networks; 3. Establish theorems for manipulating these symbols based on the computational properties of the neurons and their networks; 4. Establish procedures for deriving neural networks from functions; 5. Use these procedures and theorems to derive and simplify network models for spec-ified functions. During our study on network design, we found that it is very useful to symbolically represent these units and their networks. These symbols can then be manipulated to yield proper network models. Because symbol manipulation is an important feature of our methodology, we call it. algebraic derivation of neural networks. The approach of symbolically representing computing units and manipulating these symbols was used before in [57]. However, they were only concerned with designing networks of threshold-logic neurons to realize logical functions. Here, we are concerned with designing networks of diverse neuron types to realize more complex functions. Chapter 5 S Y M B O L I C R E P R E S E N T A T I O N O F N E U R O N S A N D T H E I R N E T W O R K S Our task is to map functions or algorithms to neural networks. As required by the algebraic derivation methodology, the first step is to determine a minimum set of neuron models; the second step is to devise symbolic representations for the chosen neuron models and their networks. These two steps are completed in this chapter. 5.1 Neuron Models Since we restrict the choice to the class of neuron models defined in Chapter Three, the task in the first step is reduced to deciding what types of activation functions to use. Because of practical restrictions, these functions have to be as simple as possible. Many techniques in image enhancement and restoration involve a weighted summa-tion of neighboring pixels and passing the sum to a function to determine the value of processed pixel (see Section 2.2). This function is usually piecewise. 
An example is the function used in the contrast stretching technique, which is shown in Figure 5.1. Denote the contrast stretching function as fc{%)- We can write another function as f(x) = afQ(x) + p?fQ(x -a) + 7/o(x - b) (5.1) where functions /Q(.) are shown in Figure 5.1 as dotted fines. By proper choice of coefficients, we can ensure that f(x) = fc(x) Vxe[0,c] (5.2) 52 Chapter 5. SYMBOLIC REPRESENTATION OF NEURONS AND THEIR NETWORKS53 y o d x \(2) Figure 5.1: The Contrast Stretching Function Therefore, we can rewrite many functions used in image processing as sums of several quasi-linear functions. Hence, the quasi-linear activation function is chosen. where fh is a linear function. Linear activation function is also needed. A lot of logic operations are employed in image processing techniques, therefore, the threshold-logic activation function is also required. Since functions used in image processing techniques can be expressed as compositions of these three functions, we choose them to form the minimum set of activation functions. Because neither one of these functions can be realized by combinations of the other two, this set is indeed minimum. The neuron models we have just chosen are linear, quasi-linear, and threshold-logic neurons, which are referred to as LQT neurons. They are the simplest among all the neuron models proposed in the literature, hence should have minimum difficulty in their hardware implementations. In Chapter Seven, a formal proof is given which shows that any image processing technique can be realized by networks of these neurons. Note that f(x) = / i O (fQ{x)JQ(x - a),fQ(x - b)) (5.3) Chapter 5. SYMBOLIC REPRESENTATION OF NEURONS AND THEIR NETWORKS54 For the convenience of algebraic derivation, symbols are used to represent LQT neu-rons. The linear neuron is denoted as y =C xi,x2,...,xn | w-l,w2,...,wn D t (5.4) The quasi-linear neuron is denoted as y =< x1,x2,...,xn | w1,w2,...,wn >* (5.5) The threshold-logic neuron is denoted as y =C xux2,...,xn \w1,w2,...,wnZl t (5.6) Where xt- represents the i th input to the neuron, W{ the weight associated with x,-, and t the threshold of the neuron. To represent either one of LQT neurons, the following notation is used. V = {|xi,x2,...,xn \w1,w2,...,wn\} t (5.7) If Wi = 1 Vi \u00E2\u0082\u00AC { 1 , 2 , n } , then the neuron is denoted as y = {|\u00C2\u00A3i,X2, ...,\u00C2\u00A3\u00E2\u0080\u009E!}' (5-8) Sometimes, it is more convenient to use the following vector representations y = {\x\ w\Y (5.9) or r/ = {|x|w|r (5.10) where x = x. \u00E2\u0080\u0094 (xi,x2,xn) and w \u00E2\u0080\u0094 w = (w\, w2,wn). Chapter 5. SYMBOLIC REPRESENTATION OF NEURONS AND THEIR NETWORKS55 5.2 LQT Networks A network which is composed of LQT neurons is called a LQT network. Al l the networks derived in this thesis are such networks. Three typical topologies of LQT networks are illustrated in Figure 5.2. Both networks in Figure 5.2.a and Figure 5.2.b are feedforward, and the network in Figure 5.2c is feedback. Note that the network shown in Figure 5.2.b is different from conventional feedforward neural networks in that some neurons in the third layer take inputs from the input neurons. y,(k) y,(k) y3(k) y4(k) (a) (b) (c) Figure 5.2: Typical LQT Network Topologies The notation of a network is similar to that of a neuron. The network input is denoted by x and the network output is denoted by y. The output of the i th neuron is denoted as Hi. Chapter 5. 
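As a concrete counterpart to this notation, the three LQT neuron models can be written as small functions. This is our own illustrative sketch; the boundary convention at the threshold (greater-or-equal versus strictly greater) follows our reading of the definitions in Chapter Three.

```python
def linear(x, w, t=0.0):
    """Linear neuron: weighted sum of the inputs, shifted by the threshold t."""
    return sum(wi * xi for wi, xi in zip(w, x)) - t

def quasi_linear(x, w, t=0.0):
    """Quasi-linear neuron: weighted sum minus threshold, clipped at zero."""
    return max(linear(x, w, t), 0.0)

def threshold_logic(x, w, t=0.0):
    """Threshold-logic neuron: 1 if the weighted sum reaches the threshold, else 0
    (>= assumed at the boundary)."""
    return 1.0 if sum(wi * xi for wi, xi in zip(w, x)) >= t else 0.0

# small usage examples
print(linear([2.0, 3.0], [1.0, -1.0]))                  # -> -1.0
print(quasi_linear([2.0, 3.0], [1.0, -1.0]))            # -> 0.0
print(threshold_logic([2.0, 3.0], [1.0, 1.0], t=4.0))   # -> 1.0
```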
SYMBOLIC REPRESENTATION OF NEURONS AND THEIR NETWORKS56 The notation of the network shown in Figure 5.2.a is V = C < C x i , x 2 I w3Uw32Zit3, LZ x 1 ? x 2 | w41,w42-Jiu | w53, w54 >'5, Note that the input neurons are not explicitly expressed here. When all the variables in the notation are independent, the notation represents a whole network. Part of the network can also be represented, such as y = C Ps,pe | w 7 5 , w76Dt7 (5-12) which represents the output neuron, and y =C< nz, PA I \u00E2\u0084\u00A253, wb4> ts, H I \u00E2\u0084\u00A275, w76D t7 (5.13) which represents a part of the network as shown in dashed lines in Figure 5.3. The Figure 5.3: Representing Part of a Network notation of the network shown in Figure 5.2.b is y = C < Xi , L Z X i , X 2 | ltf31,tU 3 2Z]' 3 ,IZ X i , X 2 | W41,W42Z\U | U>51,U>53, \u00E2\u0084\u00A254 ><5, < L Z x i , x 2 | w31,w32Z} t3,n x i , x 2 | w41, wA2-3u,x2 \ w63, w64, w62 >'6 (5.14) | ^ 75,^76 D t7 Chapter 5. SYMBOLIC REPRESENTATION OF NEURONS AND THEIR NETWORKS57 Another way of representing a whole network is fi5 = < xufi3,p4\w51,w53,w54 >ts = < M3,M4,X2 | W63, ^ 6 4 , W 6 2 >'6 (5.15) ^3 = C XUX2 IW31, w32n t3 Hi = (Z X i ,X 2 I W 4i,l0 4 2]* 4 where symbol { represents one layer of neurons. Note that input neurons are not explicitly expressed in this form either. The form used in equations (5.11) and (5.14) is called compact form, and the form used in (5.15) is called layered form. The layered form is easier to comprehend. Sometimes, a neuron can be grouped to either one of several layers. An example is shown in Figure 5.4, where the neuron with threshold t7 can be grouped either to the second or to the third layer. To avoid ambiguity, in this thesis such neurons are always grouped to the layer closest to the output. In this case, the neuron with the threshold t7 is grouped to the third layer, that is, the network's layered form representation is y = C p5,fie,^7 \wS5,w8e,w87 D ts ^5 = rz p3,p41 w53,w54zi Me = IZ fl3,P4 | ^ 6 3 , W64Z1 - rz x2 \ w723 t7 M3 = \ZXl7X2\ 1031,1032 \u00E2\u0080\u00A2 / i 4 = rz xx,x2 \ w4l, w42]u (5.16) 1*3 The symbolic representation of a feedback network is different from the representation of a feedforward network. Since feedback networks are dynamic in the sense they have to iterate many times, an indication of the iterations has to be included in the representation. Chapter 5. SYMBOLIC REPRESENTATION OF NEURONS AND THEIR NETWORKS58 (5.17) Figure 5.4: A LQT network The representation of the feedback network shown in Figure 5.2.c is yt(h + l) = LZ yi(h),y2(h),y3(h),y4(h) | wu,w12, i ^ , ^ Z)*1 y2(h + l) = LZ yi{h),y2(h),y3(h),y4(h) \wn,w22,w23,w24Zlt2 y3(h+l) = H yi(h),y2(h),y3(h),y4(h) \w31,w32,w33,w34Z}t3 y4(h + l) = LZ yi(h), y2(h), y3(h), y4(h) \ w41, w42, w43, w44 Z2U where yi(h) is the i th element of the network output at the h th iteration. Vector y(0) = (yi(0), 2/2(0), 2/3(0), y4(0))T are the network input. Note that neurons in this network are both input and output neurons. Another representation of the above feedback network is as follows where and y(h + l) = L Z y ^ l W z 1 y(h) = (y1(h)1y2(h)Mh)Mh)? t \u00E2\u0080\u0094 (ti, t2, t3, t4)T and W is the matrix composed of u^'s. (5.18) (5.19) (5.20) Chapter 6 C O M P U T A T I O N A L P R O P E R T I E S O F L Q T N E T W O R K S As required in the third step of the algebraic derivation methodology, this chapter es-tablishes some theorems based on the computational properties of LQT networks. 
These theorems are useful in simplifying and evaluating network realizations. Also, we shall discuss some factors which affect the quality of a LQT network, and give a criterion for network simplification. 6.1 Network Equivalence Definition 6.1 If a function f(x) and a Network NA satisfy the following condition where X is the definition domain of f, then f is said to be realized by NA, or NA is said to be the network realization of f. A well defined LQT network can only realize one function; but a function can be realized by many different networks, thus comes the concept of network equivalence. Definition 6.2 // Network A and Network B satisfy the following condition where X is the definition domain of both NA and NB, then Network A and Network B are said to be equivalent. f(x) = NA{x) VxeX (6.1) NA(x) = NB(x) VX\u00E2\u0080\u009E|}* (6.3) and Ng is VB = ^zil,zia,...,Zin | wil,Wi9,...,win^t (6.4) where sequence {z^, z , - 2 , Z i n } and {w^, W{2, ...,u;,\u00E2\u0080\u009E} are permutations of {z\,z2,...,zn} and {wi, w 2 , w n } respectively, then NA and NB are equivalent. Theorem 6.2 Let NA be VA = {\zuz2,...,zn | w1,w2,...,wn\u00C2\u00A7 t (6.5) If W{ = 0, then NA is equivalent to VB = {|2i,...,*t-i,*t+i,-,*n | \u00C2\u00AB?!,...,u;.-_i,\u00C2\u00AB;,-+!,.\u00E2\u0080\u009E,U7\u00E2\u0080\u009E|}* (6.6) Theorem 6.3 Let NA be VA = {\zi,z2,...,zn | wuw2,...,wn$* (6.7) If Zi == a, where a is a constant, then NA is equivalent to NB = {\z1,...,zi_i,zi+1,...,zn |iwi,...,u7,-_i,u;i +i,...,\u00C2\u00AB; n|} t~ 0 ,\" i (6.8) Theorem 6.4 Let NA be VA = l > i , z 2 , z n | wx,w2,wn\Y (6.9) Chapter 6. COMPUTATIONAL PROPERTIES OF LQT NETWORKS 61 If Zi = Zj and i < j, then NA is equivalent to NB = { | * i,...,Zi,...,Zj_i,Zj + i , . . . , z n |ti^,...,tx>,- + u;j,...,u;J-_i,u;,-+i,...,u;n|}* (6.10) or NC = {|zi,...,Z,-_i,2,-+i,...,Zj,...,2\u00E2\u0080\u009E | +\u00C2\u00AB\u00C2\u00BB,\u00E2\u0080\u00A2,...,U>\u00E2\u0080\u009E|}' (6.11) Theorem 6.5 / /TV^ is y>i = C z I tu \u00E2\u0080\u00A2 * (6.12) and NB is yB=Cz\ 0w Z\pt (6.13) where P > 0, then NA and NB are equivalent. Theorem 6.6 Let NA be VA = C ZX, Z2, *n I wi, w2,u)\u00E2\u0080\u009ED( (6-14) and NB be VB = < Zx,Z2,...,Zn | V)i,W2,...,Wn > ' . (6.15) If the output of NA is always greater than or equal to zero, then NA and NB are equivalent. Theorem 6.7 Let NA be yA = C zx,z2,...,zn | wx,w2,...,wn^ (6.16) and NB be yB = C z i , z 2 , . . . , z n | Wi,w2,...,wnZi+s (6.17) where 1 > <5 > 0. If the output of NA is either zero or one, then NA and NB are equivalent. Chapter 6. COMPUTATIONAL PROPERTIES OF LQT NETWORKS 62 Theorem 6.8 Let NA be VA =< zx,z2,...,zn | wi,w2,...,wn >* (6.18) and NB be VB =C * I , z 2 , z n | w-i, w2,u)Jt+< (6.19) where 1 > 6 > 0. If the output of NA is either zero or one, then NA and NB are equivalent. Theorem 6.9 Let NA be yA = {W,z2,...,znf (6.20) and NB be VB = { k i , \u00E2\u0080\u00A2\u00E2\u0080\u00A2, Z i - i , C Zi, ...,zi+m D'Szi+m+i, \u00E2\u0080\u00A2\u00E2\u0080\u00A2-,Zn|}<2 (6.21) where i G {1 ,2 , ra} anc? n > i + m > 1. Ift = t1+t2, then NA and NB are equivalent. Theorem 6.10 let NA be {\zx,z2,...,zn \w1,w2,...,wn\} t (6.22) and NB be {| C zx | ^ 3 \" , C 2 2 |u>2 3 < 2 , . . . ,C z n | u>\u00E2\u0080\u009E D ( \" l }^ 1 (6.23) IfYJi=\ti = t, then NA and NB are equivalent. Theorem 6.11 Let NA be yA = {iCz\wD t* |a|}<2 (6.24) and NB be yB = {\z\aw\y (6.25) Ift \u00E2\u0080\u0094 at. +12, then NA and NB are equivalent. Chapter 6. 
COMPUTATIONAL PROPERTIES OF LQT NETWORKS Theorem 6.12 If NA is yA = {\Cza,zbDt> |a|}<2 and NB is yB = { \ C z a , z b \ a , a D a h |}<2 then NA and NB are equivalent. Theorem 6.13 Let NA be yA = {\ti |a|}<2 and NB be yB = {\at> |}<2 lfa>0, then NA and NB are equivalent. Theorem 6.14 If NA is yA = C {\zx,z2,...,zn \wx,w2,...,wn\u00C2\u00A7tl Dh and NB is VB = {\z1,Z2,...,Zn | iyi,W 2,..., Wn^ where t = ti + t2, then NA and NB are equivalent. Theorem 6.15 Let NA be yA = \u00C2\u00AB zx,z2, ...,zn | wuw2, ...,wn >tl>'2 and NB be yB =< Zi, z 2 , - . , zn | w1,w2,wn >* If t = tx + t2 and t2 > 0, then NA and NB are equivalent. Chapter 6. COMPUTATIONAL PROPERTIES OF LQT NETWORKS 64 The theorems mentioned so far can be easily verified by writing out the functions realized by both networks NA and NB, and checking if the functions are equivalent. Example 6.1 Prove Theorem 6.15 is true. SOLUTION The function realized by NA is yA = fd(fQ(zw-t1)-t2) (6.34) If zw \u00E2\u0080\u0094 ta > 0, then VA = IQ(ZW - h - t2) (6.35) otherwise, VA = 0 (6.36) This implies VA = VB = IQ(ZW - t) (6.37) which is the function realized by NB- Therefore, NA = NB- tt 6.2 Network Quality Since a function can be realized by an infinite number of LQT networks, a criterion is needed for us to choose a proper network among all its equivalents. Obviously, this choice is based on the quality of the networks. There are several factors which affect the quality of a network. They are introduced in the following sections. 6.2.1 Network Depth The depth of a LQT network is defined as the longest path from the input neurons to the output neurons. For the sake of simplicity, it is assumed that all the LQT neurons have the same processing time, which is taken as one time unit. For a feedforward network, its depth equals its computation time. For a feedback network, its computation time Chapter 6. COMPUTATIONAL PROPERTIES OF LQT NETWORKS 65 depends not only on the convergence property of the algorithm it implements but also on its depth. Since our objective of using neural networks is to achieve high computation speed, the depth of a network is thus an important factor in choosing one network over another. For a feedforward network, its depth also corresponds to the number of layers it has. Denote the depth of a network NA as D(NA) and the number of layers as L(NA), the following relationship exists 6.2.2 Network Size Neural networks eventually have to be implemented in hardware, hence we should be aware of their size. It is natural to think the size of a network in terms of the number of neurons in the network. However, since a neuron is allowed to have indefinite number of inputs, it is more proper to measure the size of a network in terms of the number of connections. Moreover, in VLSI implementation, it is the connections which occupy most of the space on a chip. The number of connections in a neural network equals the number of weights. Since it is easier to count the number of weights in the symbolic representation of a LQT network, we define the network size as the number of weights in the network. Let us denote the number of neurons in network NA as N(NA) and its network size as S(NA). It is clear that D(NA) = L(NA) - 1. (6.38) N(NA) < S(NA). (6.39) Chapter 6. COMPUTATIONAL PROPERTIES OF LQT NETWORKS 66 6.3 Criterion for Network Simplification Both network depth and network size have to be considered when choosing a network over its equivalents. 
In this thesis, the following simple criterion is used: Given two networks NA and NB which realize the same function, if D(NA) < D(NB) and S(NA) < S(NB), or if D(NA) = D(NB) and S(NA) < S(NB), then NA is chosen over NB, and NA is said to be better than NB- The process of choosing a network over another is referred to as network simplification. The theorems in Section 6.1 are developed for such purpose. Obviously, a network with the shortest depth and the minimum size is the best among all the equivalent networks. Unfortunately, such a case is rare. The network with the shortest depth often does not have the minimum size. How to effectively balance these two factors is still an on-going research. In the following chapters, we only give upper bounds on network depth and size for some functions. There are also some other factors which affect the quality of a network. For example, since there are inevitably some noise in any hardware implementation, how tolerant a network is to noise is certainly a good measure of network quality. However, due to time constraints, these factors are not considered in this thesis. Chapter 7 D E R I V A T I O N P R O C E D U R E S This chapter provides procedures of deriving LQT networks to realize functions. The usage of these procedures is demonstrated through some examples, which in turn are useful in deriving networks in later chapters. A function which can be realized by a LQT network is referred to as a realizable function; the network which realizes the function is referred to as the network realization of the function. A function which has a known network realization is called a realized function. There are four classes of functions: single-input single-output (SISO) functions, single-input multiple-output (SIMO) functions, multiple-input single-output (MISO) functions, and multiple-input multiple-output (MIMO) functions. We shall only describe procedures for realizing SISO and MISO functions since realizing SIMO or MIMO functions is just multiple applications of these procedures. From the definitions of LQT neurons, the following statements are true. Fact 7.1 The network realization for the following function n (7.1) is y = C x1,x2,...,xn | w1,w2,...,wn D* (7.2) 67 Chapter 7. DERIVATION PROCEDURES 68 Fact 7.2 The network realization for the following function E ? = i W{Xi - t if J2i=1 w{Xi > t 0 otherwise (7.3) is y =< x x , x 2 , . . . , x n \wuw2,...,wn > ' (7.4) Fact 7.3 The network realization for the following function \ i ifU=i^i>t ,7_, y = \ (7-5) [ 0 otherwise is y =n x 1 , x 2 , ...,xn | wt,w2, ...,wn \u00E2\u0080\u00A2 * (7.6) Now that we have established network realizations for the above three simple func-tions, we can work on network realization of more complex functions. 7.1 Realization of SISO Functions The procedure of deriving a network to realize a SISO function / is as follows: step 1. Call decompose(/); step 2. Compose all network realizations to form a network realization of / . step 3. Apply theorems in Chapter Six to simplify the resulting network. In the above procedure, decompose(/) is a recursive subroutine: decompose(/): 1. Decompose the function / as / = fo 0 ( / 1 , / 2 , - , / n ) Chapter 7. DERIVATION PROCEDURES 6 9 such that f0 has a known network realization; 2. For all i \u00E2\u0082\u00AC { 0 , 1 , 2 , n } , if fi is a realized function, replace it with its network realization; else, call decompose(/,) 3. Return. 
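The recursive structure of decompose() can be sketched in code. The representation below is our own simplification: functions are either names found in a hypothetical table of already-realized primitives, or compositions f = f0 o (f1, ..., fn) written as nested tuples; the thesis instead manipulates the symbolic forms introduced in Chapter Five.

```python
# A schematic rendering of the derivation procedure (our own simplification).
REALIZED = {                       # hypothetical table of functions with known LQT realizations
    "weighted_sum": "linear neuron (Fact 7.1)",
    "clipped_sum":  "quasi-linear neuron (Fact 7.2)",
    "step":         "threshold-logic neuron (Fact 7.3)",
}

def decompose(f):
    """Recursively replace every sub-function by its network realization."""
    if isinstance(f, str):                        # a named function
        if f in REALIZED:
            return REALIZED[f]                    # step 2: substitute the known realization
        raise ValueError(f"{f} must first be decomposed into realizable pieces")
    f0, parts = f                                 # step 1: f = f0 o (f1, ..., fn)
    return (decompose(f0), [decompose(p) for p in parts])

# Example 7.1.2 in this shape: y = a * step(x - t) is a step unit fed into a weighted sum.
print(decompose(("weighted_sum", ["step"])))
```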
The derivation procedure may be viewed as consisting of two stages: step one is the analysis stage, and steps two and three are the synthesis stage. The use of this procedure is demonstrated through the following examples. Functions in these examples are real unless otherwise specified. Example 7.1.1 Given a function F whose output is either 0 or 1 , find a network realization of its complement F. S O L U T I O N Since F = l - F ( 7 . 7 ) according to Fact 7 . 1 , the network realization of F is F = c F | - I D \" 1 ( 7 . 8 ) t) Example 7.1.2 Find a network realization of the following function a if x > t 0 otherwise ( 7 . 9 ) S O L U T I O N Let 1 iix>t 0 otherwise ( 7 . 1 0 ) then V = a / i ( * ) ( 7 . 1 1 ) Chapter 7. DERIVATION PROCEDURES 70 which can be decomposed to 2/ = / o \u00C2\u00B0 / i (7.12) where / 0 = a fa. F rom Fact 7.3 and Fact 7.1, we know both functions / 0 and / i have known network realizations as fo=Cfi\aD\u00C2\u00B0 (7.13) and fi(x) =C x3* (7.14) Therefore, the network realization of function (7.9) is y = C C xZi* | a D\u00C2\u00B0 (7.15) Example 7.1.3 F i n d a network realization of the following function S O L U T I O N Let y = s a if t2 > x > ti 0 otherwise a if x > t\ 0 otherwise (7.16) (7-17) and a if x > t2 0 otherwise then which can be decomposed to y(x) = fi(x)-f2(x) y = fo\u00C2\u00B0 (/i,/2) (7.18) (7.19) (7.20) Chapter 7. DERIVATION PROCEDURES 71 where / 0 = fi \u00E2\u0080\u0094 f%. From Fact 7.1, fo can be realized by a linear neuron, / o = C / 1 , / 2 | l , - l D \u00C2\u00B0 (7.21) From previous example, both fx and / 2 can be realized as h{x) = C C x3h | oc D\u00C2\u00B0 (7.22) and / 2(x) = C C x3t2 \ a D\u00C2\u00B0 (7.23) Therefore, the network realization of y(x) is y(x) = C C C xl]h | a D\u00C2\u00B0, C C xZ)'2 | a D\u00C2\u00B0 | 1, - 1 D\u00C2\u00B0 (7.24) Applying Theorem 6.10 and Theorem 6.11, this network can be simplified to y(x) =CC\u00E2\u0082\u00AC1 xii* 1 | a D \u00C2\u00B0 , c n xZi t2 | - a D\u00C2\u00B0D\u00C2\u00B0 (7.25) Applying Theorem 6.10 again, this network can be further simplified to y(x) =CO xZiu,\Zx-3t2 | a , - a D\u00C2\u00B0 (7.26) tt E x a m p l e 7 .1 .4 Find a network realization of the following function a if x > t 0 otherwise (7.27) SOLUTION Function y(x) can be decomposed to y = foof1{x) (7.28) where fo = + -t 0 otherwise F r o m Fact 7.1, we know that F r o m Fact 7.3, we know that fo =C fi | - a D' fi(x) = C x | - 1 \u00E2\u0080\u00A2 -t Substituting this to (7.31), we have y(x) = f0(x) = C C x\ - 1 | - a D~a Example 7.1.5 F i n d a network realization of the following function S O L U T I O N Let and then a if x = t 0 otherwise A W = { a if x > t 0 otherwise a if x > t 0 otherwise y(x) = /i(x) - / 2 ( x ) Chapter 7. DERIVATION PROCEDURES 73 which can be decomposed to V = / o \u00C2\u00B0 ( / i , / 2 ) where fo = fi \u00E2\u0080\u0094 / 2 , whose network realization according to Fact 7.1 is / o = C / i , / 2 | l , - l D\u00C2\u00B0 (7.38) (7.39) We know that both /i(x) and /2(x) have network realizations (see Example 7.1.2 and Example 7.1.4) as / i (x) =CC x \u00E2\u0080\u00A2* |oO\u00C2\u00B0 (7.40) and / 2 (x) =CC x | - 1 | - a D~a (7.41) Substituting (7.40) and (7.41) to (7.39), we have y(x) = /o(x) = C C C x \u00E2\u0080\u00A2* | a D\u00C2\u00B0, C C x | - 1 | - a D \" a | 1, - 1 D\u00C2\u00B0 (7.42) Applying Theorem 6.10 and Theorem 6.11, we can simplify this network to y(x) = C C x C x | - 1 | a,a Da . 
(7.43) It can be rather tedious to follow the procedure explicitly, from now on we only follow it implicitly, and put more emphasis on the simplification process. Example 7.1.6 If function (7.34) is an integer function, find its network realization. S O L U T I O N Let a if x > t 0 otherwise (7.44) and a if x > t + 1 0 otherwise (7.45) Chapter 7. DERIVATION PROCEDURES 74 then whose network realization, according to Fact 7.1, is y= 0 0 otherwise (7.67) Chapter 7. DERIVATION PROCEDURES anc \u00E2\u0080\u0094x if \u00E2\u0080\u0094x > 0 0 otherwise then y(x) = fi(x) + f2(x) According to Fact 7.1 and Fact 7.3, the network realization is y = C < x > \u00C2\u00B0 , < x | - 1 > \u00C2\u00B0 D \u00C2\u00B0 Example 7.1.10 Find a network realization of the following function ax + b if x > t 0 otherwise S O L U T I O N Let and f2(x) = { -t iix>t otherwise 1 if x > t 0 otherwise then y(x) = afi(x) + (at + 6)/2(x) Recall Fact 7.1, Fact 7.2, and Fact 7.3, we have y = C < x >*, C x \a,at + b D\u00C2\u00B0 Chapter 7. DERIVATION PROCEDURES 78 Example 7.1.11 Find a network realization of following function y= < x if F = 1 0 otherwise (7.76) where F is a function whose output is either 0 or 1. S O L U T I O N Let x-a if F = 1 0 otherwise (7.77) where a < x Vx \u00C2\u00A3 X, and A\" is the definition domain of function (7.76). Then y(x) = f.ix) + aF whose network realization, according to Fact 7.1, is y=CMx),F\l,aD\u00C2\u00B0 fi(x) can be expressed as AGO = where b > x \u00E2\u0080\u0094 a Vx \u00C2\u00A3 A\ Let x - a - f e F iix-a-bF>0 0 otherwise then according to Fact 7.2, and, according to Fact 7.1, / u (x ) = x - a - bF fi(x) =< fn(x) >\u00C2\u00B0 (7.78) (7.79) (7.80) (7.81) (7.82) / n (x ) = C z , F | l , - & D \u00C2\u00B0 (7.83) Chapter 7. DERIVATION PROCEDURES Hence, h{x) =< x,F\l,-b>a The network realization of F, according to Example 7.1.1, is F = c F | - l y 1 Substituting (7.85) to (7.84) yields Mx) =a+b Substituting this to (7.79) yields y=Ca+b,F\l,aD\u00C2\u00B0 tt Example 7.1.12 Find a network realization of the following function x if F = 0 0 otherwise where F is a function whose output is either 0 or 1 SOLUTION According to the previous example, the network realization y =C< x,F | l , 6 > \u00C2\u00B0 + b , F | l , a D\u00C2\u00B0 and recall Example 7.1.1, we know that F=cF\ - ID-1 Substituting this equation to (7.88) and after simplification, we have y =C< x,F\ l , - f e> a , F | 1,-a D~a y = s Chapter 7. DERIVATION PROCEDURES 80 7.2 Realization of M I S O Functions This section shows the network realization of MISO functions. The procedure of network derivation of MISO functions is as follows. step 1. Decompose function / as / = / i \u00C2\u00B0 h where fr is a linear function; step 2. Proceed with the procedure for realizing SISO functions to find a network realization of fY; step 3. Compose the network realization found in step 2 and the network realization of /L to form the network realization of / ; step 4. Using theorems in Chapter Six to simplify the resulting network. The use of this procedure is shown through the following examples. Example 7.2.1 Find a network realization of the following function / :Z>?-2> 2 (7.91) where \u00C2\u00A3>, is a finite set of numbers. SOLUTION We consider the following two cases. Case I: / is a linear function. Since we can express / as f = Y,biXi + b0 (7.92) t=i where Xi \u00C2\u00A3 X^. According to Fact 7.1.1, we have f(x)=Cxux7,...,xn \bub2, ...A Db0 (7.93) Case II: / is a non-linear function. Chapter 7. 
DERIVATION PROCEDURES 81 Here we introduce a linear function as follows fL : V3 (7.94) and a finite non-linear function fi : X>3 - V2 (7.95) We now find the network realizations for both / i and fr,, and then compose them to form the network realization of / . Let Mi\u00C2\u00B1\Di\ V i e {1,2,3} and Zi denotes an element in V3, and a,- = f\{zi). According to Example 7.1.7, the network realization of fi is fi{z) = C C 2 D * 1 , \u00E2\u0080\u00A2 2 | - 1 I T * 1 , C z Z]ZM3, \u00E2\u0080\u00A2 2 | - 1 \u00E2\u0080\u00A2 - * w 3 (7.96) | Ol, \u00C2\u00AB1, % 3 ) Z M 3 ^ a where a = a t - According to Fact 7.1, z = JL(X) = C x1,x2,...,xn\bl,b2,...,bn Dbo (7.97) Substituting (7.97) to (7.96) and after simplification, we have y = C C x i , ...,xn\bi, ...,bn I f 1 , C X i , . . . ,x n| - 6 i , - b n Zi~ci, \u00E2\u0080\u00A2 xi , . . . ,x\u00E2\u0080\u009E|&i, ...,&\u00E2\u0080\u009E U c \u00C2\u00AB 3 , C xj , . . . , x \u00E2\u0080\u009E | - & i , . . . , - & \u00E2\u0080\u009E \u00E2\u0080\u00A2 \" ^ (7.98) where c, = z; + fe0- if This example shows that any multi-variable finite function can be realized by a LQT network with a depth at most of 2 and a size at most of 2Ma(n + 1). It leads to the following theorem. Chapter 7. DERIVATION PROCEDURES 82 Theorem 7.2 Any multi-variable finite function is realizable. Since a pixel takes integer value from 0 to 255, any image processing technique can be expressed as finite functions. According to this theorem and Theorem 7.1, it is clear that the LQT networks are capable of realizing any image processing technique. In Example 7.2.1, we made an assumption / = f\ o //,, here we show under what condition this assumption is true. Denote f2 = / i o / L , we have the following theorem. Theorem 7.3 The necessary condition for f = fi is thatVxi, x2 E if f{x\) 7^ f(x2), then fL(xx) ^ / L ( X 2 ) . P R O O F If there exist Xi, x2 E Z>\" s u c n t n a t f(xi) / f(x2) but /L(\u00C2\u00A3*I) = /L (^ 2 ) , then fx o = /1 0 h(x2)- This implies that / 2(x x) = f2(x2), hence / 2 ^ / . I Corollary 7.3.1 ITie necessary condition for f = f2 is M3 > M2. P R O O F If M3 < M2, then there must exist a pair x a , x2 E T>\u00E2\u0084\u00A2 such that f(xi) ^ f{x2) but fL(xi) = h(x2) I Theorem 7.4 Functions ft and fL are not unique. P R O O F This theorem will be shown to be true in some of the following examples. I As shown in Example 7.2.1, the size of the network realization of a non-linear multi-variable finite function is proportional to M3. Therefore, according to Corollary 7.3.1, the best projection function /j, is the one whose codomain size (M3) is equal to the codomain size (M2) of the non-linear function itself. Here, we introduce a projection function which is proper for all multi-variable finite integer functions. z = Y,xiM1 i-1 ' (7.99) 8 = 1 Chapter?. DERIVATION PROCEDURES 83 Such a function is called one-to-one projection since / L ( X I ) ^ / L ^ ) iff \u00C2\u00A31 7^ %2 Vx*i, x2 \u00E2\u0082\u00AC which implies M 3 = M x n . Another projection function, which is proper only for a certain class of finite integer functions, is as follows z = X > < (7.100) which has the property that Af3 = n(M\ \u00E2\u0080\u0094 1) + 1. We will refer to functions (7.99) and (7.100) as projection I and projection II. The following examples show network derivation of some multi-variable finite integer functions. Example 7.2.2 Find a network realization of the AND function y \u00E2\u0080\u0094 Xi A x2 A . . . 
A xn (7.101) SOLUTION Using projection I /L{X) = \u00C2\u00A3\"=1 2 , _ 1a; t-, the AND function can be ex-pressed as V = fx o h{x) (7.102) where ' 1 if z > 2 n - l AW = (7.103) 0 otherwise which, according to Fact 7.1, has the following network realization / i ( z ) = C z D 2 \" - 1 (7.104) According to Fact 7.1, the network realization for J L { X ) is h(x) = C xux2, xn I 1 , 2 , 2 \" \" 1 D \u00C2\u00B0 (7.105) Substituting it to (7.104) yields y = C C x u x 2 , x n I 1 , 2 , . . , 2 \" \" 1 D \u00C2\u00B0 D 2 \" \" 1 (7.106) Chapter 7. DERIVATION PROCEDURES 84 which, according to Theorem 6.10, can be further simplified to y=CI :r i ,S2, - . . , *\u00C2\u00BB | l , 2 , . . , 2 \" - 1 z T - 1 (7.107) The resulting network has a depth of 1 and a size of n. If we use projection II / L ( X ) = 5Z\"=i and chose 1 if z > n 0 otherwise then Since and hence (7.108) y = fiofL(x) (7.109) h(z) =C ^ \u00E2\u0080\u00A2\u00C2\u00BB (7.110) h{x)=CxuX2,...,xnD0 (7-111) y = d x i , x 2 , . . . , s B (7.112) The resulting network also has a depth of 1 and a size of n. (J Example 7.2.3 Find a network realization of the OR function y = Xl V x 2 V,. . . , Vx n (7.113) SOLUTION Using the projection I / L ( X ) = 2 ,_1x,-, the OR function can be expressed as y = fxofL(x) (7.114) where ' 1 if z > 1 Mz) = { ~ (7.115) 0 otherwise Chapter 7. DERIVATION PROCEDURES 85 From previous example, the network realization can be easily derived as If projection II is used, then we have the following network y = C xi,x2, ...,xnZi Note that both the two networks have a depth of 1 and a size of n. ft Example 7.2.4 Find the network realization of an XOR function y \u00E2\u0080\u0094 X\ \u00C2\u00A9 x2 \u00C2\u00A9 ... 0 xn (7.116) SOLUTION Using projection II JL(X) = \u00C2\u00A3\"=i xi-> t n e XOR function can be expressed as w here Let then Since y = / i \u00C2\u00B0 h(x) In = 1 if z is odd 0 if z is even 1 if z > i 0 otherwise /i(*) = E ( - i r 7 i , ( * ) t=i hi = C z (7.117) (7.118) (7.119) (7.120) (7.121) hence h(z) = C C z l l ' ^ z D 2 , C z | 1,-1, (-I)*- 1 3\u00C2\u00B0 (7.122) Chapter 7. DERIVATION PROCEDURES 86 5ince h(x) \u00E2\u0080\u0094C xi,x2,xn D \u00C2\u00B0 (7.123) we have (7.124) y = C C xi,x2,xn Z]1, C #i, x 2 , a ; n \u00E2\u0080\u00A2 2 , \u00E2\u0080\u00A2 x l 5 x 2 , x n I c \u00E2\u0080\u0094 i ) \" - 1 ^ \u00C2\u00B0 The resulting network has a depth of 2 and a size of ra2 + n. For the sake of comparison, let us use projection I. Since it is difficult to express the non-linear function / i for an arbitrary n , we only show a special case when n \u00E2\u0080\u0094 3. Let fL(x) = J2?-1*i and AC*) = t ' = l 0 z = 0 1 z = 1 1 z = 2 0 z = 3 1 z = 4 0 z = 5 0 z = 6 1 z = 7 then y(s) = / i \u00C2\u00B0 h(x) According to Example 7.1.8, we have (7.125) (7.126) (7.127) fi(z) = c r z z z i 1 , r z z z i 3 , r z z z i 4 , r z z z i 5 , r z z z i 7 |.i,\u00E2\u0080\u00941,1,\u00E2\u0080\u00941,1 D \u00C2\u00B0 (7.128) Chapter 7. DERIVATION PROCEDURES 87 hence, y = C C Xi,x2,x3 | 1,2,4 X i , x 2 , x 3 | 1,2,4 Z l 3 , C Xi, x 2 , x 3 | 1,2,4 C x ! , x 2 , x 3 | l , 2 , 4 D 5 , C I x 1 , x 2 , x 3 | 1,2,4 | 1 , - 1 , 1 , - 1 , 1 D\u00C2\u00B0 Note that this network has a size of 20 while the network derived using projection II has a size of 12 (n = 3). Remark Example 7.2.4 shows that using projection II results in smaller network size, this is generally true. 
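The effect of the choice of projection on the weights can be seen by evaluating the two AND realizations (7.107) and (7.112) directly. The sketch below is our own check, with the threshold convention (greater-or-equal at the boundary) assumed.

```python
from itertools import product

def tl(x, w, t):
    """Threshold-logic neuron (>= assumed at the boundary)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= t else 0

def and_projection_I(x):
    """AND network (7.107): weights 1, 2, ..., 2^(n-1), threshold 2^n - 1."""
    n = len(x)
    return tl(x, [2 ** i for i in range(n)], 2 ** n - 1)

def and_projection_II(x):
    """AND network (7.112): unit weights, threshold n."""
    return tl(x, [1] * len(x), len(x))

for n in (2, 3, 6):
    pats = product((0, 1), repeat=n)
    assert all(and_projection_I(p) == and_projection_II(p) == int(all(p)) for p in pats)
    print(f"n = {n}: both AND networks agree; largest weight {2 ** (n - 1)} (projection I) vs 1 (projection II)")
```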
Although in both Examples 7.2.2 and Example 7.2.3, using projection II has no advantage in terms of network depth and network size, the magnitude of weights is much smaller. This is important in hardware implementation, since the value range of the weights have to be limited. Chapter 8 N E T W O R K R E A L I Z A T I O N S O F I E T E C H N I Q U E S In this chapter, image enhancement (IE) techniques are re-categorized according to their realization difficulty into two classes: linear and non-linear filtering techniques. Examples of realizing some of the techniques are provided here to show the effectiveness of the algebraic derivation approach. Since the linear techniques are easy to realize, they are treated briefly with two examples; the rest of this chapter is devoted to the realization of non-linear techniques. 8.1 Network Realizations of Linear Techniques A two-dimensional image can be projected to a one-dimensional image. For instance, an image U whose size is K x L can be mapped to a one-dimensional image g with the following relationship 9i = Ukji Vfce {1,2, ..,#}, V/G{1,2,...,I} (8.1) where i \u00E2\u0080\u0094 (k \u00E2\u0080\u0094 1)L + I. This mapping is one-to-one, that is, U can be uniquely mapped back from g using equation (8.1). For all the linear filtering techniques, the enhancement process can be expressed, in matrix form, as f = E g + t (8.2) where g is the image to be enhanced and f is the enhanced image. For an input image 88 Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 89 of size n and an enhanced image of size m, the network realization of this function is fi = C gii92i\u00E2\u0080\u0094 ,9n | e n ,e 1 2 , . . . ,e l n Dh h \u00E2\u0080\u0094 C g1,g2,...,gn | e 2i, e 2 2 , e 2 n D < 2 fm \u00E2\u0080\u0094 ^ 9li92i \u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2>9n | emli em2i \u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2> emn 3* A more compact notation of this network is f = C g | E D * The architecture of (8.3) is shown in Figure 8.1. en (8.3) (8.4) Figure 8.1: Network Architecture for Linear Filtering Now, for the linear filtering techniques, the task left is to map those techniques to the form shown in (8.2). The following two examples illustrate such mapping. Example 8.1 Find a network realization of the averaging filter. SOLUTION Suppose the following image U Ul,l \u00C2\u00AB 1 , 2 \"1,3 Ul,4 \"2,1 \"2,2 \"2,3 \"2,4 \u00C2\u00AB3,1 \"3,2 \"3,3 \"3,4 \"4,1 \"4,2 \"4,3 \"4,4 (8.5) Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 90 is to be enhanced by the equally weighted averaging filter of (2.10) and the window of the filter is 3 x 3, then the processing, in the matrix form, is where with 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 1 1 1 0 1 1 1 0 1 1 1 0 0 0 0 0 0 1 1 1 0 1 1 1 0 1 1 1 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 1 1 1 0 1 1 1 0 1 1 1 0 0 0 0 0 0 1 1 1 0 1 1 1 0 1 1 1 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 f \u00E2\u0080\u0094 (/l\u00C2\u00BB/2, \u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2,/l6)T g = {gi,g2,--,gi6)T g = A g (8.6) (8.7) (8.8) 9i = uKl Vfc,Ze {1,2,3,4} Comparing (8.6) with (8.2), we obtain the weight matrix and the threshold vector as Chapter 8. 
NETWORK REALIZATIONS OF IE TECHNIQUES 91 follows E = A t = 0 where 0 represents the zero vector whose elements are all zeros, ft Example 8.2 Find a network realization of the zooming technique. S O L U T I O N Suppose the following image U = (8.9) (8.10) \"1,1 \"1,2 \u00C2\u00AB 2 , 1 \"2,2 is to be magnified two times by the the replication technique described in Section 2.2 such that the resulting image has the size of 4 x 4, then the zooming process, in the Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 92 matrix form, is where with f = 1 0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 g = R g f \u00E2\u0080\u0094 ( / i , h, fie)T (8.11) (8.12) (8.13) 9i = Uk,i Vfc,/ \u00E2\u0082\u00AC {1,2} Comparing (8.11) with (8.2), we obtain the weight matrix and the threshold vector as Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 93 follows E = R t = 0 (8.14) 8.2 Network Realizations of Non-linear Filters There is no unified theory of non-linear filtering techniques. As a result, there is no universal network structure which can be used for all of them as in the case of linear filters. 8.2.1 Dynamic Range Modification Contrast stretching, clipping, thresholding, and window slicing are techniques which modify the dynamic range of a given picture. For this group of techniques, the processes have a generalized form as follows. axx + b\ > x > t0 a2x + b2 t2> x >ti V anx + bn tn > x > The derivation of the network realization of above function is as follows. Let a'{x + 6- x > t{_i 0 otherwise M*) = { \/i E {l,2,...,n} (8.15) (8.16) where Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 94 with It can be verified that \u00C2\u00ABo = Oo = U V = / o \u00C2\u00B0 ( / i , / 2 , ...,/\u00E2\u0080\u009E) where fo = YA=I fi, whose network realization is fo = C / l , / 2 , \u00E2\u0080\u0094,fn 3\u00C2\u00B0 According to Example 7.1.10, we know that fi(x) = C < x >u~\ C X-3u~ l | Ci,di D\u00C2\u00B0 (8.18) (8.19) (8.20) (8.21) where c, = a\ and di = (a(f,_i + &(\u00E2\u0080\u00A2). Substitute this to equation (8.20) , we obtain the whole network as follows y = C / i , / 2 , ...,/\u00E2\u0080\u009E Z>\u00C2\u00B0 {/,\u00E2\u0080\u00A2 = C M,1,M\u00C2\u00BB2 I c \u00C2\u00AB ' ^ \u00C2\u00AB 3 \u00C2\u00B0 = < x > ti~1 [ Mi2 = rz x z i ' - 1 (8.22) Applying Theorem 6.10, the above network can be simplified to y \u00E2\u0080\u0094 C Mll,/i 2l,---,Mnl,Ml2,M22,---,Mn2 | Ci, C2, ..., C n, dX , d 2 , d n 3\u00C2\u00B0 ^ = < x >\u00C2\u00AB\u00E2\u0080\u00A2\u00E2\u0080\u00A2-! (8.23) Hi2 = r z X D ' - 1 Based on the network model of (8.23), the network realization of the contrast stretch-ing technique (see section 2.2.1) is v = c < u >\u00C2\u00B0, < u > a,< u > b, r z \u00C2\u00AB z i a , r z u3 b \ a , (/? - a ) , ( a - fl + 7), (va - a/3), (vb - va + a/3 - 67) Z)\u00C2\u00B0 Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 95 Similarly, the network realization of the clipping is v =C< u >\u00C2\u00B0, < u >b, C C u3b | /?, -/?, va, (vb - va) D\u00C2\u00B0 (8.24) Finaly, the network realization of the thresholding is (8.25) 8.2.2 Order Statistic Filtering Order statistic filtering is a simple yet powerful technique widely used in signal processing called the k th order statistic of the array. An order statistic filter is a linear combination of the order statistics. 
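Before continuing with order statistic filters, the dynamic-range realization derived above can be checked numerically. The sketch below is our own illustration: the increment coefficients a'_i = a_i - a_{i-1} and b'_i = b_i - b_{i-1} (with a_0 = b_0 = 0) are our reading of the partially garbled definitions, and the breakpoints and slopes are made-up values of a contrast-stretching type.

```python
def quasi_linear(x, t):            # <x>^t = max(x - t, 0)
    return max(x - t, 0.0)

def step(x, t):                    # threshold-logic unit: 1 if x >= t (boundary convention assumed)
    return 1.0 if x >= t else 0.0

def dynamic_range_network(x, segments):
    """Single-layer realization of a piecewise-linear mapping, in the spirit of (8.23).
    segments: list of (t_prev, slope, offset) meaning y = slope*x + offset for x above t_prev."""
    y, prev_slope, prev_offset = 0.0, 0.0, 0.0
    for t_prev, slope, offset in segments:
        da, db = slope - prev_slope, offset - prev_offset           # increment coefficients (assumed)
        y += da * quasi_linear(x, t_prev) + (da * t_prev + db) * step(x, t_prev)
        prev_slope, prev_offset = slope, offset
    return y

# contrast-stretching-style mapping: slope alpha below a, beta between a and b, gamma above b
a, b = 80, 180
alpha, beta, gamma = 0.5, 2.0, 0.5
segments = [(0, alpha, 0.0),
            (a, beta, (alpha - beta) * a),                           # offsets keep the map continuous
            (b, gamma, (alpha - beta) * a + (beta - gamma) * b)]

def direct(x):                     # reference piecewise evaluation of (8.15)
    for t_prev, slope, offset in reversed(segments):
        if x > t_prev:
            return slope * x + offset
    return 0.0

print(all(abs(dynamic_range_network(x, segments) - direct(x)) < 1e-9 for x in range(1, 256)))
```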
Variation in the weights results in various members of the order statistic filter (OSF) family, some of which have been intensively studied and widely used in signal processing. Median filters is one example [77, 46, 38, 68]. Order statistic filters have the virtue of being very simple and having the edge preserving property. Their performance is, if not optimal, satisfactory in most cases. To formulate the problem of order statistic filtering in such a way that a network realization can be derived, some functions need to be introduced. They are the rank function, the comparison function, and the order statistic function. Definition 8.1 For X = (xx, x 2 , x n ) , the rank function of Xi is [11, 56]. The core operation is to find the k th largest element in an array. This element is ran k(Xi :X) = k, if X{ is the k th largest element in X (8.26) Definition 8.2 For X \u00E2\u0080\u0094 (xx, x 2 , x n ) , the comparison function for x,- and Xj is c(Xi,Xj) = < 0 Xi > Xj 0 Xi = Xj i > j 1 Xi = Xj i < j 1 Xi < Xj (8.27) Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 96 Definition 8.3 For X = (a;1} x 2 , x n ) , the order statistic function is os(k : X) = Xi if rank(x< : X) = k (8.28) From the above definitions, we have the following theorem. Theorem 8.1 The rank function rank(xi : X) and the sum of comparison functions related to x,- have the following relationship, n rank(xi : X)\u00E2\u0080\u0094 ^2c(xi,Xj) + I j / i (8.29) The core operation in order statistic filtering is to to find the k th largest member of an array of pixels. For instance, the median filtering is to find the { r^) t h element of the input array. Therefore, it is necessary to derive the network realization of the order statistic function first. Let i m = Xi if rankfx,- : X) = k V ' V i e {1,2,...,n} (8.30) 0 otherwise then it is trivial to show that os ( * :X) = (8-31) \u00C2\u00BB'=i that is, os(k:X) = f0o(f1,f2,...,fn) (8.32) where /o = E / i (8-33) which can be realized by a linear neuron as / o = C / i , / 2 , . . . , / \u00E2\u0080\u009E D \u00C2\u00B0 (8.34) Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 97 Let F(Xi) \u00C2\u00B1 { 1 if rank(x,- : X) \u00E2\u0080\u0094 k 0 otherwise Vi \u00E2\u0082\u00AC {l,2,...,n} (8.35) then, , Xi if F(xi) = 1 fi(x) = { ; ViG{l ,2 , . . . ,n} 0 otherwise According to Example 7.1.11, function fi(x) can be realized as Mx) =C< xu F{xi) | 1, b >a+b, F(xi) | 1, a D\u00C2\u00B0 (8.36) (8.37) where a < min{min{xi},min{a;2},min{a;n}} and b > max{max{ii}, m a x{ i2} max{in}} \u00E2\u0080\u0094 a. According to Example 7.1.8, the network realization of F(xi) is F(xi) =CC rank(xt- : X) \u00E2\u0080\u00A2 f c , C rank(x,- : X) Zik+1 | 1, -1 D\u00C2\u00B0 According to Fact 7.1, the network realization of the rank function is (8.38) Since rank(xi : X) =C c(xi,xi), ...,c(xi,xi_1),c(xi,xi+1), ...,c(xi,xn) D 1 \u00C2\u00ABZ/j ^ x% c(xi,Xj) - \ if z > j 0 otherwise - l (8.39) (8.40) and 1 Xj > Xi if i < j 0 otherwise according to Example 7.1.4, and Fact 7.3, the network realization of c(xi,Xj) is C C Xi, Xj | \u00E2\u0080\u0094 1,1Z3\u00C2\u00B0 j - 1 D - 1 if i > j (8.41) C Xi,Xj | - 1, I n 0 if i < j i \u00E2\u0082\u00AC {l,2,...,n} (8.42) Chapter 8. 
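These definitions, and Theorem 8.1, are straightforward to check numerically. The following sketch (our own illustration, not part of the derivation) implements the comparison, rank, and order statistic functions exactly as defined, including the tie-breaking by index, and confirms that os(k : X) returns the k-th largest element even when values repeat.

```python
def c(x, i, j):
    """Comparison function of Definition 8.2 (ties broken by position)."""
    if x[i] > x[j]:
        return 0
    if x[i] < x[j]:
        return 1
    return 0 if i > j else 1              # equal values: the earlier index wins

def rank(x, i):
    """Rank of x[i] in X, using Theorem 8.1: rank = sum of comparisons + 1."""
    return sum(c(x, i, j) for j in range(len(x)) if j != i) + 1

def os(x, k):
    """Order statistic function of Definition 8.3: the k-th largest element."""
    for i in range(len(x)):
        if rank(x, i) == k:
            return x[i]

X = [7, 3, 7, 1, 5]
print([rank(X, i) for i in range(len(X))])         # every rank 1..n appears exactly once
print([os(X, k) for k in range(1, len(X) + 1)])    # the array sorted in descending order
print(os(X, (len(X) + 1) // 2))                    # the median, as used by the median filter
```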
NETWORK REALIZATIONS OF IE TECHNIQUES 98 Therefore, the network realization of the order statistic function is V = C / i , / 2 , ...,/\u00E2\u0080\u009E D\u00C2\u00B0 {/,-=C Pi,F(Xi) | l , a + 6 {F(xi) = C Hiufii2 I 1,-1 D\u00C2\u00B0 pu = LZ rank(xj : X)Z3k pi2 = C rank(xj : X)3k+1 {rank(xj : X) =C d x (x j , X i ) , ^ ( X J , X i_ j ) , C 2 (XJ, xi+1),C2(XJ, x\u00E2\u0080\u009E) D _ 1 {a\(xj, Xj) =C c^x^Xj) | - 1 D _ 1 Ci (x ,- ,X j ) = Cx,-,Xj | 1, \u00E2\u0080\u0094lZZf c 2(x t\",Xj) \u00E2\u0080\u0094 LZXj, Xj | l , l d where i \u00C2\u00A3 {1,2, ...,n}. The above network can be simplified to V \u00E2\u0080\u0094, C Pl,Pll, Hl2, A*2, H21, H22-, A*n, Pnl, Pn2 | 1, O, ~\u00C2\u00AB, 1, \u00C2\u00AB, ~ \u00C2\u00AB , 1, \u00C2\u00AB, ~\u00C2\u00AB 3\u00C2\u00B0 {/X,- =< X,-, / i j i , ^ j 2 | 1, 6, -6 > a + 6 /^ti \u00E2\u0080\u0094 CZ CJI , . . . , C j j _ j , Cj j_|_i ,...,Cj > n | 1,..., 1,1,...,Id fii2 \u00E2\u0080\u0094 C CJI, C j ( j _ i , C t , i + i , C j j n I \u00E2\u0080\u0094 1 , \u00E2\u0080\u0094 1,1, lZZ\ k 1 + 1 (8.43) (8.44) id. CXi,Xj | l , - l n \u00C2\u00B0 if i > j C X J , X J | - 1,1D\u00C2\u00B0 if i < j This network has a depth of 4 and a size of 2n(2n + 1). It is shown in Figure 8.2 for the case n = 3. This network is called OSnet and is denoted as OSnet(k,n), where n is the number of inputs and k means that the network's output is the k th largest element of the input array. This network's schematical representation is shown in Figure 8.3. The OSnet can be made adjustable by changing Hij LZ Ci\, ..., Cj ti\u00E2\u0080\u0094\, , \u00E2\u0080\u00A2\u00E2\u0080\u00A2 \u00E2\u0080\u00A2, Q,n | 1,1,...,lZ] k-i+j-1 (8.45) Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 99 Figure 8.3: The Schematical Representation of OSnet Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 100 1 Xi x2 r\u00C2\u00A9 Figure 8.4: The Schematical Representation of Adaptive OSnet where j \u00E2\u0082\u00AC {1,2}, to ,%+\i \u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2> c t , n , X s (8.46) | - 1 , . . , - 1 , 1 , . . . , 1 , - 1 ^ ' \u00E2\u0080\u0094 1 where xs is used to change the value of k. This new network model is called adaptive OSnet. Its schematical representation is shown in Figure 8.4 Median Filtering Median filtering is a simple operation in which the median value of the data within a sliding window centered at a point is taken as the output of that point. Median filtering has been shown to be effective in suppressing impulsive noise components while preserving edges of a signal. It has been applied in several areas of digital signal processing which include speech processing and image enhancement. For an array of (2m + 1) numbers, the median is the (m + l) th largest number. Therefore, an OSnet(m + 1,2m + 1) can implement the median filter. Detailed account of this network can be found in [83]. Separable Median Filtering. When median filtering is extended to two or more dimensions, signal features are generally distorted or deleted. Separable median filtering proposed in [68] yields somewhat better results in preserving signal features. Separable median filters consist of two types of one-dimensional median filters \u00E2\u0080\u0094 one oriented in the horizontal direction and the other in the vertical. More explicitly, the Chapter 8. 
NETWORK REALIZATIONS OF IE TECHNIQUES 101 output value, y(l,k), at position (l,k) is given by v(l, k) = median{z(/ \u00E2\u0080\u0094 L, k),z(l, k),z{l + L, k)} where z is defined as z(p, q) = median{u(p, q-K),..., u(p, q),u(p, q + K)} (8.47) (8.48) and u(l, k) are the sample values of the input signal. A separable median filter with a window size of (2L + 1) x (2i^+l) can be implemented by using (2L + 2) or (2K + 2) OSnets depending on whether the horizontal direction or the vertical direction is median filtered first. The network of a separable median filter with the horizontal direction filtered first is shown in Figure 8.5. u(l-LJc-K) u(l-LJc) u(l-L,k+K) u(l,k-K) u(l,k) u(l,k+K) u(l+L,k-K) u(l+L,k) u(l+L,k+K) z(l-LJc) z0\u00C2\u00BB 0 v(W z(l+L,k) Figure 8.5: A Max/Median Filter Realized by Using OSNets Max/Median Filtering. Max/Median filtering is proposed by Arce and McLough-lin [8] in order to achieve better preservation of signal features. In the two dimensional case, if samples in lines separated by 45\u00C2\u00B0 is taken into account, the Max/Median filtering is defined as y{l, k) = max{z(s, 1), z(s, 2), z(s, 3), z{s,4)} (8.49) Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 102 where z(s,l) = median{a(/, k - M), ...,a(l,k), ...,a(/, k + M)} z(s,2) = median{a(/ \u00E2\u0080\u0094 M, k), ...,a(/, k), ...,a(l + M, k)} z(s,3) = median{a(/ + M,k \u00E2\u0080\u0094 M),a(l,k),a(l - M,k + M)} z(s,4) = median{a(/ - M , k - M ) , ...,a(/, fc), ...,a(/ + M, k + M)} where a(/, A;) is the pixel's value of the input picture at position (/, k), and y(l, k) is the output of the filter at the same position. A Max/median filter with a window size of (2M + 1) x (2Af + 1) can be realized by five OSNets as shown in Figure 8.6. aG^-M) \u00E2\u0080\u0094r-a(W \u00E2\u0080\u0094 T aOJc+M) \u00E2\u0080\u0094 -a(l-M,k) \u00E2\u0080\u0094 r aQ+MJc) \u00E2\u0080\u0094 -a(l+MJc-M) \u00E2\u0080\u0094 --4 a(l-MJc+M) \u00E2\u0080\u0094' -a(l-MJc-M) \u00E2\u0080\u0094 -a 0 \u00C2\u00AB \u00E2\u0080\u0094f-a(l+MJc+M) \u00E2\u0080\u0094 L z(s,D z(s,2) z(s,3) z(s,4) Figure 8.6: A Max/median Filter Realized by Using OSNets M a x / M i n Filtering Taking neighborhood min or max is a generalization of shrinking or expanding the l's in a two-valued picture. Iterated local min followed by iterated max can be used to remove high-valued blemishes on a low-valued background. Similarly, iterated local max Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 103 followed by iterated local min can be used to remove low-valued blemishes on a high-valued background. Finding the maximum of an array is equivalent to finding the 1st largest element, therefore, the OSnet(l,n) can be used. To find the minimum of an array of size n is equivalent to find the n th largest element, thus the OSnet(n,n) can be used. There-fore, the Max/Min filtering can be implemented by using OSnet(l,n) and OSnet(n,n) iteratively. Adaptive Order Statistic Filtering A type of adaptive order statistic filters which are called comparison and selection filters (CS filters) is proposed by Lee and Fam [56]. The output yi of the CS filter with parameter h at position / through the input array Xi = (x/_m, ...,x/ + m) is defined as where u\ and v\ are the sample average and sample median respectively, and h is an integer satisfying 1 < h < m. Vi = < os(m + 1 + h : Xi) if u; > V[ os(m + 1 \u00E2\u0080\u0094 h : X{) otherwise (8.50) Let 1 if ui > vi (8.51) < 0 otherwise and fi A os(m + l + h:Xi) if F = 1 (8.52) < 0 otherwise (8.53) Chapter 8. 
NETWORK REALIZATIONS OF IE TECHNIQUES 104 then, function (8.50) can be rewritten as yi = fi + h (8-54) which can be realized by a linear neuron as follows yi=Ch,f2D\u00C2\u00B0 (8.55) According to Example 7.1.11, we have / i =C< os(m + l + h: X{),F | 1,6 > \u00C2\u00B0 + 6 , F\l,aD\u00C2\u00B0 (8.56) and according to Example 7.1.12, we have f2 =c< os(m + l-h: Xt), F | 1, -b >a, F | 1, - a D~a (8.57) In the above two equations, a < min{os(A; : Xi) : k \u00C2\u00A3 fc} and 6 > max{os(A; : X{) : A; \u00C2\u00A3 /C} \u00E2\u0080\u0094 a, where fc = {1,2,...,2m -f- 1}. Since os(& : X/) > 0 VA: \u00C2\u00A3 fc, we choose a \u00E2\u0080\u0094 0. Consequently, network realizations of / i and f2 can be simplified to / i = 6 (8.58) and / 2= \u00C2\u00B0 (8.59) According to Fact 7.3, the network realization of F is F=LZuhv, | 1,-1 13\u00C2\u00B0 (8.60) Since the network realizations of ui and vi are already known, the network realization of the CS filter is y, = C / i , / 2 D \u00C2\u00B0 / i = 6 < / 2 = \u00C2\u00B0 (8.61) {F =C os(m + 1 : X,), | 1, -1 Z3\u00C2\u00B0 {vi =C X;_m, X/, X; + m | C, C, C D\u00C2\u00B0 Chapter 8. NETWORK REALIZATIONS OF IE TECHNIQUES 105 where c = l/(2m + 1). Three order statistic networks are needed to realize the above network. However, we can combine all the three networks together into one network and reduce the number of weights needed. Let us define a function as follows Z\ = os(m + 1 \u00E2\u0080\u0094 h : X\) z2 = F (8.62) z3 = os(m + 1 + h : Xi) where F is defined as in (8.51). Recall the network realization of the order statistics function described earlier, we can now combine three such networks to form a network realization for function defined in (8.62). After simplification, the network realization is as follows. Z\ \u00E2\u0080\u0094 C Mn, M12, \u00E2\u0080\u00A2 Min D\u00C2\u00B0 Z2 = (Z M21, M22, M2n, \u00C2\u00A3/-m, X l + m | 1, 1, 1, \u00E2\u0080\u0094 C, \u00E2\u0080\u0094 C ZJ\u00C2\u00B0 z3 = C M31) M32, M3n Z)\u00C2\u00B0 {pki =< a;/_m+t_i,MfcoMfc,- I 1, 6, \u00E2\u0080\u00946 >6 \u00C2\u00BB G K and fc j IZ x / _ m + , _ i , X ( _ m + j _ ! | - 1,1 if z < j where dk{ = m + 1 + (fc \u00E2\u0080\u0094 l)/i \u00E2\u0080\u0094 i , and n = 2m + 1, i.e., the number of input elements. This network model has a depth of 4 and a size of n(8n + 5). (8.63) i cij = i e\u00C2\u00A3 and fc \u00E2\u0082\u00AC {1,2,3} i,j b . /2 = < zz, z2 | 1, -b >\u00C2\u00B0 The whole network has a depth of 5 and a size of n(8n + 5) + 6. This network is shown in Figure 8.7. 8.2.3 Directional Filtering Although smoothing techniques can clear a noisy image to certain degree, a common side-effect they have is that edges get blurred. To alleviate the blurring effect, directional filtering is sometimes used. An example is the directional averaging filter described in Chapter Two. Suppose the directional averaging is done along four different directions, say, 0\u00C2\u00B0, 45\u00C2\u00B0, 90\u00C2\u00B0, 135\u00C2\u00B0, then for each direction, an average is calculated as v(m,n:9i) = u ( m - k , n - l ) , 0{ ,- : $) = 4 0 otherwise where $ = (^1? <^>2, 3, 4>A)- It is easy to show that 4 2/ = J2f i=i which can be realized by a linear neuron as follows. 
To derive the network realization of the DA filter, let

F_i = 1 if rank(φ_i : Φ) = 4;  F_i = 0 otherwise   (8.70)

so that

f_i = y_i if F_i = 1;  f_i = 0 otherwise

which, according to Example 7.1.11, has the network realization

f_i = ⊂ ⟨ y_i, F_i | 1, b ⟩^{a+b}, F_i | 1, a ⊃^0   (8.71)

where a ≤ min{min{y_i} : i ∈ I} and b ≥ max{max{y_i} : i ∈ I} − a. Since y_i ≥ 0 for all i ∈ I, we choose a = 0. Consequently, the above network can be simplified to

f_i = ⟨ y_i, F_i | 1, b ⟩^b   (8.72)

The network realization of F_i is

F_i = ⟦ ⟦ rank(φ_i : Φ) ⟧^4, ⟦ rank(φ_i : Φ) ⟧^5 | 1, −1 ⟧^0   (8.73)

Because rank(φ_i : Φ) ≤ 4, this network can be simplified to

F_i = ⟦ rank(φ_i : Φ) ⟧^4   (8.74)

The rank function has the network realization

rank(φ_i : Φ) = ⊂ c_{i1}, ..., c_{i,i−1}, c_{i,i+1}, ..., c_{i4} | 1, 1, 1 ⊃^{−1}   (8.75)

where each comparison output c_{ij} is a single threshold neuron on the pair (φ_i, φ_j), as described earlier in this chapter. The network realization of φ_i is

φ_i = ⊂ ⟨ x, y_i | 1, −1 ⟩^0, ⟨ x, y_i | −1, 1 ⟩^0 ⊃^0

and the network realization of y_i is

y_i = ⊂ x_1^i, x_2^i, ..., x_N^i | d, d, ..., d ⊃^0

where d = 1/N and x_j^i is the j-th element of the sequence X_i, the set of pixels in the window W_{θ_i}. Note that if N = 2M + 1, then x = x_{M+1}^i. Composing all the network realizations together, we get the network model for the directional averaging filter (8.76)–(8.78): one output neuron (8.69), four gating neurons (8.72) taking the window pixels and F_i, four threshold neurons (8.74) taking the comparison outputs, and the comparison and deviation neurons beneath them. This network model has a depth of 5 and a size of 4(3N + 17). For the case N = 3 it is shown in Figure 8.8, where the connection weights are not shown.

Figure 8.8: A Network Realization of the DA Filter for the Case N = 3

Chapter 9

NETWORK REALIZATIONS OF IR TECHNIQUES

Image restoration (IR) techniques are classified here, as in Chapter Eight, into two classes: linear and non-linear filtering techniques. The inverse filter, the Wiener filter, and so on belong to the first class; the maximum entropy filter and the Bayesian methods belong to the second. This chapter shows the derivation of LQT networks for realizing some linear and non-linear filters, and a comparison with the Hopfield network, which is commonly used for solving optimization problems.

9.1 Network Realizations of Linear Filters

For most linear filters, the essential problem is to find the best estimate f̂ such that the following cost function

E(f̂) = α(||g − Hf̂||^2 − ||n||^2) + ||Qf̂||^2   (9.1)

is minimized (see Section 2.2). The solution to this problem is

f̂ = (H^T H + λ Q^T Q)^{−1} H^T g   (9.2)

where λ = 1/α. For function (9.2), the network realization, as shown in Chapter Eight, is

f̂ = ⊂ g | W ⊃^0   (9.3)

where

W = (H^T H + λ Q^T Q)^{−1} H^T   (9.4)

and 0 is a row vector of zeros. However, the complexity of image restoration stems from the difficulty of calculating the inverse (H^T H + λ Q^T Q)^{−1}. This is the inversion of a very large matrix.
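To make the dimensionality concrete, the closed-form solution (9.2) can be evaluated directly for a small one-dimensional example. The sketch below is NumPy; the blur kernel, the signal, and the value of λ are illustrative assumptions. It builds H as a blur matrix and Q as a discrete Laplacian, then forms the weight matrix W of (9.4).

import numpy as np

n = 16                                                # tiny 1-D "image" with n samples
rng = np.random.default_rng(0)

H = sum(np.eye(n, k=k) for k in (-1, 0, 1)) / 3.0     # 3-tap blur matrix
Q = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # discrete Laplacian (smoothness)

f_true = np.zeros(n); f_true[5:10] = 1.0
g = H @ f_true + 0.01 * rng.standard_normal(n)        # blurred, noisy observation

lam = 0.1                                             # lambda = 1/alpha
W = np.linalg.inv(H.T @ H + lam * Q.T @ Q) @ H.T      # weight matrix (9.4)
f_hat = W @ g                                         # estimate (9.2)
print(np.round(f_hat, 2))

# For an n x n image, f has n**2 unknowns, so the matrix being inverted has
# n**2 rows and columns; this direct inverse quickly becomes impractical.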
Suppose we are dealing with a picture of size 64 × 64 (a very small picture): the matrix (H^T H + λ Q^T Q) then has size 4096 × 4096. To reduce the computational complexity, Discrete Fourier Transform (DFT) techniques [26] or iterative methods [40] are used. Here, we derive a network model which implements the steepest descent method, the simplest iterative method.

The steepest descent method finds the optimal estimate by making an initial guess and approaching the true optimal estimate step by step. Denoting the estimate at time k by f̂(k), the difference f̂(k + 1) − f̂(k) is proportional to the opposite direction of the gradient of the cost function E at time k. The gradient vector of E is usually denoted ∇E; at time k it is

∇E(k) = −2(λ Q^T Q + H^T H) f̂(k) + 2 H^T g   (9.5)

Therefore, the estimate at time k + 1 is

f̂(k + 1) = f̂(k) + δ((λ Q^T Q + H^T H) f̂(k) − H^T g) = (I + δλ Q^T Q + δ H^T H) f̂(k) − δ H^T g   (9.6)

where δ takes a small positive value. The optimal δ is a function of f̂(k); for the sake of simplicity, it is usually chosen as a constant [93]. The network realization of function (9.6) is

f̂(k + 1) = ⊂ f̂(k) | W ⊃^t   (9.7)

where

W = I + δλ Q^T Q + δ H^T H   (9.8)

and

t = δ H^T g   (9.9)

For an n × n image, this network has a depth of 1 and a size of n^4. The task of finding the network realization of a linear filter is thus reduced to finding the weight matrix W and the threshold vector t, which involve only simple matrix operations such as transposition and multiplication. The following are network realizations of two well-known linear filters.

Pseudo-inverse Filter. For the pseudo-inverse filter, the weight matrix and the threshold vector are

W = I + δ H^T H,  t = δ H^T g   (9.10)

Wiener Filter. For the Wiener filter, the weight matrix and the threshold vector are

W = I + δ R_f^{−1} + δ H^T H,  t = δ H^T g   (9.11)

The matrix R_f is usually assumed diagonal, hence its inverse can be easily calculated.

9.2 Network Realizations of Non-linear Filters

Similar to the network realizations of image enhancement techniques, there is no universal network architecture for non-linear image restoration filters. Here, we show the derivation of LQT networks for realizing some of these filters, and the limitations of LQT networks.

Maximum Entropy Filter. The cost function of a maximum entropy filter is

J(f̂) = −f̂^T ln f̂ − λ(||g − Hf̂||^2 − ||n||^2)   (9.12)

The gradient vector is

∇J(f̂) = −(ln f̂ + 1) + λ(2 H^T (g − Hf̂))   (9.13)

where 1 = (1, 1, ..., 1)^T. Based on the steepest descent method (applied to −J, i.e., steepest ascent on J), the estimate at time k + 1 is

f̂(k + 1) = f̂(k) + δ ∇J(f̂(k)) = (I − 2λδ H^T H) f̂(k) + (2δλ H^T g − δ1) − δ ln f̂(k)   (9.14)

The realization of the term ln f̂ by LQT networks requires a huge number of neurons. One way to reduce the network size is to include a new type of neuron which has the logarithm as its activation function. Since the logarithm function can easily be implemented in analog circuits [64], it is not unreasonable to include such a neuron model. Let us denote the natural-logarithm activation function by N. Function (9.14) can be realized by the network model shown in Figure 9.1, where

W = I − 2λδ H^T H   (9.15)

and

t = 2δλ H^T g − δ1   (9.16)

Figure 9.1: A Network Realization of the Maximum Entropy Filter
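A direct numerical sketch of the update (9.14) is given below (NumPy; the test problem, the step size, and the clipping of f̂ to stay positive are illustrative assumptions). The ln f̂(k) term plays the role of the logarithm-activation neurons in Figure 9.1.

import numpy as np

n = 16
rng = np.random.default_rng(1)
H = sum(np.eye(n, k=k) for k in (-1, 0, 1)) / 3.0    # blur matrix
f_true = np.ones(n); f_true[4:8] = 5.0
g = H @ f_true + 0.01 * rng.standard_normal(n)

lam, delta = 5.0, 0.05
f = np.ones(n)                                       # positive initial guess
for _ in range(500):
    grad_J = -(np.log(f) + 1.0) + 2.0 * lam * H.T @ (g - H @ f)
    f = f + delta * grad_J                           # update (9.14)
    f = np.clip(f, 1e-6, None)                       # keep ln f defined

print(np.round(f, 2))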
Maximum Likelihood Filter. The cost function of a maximum likelihood filter is

J(f̂) = −(1/2)(g − s{Hf̂})^T R_n^{−1} (g − s{Hf̂}) + ln p(g) + K   (9.17)

The gradient vector is

∇J(f̂) = H^T D R_n^{−1} (g − s{Hf̂})   (9.18)

where

D ≜ Diag{ ds(x)/dx |_{x = b_i} }   (9.19)

and the b_i are the elements of the vector b = Hf̂. The estimate at time k + 1 is

f̂(k + 1) = f̂(k) − δ H^T D R_n^{−1} g + δ H^T D R_n^{−1} s{Hf̂(k)} = W f̂(k) − t + Q s{Hf̂(k)}   (9.20)

with W = I, t = δ H^T D R_n^{−1} g and Q = δ H^T D R_n^{−1}. Assume the noise term n is independent; then (R_n^{−1})^T = R_n^{−1}, hence we have

H^T D R_n^{−1} s{Hf̂} = H^T R_n^{−1} D s{Hf̂}   (9.21)

The term

D s{Hf̂} = Diag(s'(b_1), s'(b_2), ..., s'(b_n)) (s(b_1), s(b_2), ..., s(b_n))^T = (s'(b_1)s(b_1), s'(b_2)s(b_2), ..., s'(b_n)s(b_n))^T   (9.22)

where s'(b_i) = ds(x)/dx evaluated at x = b_i. The term s'(b_i)s(b_i) is a multiplication of two variables, whose network realization by LQT networks requires a huge number of neurons. One way to reduce the network size is to use the sigma-pi neurons mentioned in Section 2.1.

In the case where s(·) is linear, function (9.20) reduces to

f̂(k + 1) = f̂(k) − δ H^T R_n^{−1} g + δ H^T R_n^{−1} s{Hf̂(k)} = (I + δ H^T R_n^{−1} H) f̂(k) − δ H^T R_n^{−1} g   (9.23)

This filter can then be realized by the network model of (9.7) with

W = I + δ H^T R_n^{−1} H   (9.24)

and

t = δ H^T R_n^{−1} g   (9.25)

Maximum A Posteriori (MAP) Filter. The cost function of a MAP filter is

J(f̂) = −(1/2)(g − s{Hf̂})^T R_n^{−1} (g − s{Hf̂}) − (1/2)(f̂ − f̄)^T R_f^{−1} (f̂ − f̄) + ln p(g) + K   (9.26)

The gradient vector is

∇J(f̂) = H^T D R_n^{−1} (g − s{Hf̂}) − R_f^{−1} (f̂ − f̄)   (9.27)

where D is defined as in (9.19) and f̄ is the prior mean of f̂. The estimate at time k + 1 is

f̂(k + 1) = f̂(k) − δ H^T D R_n^{−1} g + δ H^T D R_n^{−1} s{Hf̂(k)} + δ R_f^{−1} (f̂(k) − f̄) = (I + δ R_f^{−1}) f̂(k) − δ R_f^{−1} f̄ − δ H^T D R_n^{−1} g + δ H^T D R_n^{−1} s{Hf̂(k)}   (9.28)

Because of the term δ H^T D R_n^{−1} s{Hf̂(k)}, we have the same problem as in the implementation of the maximum likelihood filter. In the case where s(·) is linear, the MAP filter reduces to

f̂(k + 1) = (I + δ R_f^{−1} + δ H^T R_n^{−1} H) f̂(k) − δ(R_f^{−1} f̄ + H^T R_n^{−1} g)   (9.29)

This filter can be realized by the network model of (9.7) with

W = I + δ R_f^{−1} + δ H^T R_n^{−1} H   (9.30)

and

t = δ(R_f^{−1} f̄ + H^T R_n^{−1} g)   (9.31)
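For the linear-response case (9.29)–(9.31), every iteration is a single affine map, i.e., one layer of linear neurons. The following sketch writes the iteration as a gradient step computed directly from (9.27) with s{Hf̂} = Hf̂ (NumPy; the covariances, the prior mean, the step size and the iteration count are illustrative assumptions).

import numpy as np

n = 16
rng = np.random.default_rng(2)
H = sum(np.eye(n, k=k) for k in (-1, 0, 1)) / 3.0   # blur matrix
f_true = np.zeros(n); f_true[6:11] = 4.0
g = H @ f_true + 0.05 * rng.standard_normal(n)

Rn_inv = np.eye(n) / 0.05**2        # inverse noise covariance R_n^{-1}
Rf_inv = np.eye(n) / 4.0**2         # inverse prior covariance R_f^{-1}
f_bar = np.full(n, f_true.mean())   # prior mean

delta = 1e-4
f = f_bar.copy()                    # start from the prior mean
for _ in range(2000):
    grad_J = H.T @ Rn_inv @ (g - H @ f) - Rf_inv @ (f - f_bar)
    f = f + delta * grad_J          # one affine (linear-neuron) update per step

print(np.round(f, 2))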
9.3 Comparison with the Hopfield Network

Zhou, Chellappa, et al. applied the Hopfield network to solve an image restoration problem [99]. The restoration criterion they chose is that of minimizing the following cost function:

E = (1/2)||g − Hf̂||^2 + (1/2)λ||Df̂||^2   (9.32)

where || · || denotes the L2 norm, λ is the Lagrange multiplier, and f̂ is the restored image. Such a constrained cost function is widely used in image restoration problems [6] and is also similar to the regularization techniques used in early vision problems [73]. The Hopfield network can minimize the following energy function:

E_H = −(1/2) v^T T v − θ^T v   (9.33)

where T is the connection matrix, and v_i and θ_i are the output and the threshold of the i-th neuron respectively. By choosing the right connection matrix T and threshold vector θ, the minimization problem of the restoration can then be solved by the Hopfield network.

The cost function (9.32) has to be modified to fit the Hopfield network model because an image vector is not binary. Since each pixel takes an integer value (usually ranging from 0 to 255), it can be represented by a limited number of neurons, such as

f̂_i = Σ_{j=1}^{L} c_j v_j   (9.34)

An example is c_j = 2^{j−1} and L = 8, which is a binary representation of an integer ranging from 0 to 255. Equation (9.34) can be expressed in matrix form as

f̂ = C v   (9.35)

where

C = [ c_1 ... c_L   0 ... 0     ...   0 ... 0   ]
    [ 0 ... 0     c_1 ... c_L   ...   0 ... 0   ]
    [ ...                                       ]
    [ 0 ... 0     0 ... 0     ...   c_1 ... c_L ]   (9.36)

Substituting (9.35) into (9.32), we get

E = (1/2)(g − HCv)^T (g − HCv) + (1/2)λ v^T C^T D^T D C v = (1/2)(g^T g − 2 g^T H C v + v^T C^T (λ D^T D + H^T H) C v)

Minimizing E is equivalent to minimizing

E' = (1/2)(v^T C^T (λ D^T D + H^T H) C v − 2 g^T H C v)   (9.37)

Comparing (9.37) with (9.33), it can be seen that a Hopfield network can solve the restoration problem by setting its parameters as follows:

T = C^T (λ D^T D + H^T H) C,  θ^T = g^T H C   (9.38)

The Hopfield network can solve any optimization problem whose cost function can be modified into the form shown in (9.32), which is the case for most image restoration problems. However, some practical problems arise when the Hopfield network is applied to image restoration.

The first problem is that an enormously large number of neurons is needed for image restoration even when the picture is small. This is because several neurons are needed to represent each pixel, since each neuron of a Hopfield network has only binary output. Suppose eight neurons are used to represent a pixel; then for an n × n image, 8n^2 neurons are needed (e.g., if n = 64, then 32,768 neurons), that is, the size of the Hopfield network is 64n^4. The second problem is that the neurons in the Hopfield network have to be updated sequentially, i.e., one at a time, to ensure the convergence of the network [33]. This is a serious restriction, since it nullifies the advantage of parallelism offered by neural networks. Moreover, the need for a huge number of neurons makes this problem much worse. Our network model has a size of n^4 and, most importantly, is guaranteed to work in parallel. Some detailed comparisons between our network model and the Hopfield network in solving optimization problems in general are available in [87].
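The mapping (9.34)–(9.38) from pixels to binary neurons can be sketched numerically. The code below (NumPy; the tiny image size, L = 4 bits per pixel, and D as a first-difference operator are illustrative assumptions) builds the encoding matrix C and the resulting Hopfield parameters, mainly to show how quickly the network grows: nL neurons and (nL)^2 connections for n pixels.

import numpy as np

n, L = 4, 4                                   # 4 pixels, 4 bits per pixel
c = 2.0 ** np.arange(L)                       # c_j = 2^(j-1)

C = np.zeros((n, n * L))                      # encoding matrix of (9.36)
for i in range(n):
    C[i, i * L:(i + 1) * L] = c

H = np.eye(n)                                 # trivial blur for the sketch
D = np.eye(n) - np.eye(n, k=1)                # first-difference smoothness operator
g = np.array([3.0, 7.0, 6.0, 2.0])            # observed pixels
lam = 0.5

T = C.T @ (lam * D.T @ D + H.T @ H) @ C       # connection matrix of (9.38), (nL) x (nL)
theta = g @ H @ C                             # threshold vector, length nL

print(T.shape, theta.shape)                   # (16, 16) (16,); a 64 x 64 image at 8 bits needs 32768 neurons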
Chapter 10

APPLICATIONS IN OTHER AREAS

The purpose of this chapter is two-fold. First, we shall show that the applicability of our methodology, as well as of the LQT networks, is not limited to the area of image processing. Secondly, we shall show that by using our methodology, better neural network implementations can easily be developed. To serve this purpose, we have chosen some algorithms which are used in areas other than image processing and which have known neural network implementations developed by other researchers.

10.1 Sorting

Sorting is one of the most important operations performed by computers. It is believed that 25%–50% of all the work performed by computers consists of sorting data [1]. In addition, there exist other techniques which are based on sorting, for instance order statistic filtering. Therefore, increasing the speed of sorting is of great importance and interest. Since the advent of digital computers, many sorting schemes have been proposed in the literature [1, 51, 61]. Some of these schemes are executed on serial or parallel digital computers, while others are executed on special-purpose machines. For bounded fan-in logic circuits (networks of AND, OR, NOT gates), the minimum sorting time achievable is O(n log₂ n) for serial sorting schemes and O(log₂ n) for parallel sorting schemes, where n is the number of items to be sorted [1]. For unbounded fan-in logic circuits, the minimum sorting time is O(1); however, the circuit size has to be more than polynomial [13, 22].

It is implicit in [13] that if threshold-logic gates (neurons) are used, the circuit size can be reduced to polynomial. It is certainly possible to achieve constant sorting time using a polynomial-size LQT network, since it employs threshold-logic neurons as one of its three kinds of basic elements. Here, we derive such a network for sorting integers and, by doing so, give its exact size and depth. This network implements the enumeration sorting algorithm, which is the fastest sorting algorithm when implemented in parallel [1].

Given a sequence S = (x_1, x_2, ..., x_n) where each x_i is an integer, it can be sorted in two ways: (1) S is sorted into a sequence (y_1, y_2, ..., y_n) where y_i ≤ y_j iff i < j; and (2) S is sorted into a sequence (y_1, y_2, ..., y_n) where y_i ≥ y_j iff i < j. Let us refer to the first as an increasingly sorted sequence and to the second as a decreasingly sorted sequence. Without loss of generality, we shall only consider the decreasingly sorted sequence and denote it by S^−. It can easily be shown that a sequence S of distinct numbers has the following relationship with its decreasingly sorted sequence S^−: y_k = x_i iff there are (k − 1) elements in S which are greater than x_i. To include cases where some elements of S are equal, we introduce the following definition and theorem.

Definition 10.1  x_i ▷ x_j means that x_i ≥ x_j if i > j, or x_i > x_j if i < j.

Define G_i = {x_j : x_j ▷ x_i, for all x_j ∈ S − (x_i)}; we then have the following theorem.

Theorem 10.1  Given a sequence S, if we arrange its elements to form a new sequence S' according to the rule p(x_i) = k if |G_i| = k − 1, where p(x_i) is the position of x_i in S', then S' is a decreasingly sorted sequence of S.

PROOF  The validity of this theorem is obvious.

To derive an LQT network for sorting, we need to restate the enumeration sorting algorithm in a more mathematically rigorous way. This requires the two special functions introduced in Chapter Eight: the comparison function and the rank function. The comparison function is defined as

c(x_i, x_j) = 0 if x_i > x_j;  0 if x_i = x_j and i > j;  1 if x_i = x_j and i < j;  1 if x_i < x_j   (10.1)

and the rank function is defined as

rank(x_i : S) = k, if x_i is the k-th largest element in S   (10.2)

According to Theorem 10.1, the following equation holds:

rank(x_i : S) = Σ_{j=1}^{n} c(x_i, x_j) + 1   (10.3)

Now the enumeration sorting algorithm can be rephrased as follows:

1. Calculate c(x_i, x_j) for all i, j ∈ I = {1, 2, ..., n} with i ≠ j;
2. Calculate rank(x_i : S) for all i ∈ I;
3. For all k ∈ I, set y_k = x_i if rank(x_i : S) = k.

An array-level sketch of these three steps is given below.
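The sketch is plain Python (the function names are illustrative): pairwise comparisons as in (10.1), ranks as in (10.3), and placement as in step 3. Ties are broken by index exactly as in the comparison function, so every element receives a distinct rank.

def compare(x, i, j):
    """Comparison function c(x_i, x_j) of (10.1): 1 if x_j is ranked
    ahead of x_i in decreasing order, with index-based tie-breaking."""
    if x[i] > x[j]:
        return 0
    if x[i] < x[j]:
        return 1
    return 1 if i < j else 0      # equal values: the later index ranks higher

def enumeration_sort(x):
    """Decreasingly sort x by the enumeration sorting algorithm."""
    n = len(x)
    # steps 1 and 2: rank(x_i : S) = sum_j c(x_i, x_j) + 1
    ranks = [sum(compare(x, i, j) for j in range(n) if j != i) + 1
             for i in range(n)]
    # step 3: y_k = x_i if rank(x_i : S) = k
    y = [None] * n
    for i, r in enumerate(ranks):
        y[r - 1] = x[i]
    return y

print(enumeration_sort([3, 7, 7, 1, 5]))   # -> [7, 7, 5, 3, 1]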
We first derive network realizations for each step and then compose them together to form the whole network. For the case i > j,

c(x_i, x_j) = 1 if x_j − x_i ≥ 1;  0 otherwise   (10.4)

The network realization of this function is

c(x_i, x_j) = ⟦ x_i, x_j | −1, 1 ⟧^1   (10.5)

For the case i < j,

c(x_i, x_j) = 1 if x_j − x_i ≥ 0;  0 otherwise   (10.6)

The network realization of this function is

c(x_i, x_j) = ⟦ x_i, x_j | −1, 1 ⟧^0   (10.7)

For the second step, the network realization of the rank function is

rank(x_i : S) = ⊂ c(x_i, x_1), ..., c(x_i, x_{i−1}), c(x_i, x_{i+1}), ..., c(x_i, x_n) ⊃^{−1}   (10.8)

Substituting (10.5) and (10.7) into (10.8), we have

rank(x_i : S) = ⊂ ⟦x_i, x_1 | −1, 1⟧^1, ..., ⟦x_i, x_{i−1} | −1, 1⟧^1, ⟦x_i, x_{i+1} | −1, 1⟧^0, ..., ⟦x_i, x_n | −1, 1⟧^0 ⊃^{−1}   (10.9)

For the third step, let

z(i, k) = x_i if rank(x_i : S) = k;  0 otherwise

Then it can be shown that

y_k = Σ_{i=1}^{n} z(i, k)

which has the following network realization:

y_k = ⊂ z(1, k), z(2, k), ..., z(n, k) ⊃^0   (10.13)

Let

F_i = 1 if rank(x_i : S) = k;  0 otherwise   (10.14)

then

z(i, k) = x_i if F_i = 1;  0 otherwise   (10.15)

According to Example 7.1.11, the network realization of this function is

z(i, k) = ⊂ ⟨ x_i, F_i | 1, b ⟩^c, F_i | 1, a ⊃^0   (10.16)

where a ≤ min{x_i : i ∈ I and x_i ∈ X}, b ≥ max{x_i : i ∈ I and x_i ∈ X} − a, and c = a + b; X is the set of all possible values that x_i may take. The network realization of F_i, according to Example 7.1.6, is

F_i = ⊂ ⟦ r_i ⟧^k, ⟦ r_i ⟧^{k+1} | 1, −1 ⊃^0   (10.17)

where r_i = rank(x_i : S). Substituting this equation into (10.16) and simplifying the resulting network realization, we have

z(i, k) = ⊂ ⟨ x_i, ⟦r_i⟧^k, ⟦r_i⟧^{k+1} | 1, b, −b ⟩^c, ⟦r_i⟧^k, ⟦r_i⟧^{k+1} | 1, a, −a ⊃^0   (10.18)

Composing all the network realizations together and simplifying, we obtain the network implementation of the enumeration sorting algorithm (10.19): each output y_k is a linear neuron fed, for every i ∈ I, by a gated value μ_i together with two rank-threshold neurons μ_{i,k} and μ_{i,k+1}; each μ_{i,k} is a threshold neuron on the comparison outputs c(x_i, x_j), j ≠ i, which are in turn single threshold neurons on the input pairs (x_i, x_j) as in (10.5) and (10.7).

There are two special cases in the above network realization, namely k = 1 and k = n. When k = 1, μ_{i,1} = 1 for all i ∈ I; therefore, the network realization of y_1 can be further simplified (10.20). When k = n, μ_{i,n+1} = 0 for all i ∈ I; therefore, the network realization of y_n can be further simplified (10.21). For an input array of n elements, our network has a depth of 4 and a size of n(n^2 + 6n − 5).
The computation time of our network is constant irrespective of the size of the input array; it is always four times the processing time of a single neuron. Our network for n = 3 is shown in Figure 10.1, where the weights which are not shown have the value 1.

Figure 10.1: A Neural Network for Sorting an Input Array of Three Elements

To our knowledge, the aforementioned network is the first published neural network which employs three kinds of neurons to implement the enumeration sorting algorithm [86]. Later, Bruck and Siu proposed another neural network model which has an architecture similar to that of ours [88]; it is composed of threshold-logic neurons and has a depth of 5. Note that the size of our sorting network is polynomial with respect to the number of input elements. This is justifiable in applications where the value range of the input elements is fixed; for instance, in image processing the value of a pixel is restricted to [0, 255].

10.2 Communication

Here, we choose an algorithm which is used in communication and has a well-known neural network implementation called the Hamming network.

10.2.1 Improvement over the Hamming Network

In Chapter Two, we introduced the Hamming network, which implements a minimum-error classification algorithm (referred to here as the Hamming Classification (HC) algorithm). The Hamming network has some shortcomings: (1) the network output is not the exemplar; (2) the network has to iterate several times; (3) the matching scores have to be distinct; and (4) the output dynamic range of the neurons has to be very large. Here, we derive a network which overcomes these shortcomings.

Assume there are m exemplars and the size of the input pattern is n. Denote the input pattern by x, the output pattern by y, the i-th exemplar by e_i, and the j-th element of e_i by e_i^j. The HC algorithm is reformulated as follows:

1. Calculate d_i = (1/2)(n + x · e_i) for all i ∈ J = {1, 2, ..., m};
2. Find d_k = max(d_1, ..., d_m);
3. Output y = e_k.

Let z_k indicate whether the k-th matching score is the largest, i.e., z_k = 1 if rank(d_k : D) = 1 and z_k = 0 otherwise, where D = (d_1, ..., d_m), so that y = Σ_k z_k e_k (10.26). According to Theorem 6.3, the network realization of z_k can be simplified to a single threshold neuron applied to rank(d_k : D) (10.28). The network realization of the rank function (see Section 10.1) is

rank(d_k : D) = ⊂ c(d_k, d_1), ..., c(d_k, d_{k−1}), c(d_k, d_{k+1}), ..., c(d_k, d_m) ⊃^{−1}   (10.29)

where

c(d_k, d_j) = ⟦ d_k, d_j | −1, 1 ⟧^1 if k > j;  ⟦ d_k, d_j | −1, 1 ⟧^0 if k < j   (10.30)

Since d_j − d_k = (1/2) x · (e_j − e_k), each comparison neuron can take the input pattern x directly, with a weight vector proportional to e_j − e_k (10.31)–(10.33). Substituting (10.33) into (10.28), and then substituting the resulting network realization into (10.26), after some simplification we obtain the network realization of the HC algorithm (10.34): the i-th output neuron is a linear neuron whose inputs are the indicator outputs μ_k weighted by e_k^i, and each μ_k is a threshold neuron on the (m − 1) comparison neurons that compare d_k with the other matching scores directly from x. Here i ∈ {1, 2, ..., n} and k, j ∈ {1, 2, ..., m}. This network has a depth of 3 and a size of m^2(n + 1) − m. It is shown in Figure 10.2 for the case m = n = 3.

Figure 10.2: The Network Model for Implementing the HC Algorithm
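For reference, steps 1–3 of the HC algorithm above can be computed directly. The sketch below is NumPy; the bipolar (±1) exemplars and the test input are illustrative assumptions, and np.argmax breaks ties by the smaller index.

import numpy as np

def hamming_classify(x, exemplars):
    """HC algorithm: matching scores d_i = (n + x . e_i) / 2, output the
    exemplar with the largest score."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = np.array([(n + x @ e) / 2.0 for e in exemplars])   # step 1
    k = int(np.argmax(d))                                  # step 2
    return exemplars[k]                                    # step 3: y = e_k

E = [np.array([ 1,  1,  1,  1]),
     np.array([ 1, -1,  1, -1]),
     np.array([-1, -1,  1,  1])]

x = np.array([1, -1, -1, -1])        # noisy version of the second exemplar
print(hamming_classify(x, E))        # -> [ 1 -1  1 -1]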
Furthermore, since our network is perfectly synchronized, input patterns can b e e f e d to the network in a pipeline fashion. The only disadvantage of our network is that the network size is bigger than that of the Hamming network (its network size is m(m + n)). Whether or not this is negligible or tolerable depends on the relative importance of the cost of hardwares for particular applications. Chapter 10. APPLICATIONS IN OTHER AREAS 130 10.3 Optimization Optimization has been an major field of application for neural networks since the pub-lication of Hopfield's papers [35, 36]. In Chapter Nine, we have derived some network models for solving image restoration problems, which are essentially optimization prob-lems. Here, we show the derivation of some network models for solving other optimization problems, and compare the performance of our networks with the Hopfield network which is a popular neural network used for optimization. 10.3.1 Solving Simultaneous Equations The matrix format of the simultaneous equations is expressed as A x = c (10.35) where A is n x m matrix, x is a m x 1 vector, and c is a n x 1 vector. The objective here is to find the vector x which satisfies equation (10.35). Let us define a computational energy function as follows \u00C2\u00A3 = ( A x - c f ( A x - c ) (10.36) This energy function can be shown to be a convex function, therefore any local minimum is also the global minimum [14]. Since E attains value zero if and only if A x = c, the problem of solving the simultaneous equation is now transformed to the problem of minimizing E. If equation (10.35) has a solution, by minimizing E this solution can always be found. If equation (10.35) has no solution, by minimizing E a best fit solution in the L2 sense is found. To find the minimum of E, we use the method of steepest descend. Here the change of x is along the best searching direction, which is the opposite direction of the gradient Chapter 10. APPLICATIONS IN OTHER AREAS 131 of E. Denoting Vi\u00C2\u00A3 as the gradient vector, we get VE = 2 A T A x - 2 A T c = 2 A T ( A x - c ) (10.37) Therefore, the increment of x is A x = -<5AT(Ax - c) (10.38) where 6 takes a small positive value. The optimum 6 is a function of x. Here for the sake of simplicity we choose 5 as a constant. We know that x(Jb + 1) = x(fc) + Ax(Jb) (10.39) This leads to x(fc + l) = (I \u00E2\u0080\u0094