A CEREBELLUM-LIKE LEARNING MACHINE

by ROBERT DUNCAN KLETT
B.A.Sc., University of British Columbia, 1975

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES (Department of Electrical Engineering)

We accept this thesis as conforming to the required standard.

THE UNIVERSITY OF BRITISH COLUMBIA
July, 1979
(c) Robert Duncan Klett, 1979

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical Engineering
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5
Date: July 19, 1979

ABSTRACT

This thesis derives a new learning system which is presented as both an improved cerebellar model and as a general purpose learning machine. It is based on a summary of recent publications concerning the operating characteristics and structure of the mammalian cerebellum and on standard interpolating and surface fitting techniques for functions of one and several variables. The system approximates functions as weighted sums of continuous basis functions. Learning, which takes place in an iterative manner, is accomplished by presenting the system with arbitrary training points (function input variables) and associated function values. The system is shown to be capable of minimizing the estimation error in the mean-square-error sense.
The system is also shown to minimize the expectation of the interference, which results from learning at a single point, on all other points in the input space. In this sense, the system maximizes the rate at which arbitrary functions are learned.

TABLE OF CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
NOTATION
ACKNOWLEDGEMENT

I    INTRODUCTION
II   CEREBELLAR STRUCTURE
     2.1 The Mossy Fiber Pathway
     2.2 The Climbing Fiber Pathway
III  MODELING THE CEREBELLUM
     3.1 Recent Cerebellar Models
     3.2 Requirements of an Improved Cerebellar Model
IV   MODEL SIMPLIFICATIONS AND ASSUMPTIONS
     4.1 Arithmetic Functions of Cerebellar Neurons
     4.2 Cerebellar System Considerations
V    MATHEMATICAL ANALYSIS OF THE CEREBELLAR SYSTEM
     5.1 The Cerebellum and Estimation Functions
     5.2 An Optimized Learning Algorithm
     5.3 System Properties
VI   SYSTEM PERFORMANCE
     6.1 General Considerations
     6.2 Examples of Learning a Function of a Single Variable
     6.3 Soft Failure
     6.4 Learning Functions of Several Variables
VII  DISCUSSION AND CONCLUSION
     7.1 Physiological Implications
     7.2 The Learning Algorithm and Machine Intelligence
     7.3 Contributions of This Thesis
     7.4 Areas of Further Research
BIBLIOGRAPHY
     Cited Literature
     General References

LIST OF TABLES

2.1 Connectivity of Cerebellar Neurons in Cat
5.1 Number of Terms Generated by Full and Reduced Cartesian Products
6.1 Values of b = MAX[ΦᵀΦ] for Various Values of m and n
6.2 The Learning Algorithm as Affected by k and C
6.3 Learning Sequences for Cases Where u(H) ≠ s(H)
6.4 Learning Sequences for the Arm Model

LIST OF FIGURES

2.1 Block Diagram of the Cerebellar System
2.2 Structure of the Cerebellar Cortex
3.1 A Perceptron
3.2 Albus' Mossy Fiber Activity Pattern
4.1 Model Granule Cell
5.1 A Bounded Estimation Function
6.1 Error Correcting Functions for a Single Variable
6.2 Target Function Sin(2πx) over (0<x<1)
6.3 Optimal Polynomial Approximations of Sin(2πx)
6.4 Sequences of Estimations of Sin(2πx) For m=4, k=8
6.5 Near-optimal Estimations of Sin(2πx)
6.6 The Geometry of the Model Arm
6.7 Block Diagram of the Arm Model Learning System
7.1 A Hierarchical System of Learning Machines
7.2 One Dimensional Spline Functions

NOTATION

a) Standard Forms

a    = a scalar
A    = a vector
A    = a matrix
aᵢ   = the iᵗʰ element of A
aᵢⱼ  = the element of A at co-ordinates (i,j)
Aᵀ   = the transpose of the vector A or the matrix A respectively

b) Definition of Variables

Δe      = an error correcting function
C       = an arbitrary constant, C > 0, used as the acceptable estimation error
f       = Purkinje cell activity, or estimated output function
f̄       = system corrected output (Subcortical Nuclear cell output)
f(H)    = target function
Δf      = f(H) − f = the estimation error
G       = Golgi feedback matrix
H       = region of definition in the hyperspace of the control function
Hᵢ      = a particular point in H
I       = identity matrix
k       = convergence gain factor
L       = number of elements in the basis set {Φ} and in the weight set {W}
m       = number of elements in a one-variable interpolation function
n       = number of independent input variables, or an arbitrary power
{Φ}     = the function-space basis set (Parallel fiber activity)
{Φ(Hᵢ)} = the vector of values of {Φ} at the point Hᵢ
{Π}     = expanded input set
s(H)    = a cost density function
Θ       = (I + G)⁻¹ = Granule-Golgi transfer matrix; also used as an arbitrary non-singular matrix
u(H)    = the training point density function
v       = arbitrary scalar for optimization techniques
{W}     = the set of variable weights (effective synaptic weights)
X       = augmented matrix whose columns are eigenvectors
xᵢ      = the iᵗʰ element of the input set

ACKNOWLEDGEMENT

There are many people and institutions to whom I owe much gratitude on the completion of this thesis.
My wife, Marian, has shown great patience and understanding while encouraging me in this project. It is with much gratitude that I acknowledge the assistance of my supervisor, Dr. P.D. Lawrence. His suggestions and criticism have added much to this work while his encouragement has helped to bring it to a successful conclusion. I wish to thank my co-reader, Dr. E.V. Bonn, and the other members of my committee for their careful inspection and criticism of this thesis. To my fellow students in Electrical Engineering at the University of British Columbia, I extend thanks for providing an enjoyable and stimulating environment. In particular, I thank Mr. Rolf Exner for developing the program which typed this thesis. This research has been supported by the Natural Sciences and Engineering Research Council of Canada under a Postgraduate Scholarship to its author and grant No. A9341, and by the University of British Columbia in the form of a Teaching Assistantship.

I INTRODUCTION

Artificial intelligence and pattern recognition are currently areas of great interest. This thesis analyses a phylogenetically lower portion of the brain, the cerebellar cortex, in the belief that such an analysis will assist the development of a new type of intelligent system. It is further believed, since the higher regions of the brain are phylogenetically newer than the lower ones, that a study of the cerebellum is a logical starting point in the study of intelligent structures in general.

The cerebellum is a part of the brain which is located in the rear portion of the head, at the base of the skull. It is generally accepted that the cerebellum is involved in maintaining posture and co-ordinating motor activities of the body [7,12,28]. Although other levels of the brain can perform these functions, it has been postulated that the cerebellum is designed both to provide finer co-ordination than other levels and to relieve higher levels of most of the motor tasks [17,18,46]. Studying the cerebellar cortex is particularly appealing due to its extremely regular arrangement of cells (which will be discussed in Chapter 2) and the fact that its inputs (only two types) and outputs (only one type) are so well differentiated. Although the following analysis is based on the structure and function of the cerebellum, it should be established at this point that much additional research is required to show if the cerebellum in fact operates according to the theory to be developed.

There are thus two main goals of this research:
1. to present a model of the cerebellum which is consistent with current anatomical and physiological data, thus being an improvement over existing models, and
2. to develop a learning machine, incorporating the improved cerebellar model, which may form the basis of a new approach to designing intelligent machines.

With these goals in mind, the thesis begins by describing the structure of the cerebellum. Chapter 3 presents recent cerebellar models, some of their deficiencies, and the requirements of an improved model. Next, Chapter 4 begins to analyse the nature of possible cerebellar computations, leading to Chapter 5 which ends with the development of an optimized learning algorithm which is consistent with the cerebellar system. The performance and operating characteristics of examples of this system are presented in Chapter 6. Finally, Chapter 7 discusses implications of both the new cerebellar model and the learning system, ending with suggestions for further research.

II CEREBELLAR STRUCTURE

There have been numerous publications dealing with the cerebellum. Unless otherwise indicated, the following information is derived from some of those which deal with the cat [7,12,48].
A block diagram of the interconnections of neurons in the cerebellum is shown in Figure 2.1, in which the interconnections are marked to indicate whether they are excitatory or inhibitory.

Fig 2.1 Block Diagram of the Cerebellar System. (Diagram labels: Mossy fibers and Climbing fibers entering the cerebellar cortex; Granule cells, Golgi cells, Basket and Stellate cells, Purkinje cells, and Parallel fibers within the cortex; Subcortical Nuclear cells projecting to motor centers via various routes. Key: excitatory synapse, inhibitory synapse.)

It can be seen that there are only six types of neurons in the cerebellum. Of these, five are interneurons located in the cerebellar cortex which have no direct effect outside the cerebellum (Granule, Golgi, Basket, Stellate, and Purkinje cells), while the sixth (Subcortical Nuclear cells) are the only cells which do have a direct external effect. There are also only two types of input fibers (Mossy and Climbing fibers). The interconnections of the cerebellar neurons, particularly those in the cerebellar cortex, have been extensively studied, revealing the remarkably regular geometry of Figure 2.2.

2.1 The Mossy Fiber Pathway

The effects of Mossy fiber activity upon the cerebellum are very widespread. Upon entering the cerebellum, each Mossy fiber sends branches to numerous folia throughout the cerebellum. Some collaterals terminate on Subcortical Nuclear cells, while the branches which enter the cortex branch profusely before terminating as Mossy Rosettes in the Granular layer of the cerebellar cortex. The Rosettes form the nucleus of an excitatory synapse between a Mossy fiber, dendrites from numerous Granule cells, and dendrites from a few Golgi cells. Granule cells, each of which are contacted by 2 to 7 Mossy Rosettes (average 4.2), generate a single axon which climbs through the cerebellar folia to the Molecular layer where it forms a "T"-shaped branch.
Each branch of the axon, now classed as a Parallel fiber, runs longitudinally along the folia for a distance of up to 1.5 mm (2.5 mm in man [9]). Each Parallel fiber traverses a large number of characteristically flattened dendritic trees of Basket, Stellate, and Purkinje cells (up to 300 of the latter).

Fig 2.2 Structure of the Cerebellar Cortex. a) Perspective view of the cerebellar cortex (after [12]). BA=Basket cell, CL=Climbing fiber, GO=Golgi cell, GR=Granule cell, MO=Mossy fiber, P=Purkinje cell, ST=Stellate cell. b) Diagrammatic view of the Molecular layer of the cerebellar cortex showing the packing arrangement of Parallel fibers and Purkinje cell dendrites.

Although there is some uncertainty concerning the proportion of Parallel fiber-Purkinje cell intersections which in fact result in synapses, it is clear that the geometry maximizes the number of possible synapses. Those intersections which do result in synapses between Parallel fibers and Purkinje cells, Basket cells, or Stellate cells are excitatory. The axons of Basket and Stellate cells form inhibitory synapses with the dendritic trees and pre-axon areas of nearby Purkinje cells. Purkinje axons terminate in inhibitory synapses on Subcortical Nuclear cells. As well as forming excitatory synapses with Basket, Stellate, and Purkinje cells, Parallel fibers also form excitatory synapses with the dendritic trees of Golgi cells. Unlike the dendritic trees of the other cerebellar interneurons, Golgi cell dendrites spread throughout a cylindrical volume whose base is in the Granular layer and whose top is in the Molecular layer of the cerebellar cortex. Each tree is divided into two regions, the top one being excited by Parallel fibers, the bottom one by Mossy Rosettes. Each Golgi cell generates axons which inhibit a large fraction of the Granule cells which are in the volume enclosed by the Golgi dendritic tree.
The capacity of the cerebellum to process (co-ordinate) information is often discussed in terms of the divergence and convergence of information carrying neurons [13,14,28]. Table 2.1 illustrates these properties. Of particular note is the remarkably large number of Parallel fibers (up to 200 000) which synapse with each Purkinje cell. This convergence factor is a direct consequence of long, thin, Parallel fibers forming a lattice with fan-shaped dendritic trees of Purkinje cells in the Molecular layer of the cerebellar cortex.

Table 2.1 Connectivity of Cerebellar Neurons in Cat

a) Mossy Fiber Pathway

  PRE-SYNAPTIC      POST-SYNAPTIC     DIVERGENCE         CONVERGENCE
  Mossy fibers      Granule cells     460[14]-600[13]    4.2[14]
  Parallel fibers   Purkinje cells    100[14]-300[13]    100 000[14]-200 000[13]
  Parallel fibers   Basket cells      20[14]-30[13]      20 000[14]
  Basket cells      Purkinje cells    8[14]-50[13]       20[13]-50[14]

b) Climbing Fiber Pathway

  PRE-SYNAPTIC      POST-SYNAPTIC     DIVERGENCE         CONVERGENCE
  Climbing fibers   Purkinje cells    10[14]             1[14]

eg. Each Mossy fiber makes synaptic contact with between 460 and 600 Granule cells while each Granule cell is contacted by an average of 4.2 Mossy fibers.

2.2 The Climbing Fiber Pathway

In contrast to Mossy fibers, the effects of Climbing fiber activity are very localized. Although collaterals which form excitatory synapses with Subcortical Nuclear cells have been found, Climbing fibers branch very little after entering the cerebellum. Each Climbing fiber which enters the cerebellar cortex typically forms a synapse with only one, and at most a few, Purkinje cells. This synapse, which engulfs the dendrites and cell body of the Purkinje cell, is very strongly excitatory. Climbing fibers also form excitatory synapses with Basket and Stellate cells which are in close proximity to the target Purkinje cell. The relation of Climbing fibers to Golgi cells is less well understood.

III MODELING THE CEREBELLUM

3.1 Recent Cerebellar Models

A number of theories have been proposed which explain some aspects of cerebellar function and organization. Unfortunately, those theories do not acceptably explain the mathematics of a learning algorithm, which must be the basis of a truly valid model. Alternatively, most of the papers which do describe workable learning algorithms are not fully compatible with known cerebellar structure.

In one of the first theories to utilize current knowledge of cerebellar structure, Marr [28] proposes that the cerebellum directs a sequence of elemental movements to generate a desired action. That is, the cerebellum acts as a pattern recognition device, relating patterns of Mossy fiber activity to learned outputs. The recognition is performed according to the "codon" (subset) of Parallel fibers which are active at a given time. Translating this into mathematical terminology, his proposal is that each Purkinje cell separates a binary hyperspace of Parallel fiber activity into linearly separable regions in which the Purkinje cell is either active or inactive. The orientation of the separating hyperplanes is learned by adjusting Parallel fiber-Purkinje cell synaptic strengths as a function of Climbing fiber activity.

Albus [1,2] extended and modified Marr's theory by describing the cerebellum in terms of Perceptron theory [32,34,44,45]. A perceptron, which typically consists of binary inputs, a combinatorial network of "association cells", adjustable weights, and a summing device, is shown
along He proposes that these Mossy fibers before being signals are transmitted recoded by the Granular layer into patterns of Parallel fiber activity which form an expanded set of binary signals. The purpose of expansion recoding is to map a l l possible patterns of Mossy fiber activity into sets of Parallel fiber activity which are linearly separable. Weights of synapses between Parallel 11 1 A MF, MF, 180 180 1 S MF. T 180 1-J MF. 1 180 J MF. 180 1 -J 1-1 MF MF 4 8 0. 0 JOINT 180 ANGLE Fig 3.2 Albus 1 JOINT 180 ANGLE Mossy Fiber Activity Pattern fibers and Purkinje cells, Basket cells, and Stellate cells are adjusted under the influence of Climbing fiber activity to obtain the desired Purkinje c e l l output. The effectiveness of this model in performing cerebellum-like functions has been convincingly demonstrated by i t s a b i l i t y to learn to control a number of movements of a mechanical arm [2,3,4]. Unfortunately, Albus 1 model i s somewhat incompatible with some aspects of cerebellar structure and physiology. His expansion recoding scheme employs a hash-coding function which seems to be beyond the computational Granular powers of the layer. Furthermore, there i s 12 l i t t l e evidence to support his proposed mapping scheme of joint angle to Mossy fiber activity [22,23]. A model which i s somewhat similar to Albus 1 has been proposed by Gilbert [17]. The model computes Purkinje c e l l activity as a weighted sum of Parallel fiber activity; the Parallel fiber-Purkinje c e l l and Basket and Stellate cell-Purkinje c e l l synapses being modifiable under the influence of Climbing fiber activity. The theory i s consistent with cerebellar structure but does not describe the relation between the system's inputs, Mossy fiber activity, and Parallel fiber activity. In another theory, Kornhuber [24] emphasizes the capacity of the cerebellar cortex to act as a timing device. 
He proposes that the action potential velocity along long, thin, Parallel fibers provides a timing mechanism for the control of ballistic motions such as saccadic eye movement. His model, although not rejecting it, does not propose a mechanism by which learning could take place.

Calvert and Meno [11] use spatiotemporal analysis to show that interconnections found in the cerebellum may cause it to act as an anisotropic spatial filter which enhances both the spatial and temporal details of Mossy fiber activity patterns. Their analysis assumes that all synaptic weights are determined strictly according to the type of pre- and post-synaptic neuron. Hence, the model does not account for learning.

Hassul and Daniels [20] present a theory, along with supporting experimental evidence, that at least part of the function of the cerebellum is to act as a higher order lead-lag compensator. They further propose that this compensation provides stability for reflex actions. The model does not propose a scheme whereby the correct compensation might be learned.

A number of computer simulations of activity in a cerebellum-like network have been presented [33,38,39,40]. A common prediction of these models is that lateral inhibition and Golgi cell feedback should cause the surface of the cerebellum to contain long, narrow, bands of active Parallel fibers. That is, Mossy fiber activity is "focussed" into regions of Parallel fiber activity. A similar study includes peripheral feedback and muscle fibers in the model [37]. A common problem with these computer simulations is that they assume all Parallel fiber-Purkinje cell synaptic weights are equal. That is, the models do not consider the effects of unequal synaptic weights which might result from the application of a learning process.

3.2 Requirements of an Improved Cerebellar Model

The above cerebellar theories form a good starting point to begin understanding the cerebellum.
Unfortunately, each theory is deficient in at least one aspect. Furthermore, the theories tend to be mutually exclusive, so that an all-encompassing one is not easily synthesized as a combination of the strong points of each proposal.

A common shortcoming of those models which do propose a learning scheme is their treatment of Parallel fiber activity as a binary quantity. In these models, learning operates to form linearly separable regions in a binary hyperspace of Parallel fiber activity. Although each action potential is undoubtedly binary, the information carried by a nerve is generally accepted to be a function of the frequency of action potentials [53], not the presence or absence of one. An improved model must therefore deal with signals which are interpreted as real, not binary, variables.

To be truly valid, the operations of a cerebellar model must be consistent with those operations which are feasible to implement using the known structure of the cerebellum. In particular, the mapping of Mossy fiber activity to Parallel fiber activity must be consistent with the arrangement of cerebellar neurons, and the memory requirements of the learning algorithm must be consistent with the number of modifiable synapses in the cerebellar cortex.

IV MODEL SIMPLIFICATIONS AND ASSUMPTIONS

4.1 Arithmetic Functions of Cerebellar Neurons

The basic assumption to be used in the following discussion is that cerebellar information processing may be interpreted as a number of mathematical operators acting upon real-valued data. That is, data which is translated into a sequence of action potentials, whose average frequency is a function of some biologically significant variable, is processed, according to common mathematical operations, by cerebellar neurons. It has been shown that neurons and neural networks, using frequency coded data, can perform basic arithmetic operations (add, subtract, multiply, and divide) upon pre-synaptic action potential frequencies [8,43]. The operation performed by a particular neuron depends upon a number of factors including the location of the synapses relative to the post-synaptic neuron's cell body, the nature of each synapse (excitatory or inhibitory), the level of inherent inhibition or excitation of the post-synaptic neuron (ie. the level of resting activity or inhibitory threshold), and the relative activity of each pre-synaptic neuron. The model to be presented will thus use real numbers to represent data and a selection of the above arithmetic operations to represent the functions of relevant neurons.

The frequency at which action potentials are transmitted is obviously a non-negative number. This restriction upon the values to be represented will be relaxed, as it is equally obvious that a fixed positive bias can be superimposed upon any bounded negative variable in order to guarantee a net positive value. Similarly, although the nature of a particular synapse (excitatory or inhibitory) is fixed, either a bias, or an arrangement of interneurons, can be established to permit a synapse to have a net effect which is either positive or negative [2,17].

It is useful to regard the cerebellar cortex system as a form of associative memory [16,25,27]. In such an approach, a set of memories (the firing frequencies of the Purkinje cells) is associated with a set of inputs (the activities of the Mossy and/or Climbing fibers). Due to their remarkable specificity to the output cells, it seems most likely that Climbing fibers are equivalent to "data" lines while Mossy fibers are equivalent to "address" lines. The only other possible arrangement, Mossy fibers providing data and Climbing fibers providing the address, is exceedingly unlikely due to the dispersion of information between Mossy fibers and the system's output, Purkinje cells.
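The fixed positive bias described above, in which a bounded signed variable rides on a positive offset so that only non-negative firing rates need be transmitted, can be illustrated with a toy encoding. The bound V and the rate units are illustrative assumptions, not values from the thesis.

```python
# Toy illustration of the fixed positive bias described above: a signed
# variable v in [-V, +V] is carried as a non-negative "firing rate"
# r = v + V and recovered by subtracting the bias. The bound V is an
# illustrative assumption.

V = 50.0  # assumed bound on |v|; rates then lie in [0, 100]

def encode(v):
    """Superimpose the fixed positive bias so the rate is non-negative."""
    assert -V <= v <= V
    return v + V

def decode(r):
    """Remove the bias to recover the signed value."""
    return r - V

for v in (-50.0, -12.5, 0.0, 37.0):
    r = encode(v)
    assert r >= 0.0 and decode(r) == v
```

The same trick extends to synapses of fixed sign: a constant inhibitory offset shifts the effective operating range so a net effect of either sign is possible.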
The structure of the Mossy fiber pathway of the cerebellum suggests a number of possible formulations for the value of Purkinje cell activity (system output) as a function of Mossy fiber input. These are: a sum of sums, a sum of products, a product of products, and a product of sums. A moment's reflection will indicate that a sum of products is the most likely form for representing non-linear functions of several variables, since both a product of sums and a product of products are always zero whenever any one of the primary terms is zero (either due to coincidence or to a faulty transmission line). Similarly, both a sum of sums and a product of products would not justify the two-stage structure of the system, since a single neuron could perform the same computation. Furthermore, Taylor series expansions of functions of several variables show that any function can be adequately expressed as a sum of products, providing enough terms are used.

The choice of formulating system output as a weighted sum of products is not new, as it has been suggested by other workers, and it seems highly plausible, that each Purkinje cell effectively computes a weighted sum of the activity of the Parallel fibers which synapse with it [2,17,28,32]. That is,

    f = Σᵢ wᵢφᵢ     (4.1)

where f = Purkinje cell activity
      wᵢ = the iᵗʰ synaptic weight
      φᵢ = the activity of the iᵗʰ Parallel fiber.

Learning is assumed to take place by adjusting the synaptic weight set {W} as a function of Climbing fiber activity [2,17,21,28]. The function of cerebellar interneurons (Granule, Basket, Stellate, and Golgi cells) in the model must now be considered. This model will take the same approach as a number of previous workers who have suggested that Basket and Stellate cells act to permit the net effect of some wᵢ to be negative and to permit both increasing and decreasing wᵢ [2,17,28]. (The nature of a synapse, excitatory or inhibitory, is fixed.)
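As a minimal concrete reading of equation (4.1), the model Purkinje cell computes a dot product of its weight set with the Parallel fiber activities. The sample weights and firing rates below are arbitrary illustrative numbers; the negative weight anticipates the Basket/Stellate mechanism just mentioned.

```python
# Minimal sketch of equation (4.1): model Purkinje cell output f as the
# weighted sum of Parallel fiber activities phi_i. The weights and
# rates are arbitrary illustrative numbers.

def purkinje_output(w, phi):
    """f = sum_i w_i * phi_i  (equation 4.1)."""
    return sum(wi * pi for wi, pi in zip(w, phi))

w   = [0.5, -1.0, 2.0]    # effective synaptic weights {W}
phi = [10.0, 4.0, 3.0]    # Parallel fiber firing rates
print(purkinje_output(w, phi))  # 0.5*10 - 1.0*4 + 2.0*3 = 7.0
```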
A simple and reasonable model is that Basket and Stellate cells compute an unweighted sum of the activities of the Parallel fibers with which they synapse. Since Basket and Stellate cells are inhibitory, this sum is then subtracted from Purkinje cell activity. That is,

    f = Σᵢ pᵢφᵢ − c Σᵢ φᵢ     (4.2)

where pᵢ = the iᵗʰ Parallel fiber-Purkinje cell synaptic weight
      c = effective (constant) synaptic connectivity between Parallel fibers and Purkinje cells via the Basket and Stellate cell route.

Re-arranging (4.2) yields

    f = Σᵢ (pᵢ − c)φᵢ.     (4.3)

Letting

    wᵢ = pᵢ − c     (4.4)

will result in wᵢ which may be positive or negative, depending upon the value of pᵢ. This approach does, however, place a lower limit (−c) on the value of every wᵢ. It should be noted that the dendritic tree of each Basket and Stellate cell is less widespread than that of a Purkinje cell. This arrangement reduces the volume of cortical cells by permitting a single Basket or Stellate cell to influence a number of Purkinje cells, despite the fact that each Purkinje cell has synapses with a slightly different subset of Parallel fibers.

As for Granule cells, it has been suggested above that (4.1) must represent a sum of products. This implies that each Granule cell forms a term which is the product of the activity of the Mossy fibers which make contact with its dendrites. To account for Golgi inhibition, Granule cells will be modeled as the two-part cells shown in Figure 4.1. The first part computes the product of the activity of relevant Mossy fibers while the second part subtracts Golgi cell activity (Golgi inhibition). The maximum number of different products is given by the number of all possible combinations of the set of input variables. The set of those products which are actually used in (4.1) (a set to be determined in following sections) will be represented by {Π}. In general, {Π} contains more elements than the number of Mossy fibers which enter the cerebellar cortex. The set will therefore also be referred to as the "Granule expanded input set". Such an expansion is consistent with the large Mossy fiber-Granule cell divergence shown previously in Table 2.1.

Fig 4.1 Model Granule Cell. (Mossy fiber inputs feed a multiplier; Golgi feedback is subtracted from the product to give Parallel fiber activity.)

Since Golgi cells have two distinct dendritic trees, and since the upper tree is much more dense [12], it would seem that Parallel fiber synapses are more significant to the Golgi system and that Mossy fiber synapses are merely an improvement, possibly to reduce inherent time delays of the system. This assumption does tend to simplify the following analysis but is not an essential ingredient of its basic principles. In any case, the Parallel fiber-Golgi cell system will be modeled as a negative feedback network of the form

    Φ = Π − GΦ     (4.5)

where G = Golgi feedback matrix
      Π = Granule expanded input set.

Once a steady-state condition has been reached, Parallel fiber activity can be expressed as

    Φ = (I + G)⁻¹Π.     (4.6)

This can be re-written as

    Φ = ΘΠ     (4.7)

where

    Θ = (I + G)⁻¹.     (4.8)

Substituting (4.7) into (4.1) and re-writing using vector notation, the system can be expressed as

    f = WᵀΦ     (4.9)
      = WᵀΘΠ.     (4.10)

4.2 Cerebellar System Considerations

The cerebellum functions as a peripheral controller. That is, using visual, tactile, and other forms of sensory feedback, higher levels of the brain "teach" the cerebellum to control desired motions. This procedure proves advantageous by relieving higher levels of these control tasks once the motions have been learned. To be most efficient,
The existence of synapses between Climbing fibers and Purkinje cells, and between Climbing fibers and Subcortical Nuclear cells adds support to this proposal proposed that, during training, Climbing as i t may be further fiber activity overrides that of Purkinje cells in such a way as to f i r s t correct the body position and to then reduce the Purkinje cell's estimation error. Also, the cerebellum should be most motions, thus minimizing controlling effective in controlling the most usual the amount of time spent by higher levels in body motions. That i s , since (4.1) really estimates a control function, the estimation error should be least in those regions which are used most frequently. Referring to equations (4.1) and (4.4) and the figures showing cerebellar structure, i t i s apparent that the only variable elements in this model of the cerebellum are the weights of synapses between Parallel fibers and Purkinje cells. (Basket and Stellate cells may also have modifiable synapses without significantly changing the model.) These variable weights are considered by this model as the set {W}. From a different viewpoint, this means that the only stored (memory) of the system are the current weights. This fact prevents the use of standard matrix inversion techniques values completely in solving (4.1). It is therefore apparent that the system must learn by the use of an error correcting algorithm inversion in an iterative manner. which effectively performs matrix 22 A well known property of animals, particularly mammals, i s their ability to adapt to changes in their size, strength, and environment. Therefore, the mammalian motion controller, the cerebellum, must be able to modify i t s control parameters development. When considering at any stage the cerebellum in the animal's as a learning machine, this means that the system must be able to learn any number of training points or patterns which may be presented at any time. 
A final consideration in any mathematical model of the cerebellum is the number of terms required in the summation given by (4.1). Motion co-ordination would seem to involve non-linear functions of numerous independent variables [42] (sets of current positions, current velocities, desired positions, desired velocities, etc.). Each Purkinje cell must therefore generate a function of several variables; the larger the number of variables, the better the co-ordination. Unfortunately, increasing the number of variables rapidly increases the number of terms required. As the model must not require more terms than a Purkinje cell is capable of adding, an upper limit of approximately 200 000 is imposed upon the number of terms in each estimation function, as this is the largest estimate of the number of Parallel fibers which synapse with a single Purkinje cell.

V MATHEMATICAL ANALYSIS OF THE CEREBELLAR SYSTEM

5.1 The Cerebellum and Estimation Functions

The nature of the cerebellar system is suggestive of several properties of those mathematical procedures which will be referred to here as "estimation functions". These procedures include the related theories of curve fitting, regression, and interpolation, which are discussed in numerous books and papers, including [19], [35,36], and [41] respectively. Estimation procedures typically operate to minimize an expression of the error between a target function and an estimated function. The estimated function is generally expressed as a weighted sum of basis functions

f̂ = Σᵢ wᵢφᵢ(H)  (5.1)

where H = a vector of one or more independent variables, which is essentially the same equation as (4.1).
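A one-dimensional toy instance of (5.1) (the basis functions and weights here are hypothetical, chosen only for illustration) is simply a dot product between a weight set and a vector of basis function values:

```python
import numpy as np

# Hypothetical example of (5.1): an estimate as a weighted sum of basis functions
basis = [lambda x: 1.0, lambda x: x, lambda x: np.sin(x)]   # {phi_i(H)}
w = np.array([0.5, -1.0, 2.0])                              # weight set {W}

def f_hat(x):
    """f_hat = sum_i w_i * phi_i(H), i.e. W^T phi."""
    return w @ np.array([p(x) for p in basis])

print(f_hat(0.0))   # 0.5*1 - 1.0*0 + 2.0*sin(0) = 0.5
```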
There are two significantly different approaches to finding f̂ which are reflected in the form of {φ(H)} in (5.1): those in which {φ} is a set of functions which are continuous (and whose derivatives are all continuous) over the hyperspace of definition, and those in which {φ} is a set of functions which are piecewise continuous (or whose derivatives are piecewise continuous). An advantage of the latter approach is that it simplifies the computations, which invariably involve matrix inversion, required to generate {W}. Piecewise continuous functions, such as splines, can be arranged so that the matrix is banded, thereby making the inversion process simpler and less prone to numerical (round-off) errors [41].

The only restriction upon {φ} is that it must be a linearly independent set of functions if the resulting weight set is to be unique. That is, the Granule expanded input set {Π} forms a basis set which spans a function space. If the elements of the set are not linearly independent, some of them may be deleted, resulting in a reduced set which spans the same function space and is linearly independent. On the other hand, the practical advantages of improved system accuracy and reliability may result from a system in which Parallel fiber activity forms a set of functions which are not linearly independent [15,47]. Although the following derivations will assume linear independence, this condition may be relaxed in some cases. The choice of the form of {φ} (ie. polynomials, trigonometric functions, exponentials, etc.) and of the number of elements in the set is a matter of judgement, dependent upon any known properties of the function which is to be estimated.

The above mentioned estimation techniques have usually been developed either for functions of a single variable or for linear functions of several variables. Extension to non-linear functions of several variables is often not as straight-forward as one might expect.
(See Chapter 5 of Prenter [41] for a discussion of interpolation functions of several variables.) A particularly severe problem is to ensure the continuity of functions which are interpolated with piecewise continuous interpolation functions. The usual solution to this problem is to form {φ(H)} as the Cartesian product of interpolation functions of single variables. As an example, for interpolating a function of two variables, one could use

φₖ(x,y) = φᵢ(x)·φⱼ(y)  (5.2)

where k = j + m(i−1) and m = the number of elements in {φ(x)}. An important disadvantage of this approach is that the number of elements in {φ(H)} grows rapidly as the numbers of independent variables and elements in {φ(x)} increase. That is,

L = mⁿ  (5.3)

where L = the number of elements in {φ(H)} and n = the number of independent variables.

Fortunately, the extension, to several variables, of continuous interpolating functions is not similarly restricted. The space spanned by a continuous set {φ(x)} may be extended to {φ(H)} as a "reduced Cartesian product" in which only those terms with a total order which is less than or equal to some maximum value are retained. For example, the set

{φ(x)} = {1, sin(x), sin(2x)}

may be extended to two dimensions as a full Cartesian product requiring 9 terms,

{φ(x,y)} = {1, sin(x), sin(2x), sin(y), sin(2y), sin(x)sin(y), sin(x)sin(2y), sin(2x)sin(y), sin(2x)sin(2y)}

or as a reduced Cartesian product requiring only 6 terms,

{φ(x,y)} = {1, sin(x), sin(2x), sin(y), sin(2y), sin(x)sin(y)}.

For a reduced Cartesian product, the number of terms required is given by

L = (m−1+n)! / ((m−1)! n!)  (5.4)

where m = the number of elements in the single variable set, which is usually system order + 1. The effectiveness of using an extension based on the reduced Cartesian product is clearly demonstrated by Table 5.1, which compares the number of terms generated by full and reduced Cartesian product expansions for various values of m and n.
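A minimal counting sketch (function names are my own): enumerating the per-variable order tuples shows directly that the full product has mⁿ terms, in agreement with (5.3), while the reduced product keeps only those of total order at most m−1, in agreement with (5.4).

```python
from itertools import product

def full_and_reduced_counts(m, n):
    """Count terms when an m-element single-variable basis is extended to n
    variables: the full Cartesian product versus the reduced product that
    keeps only terms of total order <= m-1."""
    full, reduced = 0, 0
    for orders in product(range(m), repeat=n):   # one order tuple per candidate term
        full += 1
        if sum(orders) <= m - 1:
            reduced += 1
    return full, reduced

# Reproduce two rows of Table 5.1
print(full_and_reduced_counts(4, 3))   # (64, 20)
print(full_and_reduced_counts(5, 7))   # (78125, 330)
```

The reduced count equals the binomial coefficient (m−1+n choose n), which is why it grows polynomially rather than exponentially in n.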
It is particularly interesting to note the values of n and m which generate approximately 175 000 terms, a number which corresponds to the number of Parallel fibers which synapse with each Purkinje cell.

A somewhat different approach to estimating functions of several variables has been taken by Specht [50]. Based on an analysis of polynomial discriminant functions [36], he develops discriminant functions which can be trained to distinguish between a number of patterns. This theory has been extended to include general regression surfaces [29,51] which can be interpreted, with a suitable change in normalization, as generalized functions of several variables. Unfortunately, the training procedure requires the selection of an arbitrary "smoothness" factor and pre-determining the fixed number of points in the training set. Also, since the technique is based upon approximating a Gaussian function at each point in the hyperspace of definition, a large number of terms must be used throughout the procedure.

Table 5.1 Number of Terms Generated by Full and Reduced Cartesian Products

n     m    FULL PRODUCT    REDUCED PRODUCT
3     2    8               4
3     4    64              20
3     5    125             35
6     7    117 649         924
7     5    78 125          330
8     5    390 625         495
15    8    3.52·10¹³       170 544
42    5    2.27·10²⁹       163 185
44    5    5.68·10³⁰      194 580
100   4    1.61·10⁶⁰      176 851

5.2 An Optimized Learning Algorithm

Previous sections have stated a number of constraints under which learning is assumed to take place in a cerebellum-like system. These are:

1. f̂ = Wᵀφ,  (5.5)
2. the system has no memory other than of the current values of the synaptic weights {W},
3. the system must operate in an error correcting mode,
4. the order and number of training points neither needs to be fixed nor pre-determined, and
5. the basis set, {φ}, can be considered as a set of linearly independent functions of the form

φ = ΘΠ  (5.6)

where {Π} is a reduced Cartesian product expansion of the input set.
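These constraints can be collected into a minimal interface sketch (my own construction, not from the thesis; the un-normalized update shown anticipates the rule derived below as (5.25), and the basis functions and k here are hypothetical):

```python
import numpy as np

class CerebellumLikeEstimator:
    """Memory is the current weight set only (constraint 2); operation is
    error correcting (constraint 3); output is f_hat = W^T phi (constraint 1)."""

    def __init__(self, basis, k):
        self.basis = basis                 # list of callables phi_i(H)
        self.k = k                         # learning constant
        self.W = np.zeros(len(basis))      # the only stored state

    def phi(self, H):
        return np.array([p(H) for p in self.basis])

    def estimate(self, H):
        return self.W @ self.phi(H)        # f_hat = W^T phi  (5.5)

    def correct(self, H, f_target):
        df = f_target - self.estimate(H)   # observed estimation error
        self.W += df * self.phi(H) / self.k
        return df

est = CerebellumLikeEstimator([lambda x: 1.0, lambda x: x], k=2.0)
for _ in range(40):
    est.correct(0.5, 1.0)                  # repeated corrections at a single point
print(abs(est.estimate(0.5) - 1.0) < 1e-6) # True: the stored weights now reproduce the target
```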
A traditional approach to solving estimation problems is to minimize the weighted mean-square estimation error over the domain of the input variables,

J = ∫_H s(H)·(Wᵀφ(H) − f(H))² dH  (5.7)

where H is the hyperspace of definition, f is the target function, and s(H) is the error cost density function. Then

dJ/dW = 2∫_H s(H)φ(H)·(φᵀ(H)W − f(H)) dH = 0  (5.8)

yielding

W = (∫_H s(H)φ(H)φᵀ(H) dH)⁻¹ · ∫_H s(H)φ(H)f(H) dH.  (5.9)

Since the cerebellar structure shows no apparent mechanism for evaluating (5.9) in a single operation, the problem is to select an iterative scheme which is both compatible with the given structure and converges toward the optimum weight set in a minimal number of iterations. If the system operates to correct errors, can only store the current values of the weight set, and the order and number of training points is not known, it would seem reasonable that {W} should be changed so as to reduce the estimation error while minimizing the effect of the change, on the estimation function, at all other points in the estimation space. This scheme will tend to minimize the number of iterations required to obtain a good approximation of the target function.

Let

Δf(H₁) = f(H₁) − f̂(H₁)  (5.10)

be the estimation error at some point, H₁. Then, adjusting the weight set so that

ΔWᵀφ(H₁) = Δf(H₁)  (5.11)

will result in an error correcting function

Δe(H,H₁) = ΔWᵀφ(H)  (5.12)

which is superimposed upon the function which existed before the change. In order to minimize the effects of this superimposed error correcting function, use

J = ∫_H s(H)·Δe² dH + v(ΔWᵀφ(H₁) − Δf(H₁)).  (5.13)

Standard optimization techniques can be applied to (5.13):

dJ/dΔW = 2(∫_H s(H)φφᵀ dH)ΔW + φ(H₁)v = 0
dJ/dv = φᵀ(H₁)ΔW − Δf(H₁) = 0.

This can be re-written as a matrix equation,

[ 2∫sφφᵀdH   φ(H₁) ] [ ΔW ]   [    0    ]
[ φᵀ(H₁)       0   ] [  v ] = [ Δf(H₁)  ].  (5.14)

In order to simplify the notation, let

P = ∫_H sφφᵀ dH.  (5.15)
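As an illustrative sketch (not the thesis's own program; grid size and target are my own choices), (5.9) can be evaluated on a discretized domain. With the integrals replaced by sums over a grid, the optimal weights for fitting f(x) = sin(2πx) with a cubic polynomial basis follow directly:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 501)             # discretized hyperspace H = [0, 1]
s = np.ones_like(x)                        # uniform error cost density s(H)
f = np.sin(2 * np.pi * x)                  # target function
Phi = np.vstack([x**i for i in range(4)])  # basis set {1, x, x^2, x^3}, one row per element

dx = x[1] - x[0]
P = (s * Phi) @ Phi.T * dx                 # P = integral of s*phi*phi^T dH  (5.15)
b = (s * Phi) @ f * dx                     # integral of s*phi*f dH
W = np.linalg.solve(P, b)                  # optimal weight set (5.9)

rms = np.sqrt(np.mean((W @ Phi - f) ** 2))
print(round(rms, 3))                       # residual of the best cubic fit
```

This is the one-shot batch solution that the cerebellar structure offers no mechanism for; the remainder of the chapter replaces it with the iterative scheme.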
Applying row reduction techniques to (5.14) gives an expression for the optimal adjustment, ΔW:

ΔW = Δf(H₁)P⁻¹φ(H₁) / (φᵀ(H₁)P⁻¹φ(H₁)).  (5.16)

At this point it is useful to look at some of the properties of P.

Lemma 1. P is symmetric.

Pᵀ = (∫sφφᵀdH)ᵀ = ∫s(φφᵀ)ᵀdH = ∫sφφᵀdH = P. QED.

Lemma 2. P is positive definite.

Consider YᵀPY where Y is an arbitrary vector. Then

YᵀPY = Yᵀ(∫sφφᵀdH)Y = ∫sYᵀφφᵀY dH = ∫s(Yᵀφ(H))² dH.

Since {φ} is a set of functions which are linearly independent over H, the only vector for which Yᵀφ = 0 for all H is Y = 0. Thus

YᵀPY > 0 for all non-zero Y

(s(H) is positive by definition) and P is positive definite. QED.

Lemma 3. Pⁿ exists for all real valued n, is unique, and is symmetric.

It is well known that the eigenvalues of a positive definite, real symmetric matrix are real and positive. Thus, let

P = XLXᵀ

where X is an augmented matrix whose columns are the eigenvectors of P and L is a diagonal matrix whose entries are the eigenvalues of P. Then

Pⁿ = XLⁿXᵀ  (5.17)

and

(Pⁿ)ᵀ = (XLⁿXᵀ)ᵀ = XLⁿXᵀ = Pⁿ.

In particular this proves the existence of P⁻¹. QED.

Lemma 4. The error correcting function, Δe(H,H₁), is invariant for non-singular linear combinations of the basis set and is thus a unique function of the region (H), the function space spanned by {φ}, and the estimation error Δf(H₁).

The error correcting function for the space spanned by {φ} is given by

Δe(H,H₁;φ) = Δf(H₁)φᵀ(H₁)P⁻¹(φ)φ(H) / (φᵀ(H₁)P⁻¹(φ)φ(H₁)).  (5.18)

Consider Δe(H,H₁;Θφ), where Θ is any non-singular matrix:

Δe(H,H₁;Θφ) = Δf(H₁)·φᵀ(H₁)ΘᵀP⁻¹(Θφ)Θφ(H) / (φᵀ(H₁)ΘᵀP⁻¹(Θφ)Θφ(H₁))

but

P(Θφ) = ∫sΘφ(H)φᵀ(H)Θᵀ dH = ΘP(φ)Θᵀ

and

P⁻¹(Θφ) = (Θᵀ)⁻¹P⁻¹(φ)Θ⁻¹.

Thus

Δe(H,H₁;Θφ) = Δf(H₁)φᵀ(H₁)P⁻¹(φ)φ(H) / (φᵀ(H₁)P⁻¹(φ)φ(H₁)) = Δe(H,H₁;φ). QED.

Lemmas 3 and 4 suggest a useful simplification,

φ = (∫sΠΠᵀdH)^(−½)Π = P^(−½)(Π)·Π  (5.19)

since this choice is of the form

φ(H) = ΘΠ(H).  (5.20)
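A small numerical sketch of (5.16) (point and error values hypothetical): for a discretized P, the adjustment exactly cancels the error at H₁, i.e. ΔWᵀφ(H₁) = Δf(H₁), while its effect elsewhere is minimized in the s-weighted sense.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 401)
Phi = np.vstack([x**i for i in range(4)])   # basis {1, x, x^2, x^3}
dx = x[1] - x[0]
P = Phi @ Phi.T * dx                        # P with uniform s(H)  (5.15)

def optimal_adjustment(delta_f, phi1, P):
    """Equation (5.16): correct an error delta_f observed at a point with
    basis vector phi1, minimally disturbing the rest of the region."""
    Pinv_phi1 = np.linalg.solve(P, phi1)
    return delta_f * Pinv_phi1 / (phi1 @ Pinv_phi1)

x1 = 0.3
phi1 = np.array([x1**i for i in range(4)])
dW = optimal_adjustment(0.5, phi1, P)
print(dW @ phi1)    # the estimation error at x1 is corrected exactly (0.5)
```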
Combining (5.19) and (5.20), one obtains

P(φ) = ∫_H sΘΠΠᵀΘᵀ dH = ΘP(Π)Θᵀ = P^(−½)P(Π)P^(−½) = I.

Thus

ΔW = Δf(H₁)φ(H₁) / (φᵀ(H₁)φ(H₁))  (5.21)

and

Δe(H,H₁) = Δf(H₁)φᵀ(H₁)φ(H) / (φᵀ(H₁)φ(H₁)).  (5.22)

In this way, P⁻¹ may be deleted from (5.16). Equation (5.21) results in a scheme which tends to reduce the estimation error for a given function. Further analysis of the set generated by

φ = P(Π)^(−½)Π  (5.23)

indicates an even more efficient approach, employing the properties of orthonormal functions.

Lemma 5. The set of basis functions given by (5.23) is orthonormal over the weighted region.

Consider

rᵢⱼ = (φᵢ(H), φⱼ(H)) = ∫_H sφᵢ(H)φⱼ(H) dH.

Then R, the matrix whose entries are the rᵢⱼ, can be written

R = ∫sφφᵀ dH = P.

It has been previously shown that P(φ) = I for φ given by (5.23). Thus

rᵢⱼ = 0 if i ≠ j, and rᵢⱼ = 1 if i = j  (5.24)

and {φ(H)} is an orthonormal set over the weighted region. QED.

As a further improvement of (5.21), consider

ΔW = Δf(H₁)φ(H₁)/k  (5.25)

where MAX[φᵀφ]/k « 1. Let the weight set be adjusted at a total of k points with a point density

u(H) = s(H) / ∫s(H)dH.  (5.26)

The sequence of ΔW's at successive learning points, assuming all weights are initially zero, will be

ΔW₁ = (1/k)f(H₁)φ(H₁),
ΔW₂ = (1/k)f(H₂)φ(H₂) − (1/k²)f(H₁)φᵀ(H₁)φ(H₂)φ(H₂),
ΔW₃ = (1/k)f(H₃)φ(H₃) − (1/k²)[f(H₁)φᵀ(H₁) + f(H₂)φᵀ(H₂)]φ(H₃)φ(H₃) + (1/k³)f(H₁)φᵀ(H₁)φ(H₂)φᵀ(H₂)φ(H₃)φ(H₃),
etc.

Yielding

W = Σᵢ ΔWᵢ = (1/k)Σᵢ₌₁ᵏ f(Hᵢ)φ(Hᵢ) − (1/k²)Σᵢ₌₂ᵏ f(Hᵢ₋₁)φᵀ(Hᵢ₋₁)φ(Hᵢ)φ(Hᵢ) + ...  (5.27)

Since the points have a density given by u(H), the first term of (5.27) is an approximation of the weighted integral

∫_H u(H)f(H)φ(H) dH  (5.28)

≈ (1/k)Σᵢ₌₁ᵏ f(Hᵢ)φ(Hᵢ).  (5.29)

Comparing (5.29) with (5.9) and applying Lemma 5, it can be seen that, after k points, the weight set approximates the optimal set.
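The whitening step (5.23) can be sketched numerically (a toy discretization of my own, not the thesis's implementation): an eigendecomposition of P(Π), as in Lemma 3, gives P^(−½), and the transformed set is orthonormal over the weighted region, so P(φ) ≈ I.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]
Pi = np.vstack([x**i for i in range(4)])     # raw input set {1, x, x^2, x^3}
s = np.ones_like(x)                          # uniform cost density

P = (s * Pi) @ Pi.T * dx                     # P(Pi) = integral of s*Pi*Pi^T dH
evals, X = np.linalg.eigh(P)                 # P = X L X^T  (Lemma 3)
P_inv_half = X @ np.diag(evals**-0.5) @ X.T  # P^(-1/2)

phi = P_inv_half @ Pi                        # orthonormal basis (5.23)
P_phi = (s * phi) @ phi.T * dx
print(np.allclose(P_phi, np.eye(4), atol=1e-6))  # P(phi) = I, i.e. {phi} is orthonormal
```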
Continuing the procedure with additional learning points will improve the estimate as long as Δfᵢ ≤ Δfᵢ₋₁. Similarly, a sequence of learning steps at k points will adjust the weight set so that any resultant estimation error is approximately orthogonal to all elements of {φ}. That is, the error is minimized and {W} approximates the optimum set.

Unless the estimated function can be made to be identical to the target function and the training data is noiseless, the values of the elements of {W} will change at each point as learning is applied. That is, {W} will not in general converge in the usual sense, but will tend to fluctuate about the optimum weight set. The sequence of weight sets will, however, have a point of accumulation at the location of the optimum set.

It is important to note at this point that replacing the denominator of equation (5.21) with a constant, k, results in error corrections which are un-normalized. That is, despite applying (5.21) to correct an estimation error, an error may still exist. This variation of the learning algorithm, which will be discussed further in the following chapters, requires that an additional element be added to the system equations. Specifically, the system equations become

f̂ = Wᵀφ
f* = f̂ + Δf·g(t)
Δf = f − f̂  (5.30)
ΔW = Δfφ/k

where f̂ = the estimated function (Purkinje cell output), Δf = system error term, f* = system corrected output (Subcortical Nuclear cell output), and g(t) = 1 and g(t+Δt) = 0. That is, the system requires an element which evaluates (f̂+Δf) prior to any weight adjustments being made, holds this value, then permits it to decay to the new value of f̂.

Various procedures are available to permit the learning procedure to be halted. The most direct approach is to require learning only if the obtained response (either the estimated function or some physical response of the system) deviates sufficiently from the desired response.
This in effect replaces the single valued target function with a band of acceptable estimations. That is,

ΔW = 0 if |Δf| ≤ ε
ΔW = Δfφ/k if |Δf| > ε.  (5.31)

The learning algorithm is also effective even if the distribution of points is not known a priori. If possible, the expected point density can be estimated, or if not, it may be replaced by a constant (over a closed region) when deriving the basis set.

Each time the weights are adjusted, an error correcting function is superimposed upon the previous function. The learning algorithm ensures that interference effects from learning at each point are minimized at all other points in the learning hyperspace. This approach will result in a sequence of weights which tends to converge toward the optimal set. Other researchers have presented arguments showing that similar devices such as Adalines [10] and Perceptrons [32] exhibit strong convergence tendencies. Although not optimized, these devices are similar to this one in that they also approximate functions as weighted sums.

Regardless of the known and unknown parameters of the system, it is necessary to select k in (5.25). There are two opposing considerations: stability and learning rate. Larger values of k reduce the magnitude of perturbations of the estimation function due to points with low probability or high noise, while smaller values of k reduce the number of points required before the estimation function begins to approximate the target function. Unless very rapid learning is required, k should be determined to ensure that the fluctuations in f̂ are acceptably small. That is, with re-arrangements to (5.25),

MAX[Δe(H,H₁)] = MAX[Δf(H₁)φᵀ(H₁)φ(H)]/k.  (5.32)

Thus, k must be sufficiently large so as to ensure that the effect of an adjustment at a single point is less than ε over all points in the space.
k > MAX[Δf(H₁)φᵀ(H₁)φ(H₁)]/ε  (5.33)

To guarantee that continuous iterations at any single point will be stable and converge to generate f(H₁) exactly (pointwise convergence),

0 < φᵀ(H₁)φ(H₁)/k < 2.  (5.34)

The next chapter will present examples which demonstrate the effects, upon the learning system, of varying the values of k and ε.

5.3 System Properties

The preceding section has developed a system which can learn to approximate arbitrary functions, with minimum mean-square estimation error, as linear combinations of an orthonormal set of basis functions. The system uses an iterative error correcting strategy in which the weights are adjusted at each stage according to the expression (repeated here for convenience)

ΔW = Δf(H₁)φ(H₁)/k.

The factor k depends upon: 1. the basis set {φ} chosen, and 2. the required precision of the final estimation function; it is a constant which may be computed in advance and/or altered at any time.

Concerning the system's operating characteristics, a useful property is that the resultant estimation function is capable of estimating a target function with an error which is less than the error correcting mechanism can detect. In other words, despite the error correcting mechanism being rather inexact in its ability to detect or correct an error, the final estimation function will, in general, be a much more precise approximation of the target function. To support this statement, consider a learning system the weights of which are corrected whenever the estimation error exceeds an acceptable (detectable) tolerance, ε, as given by (5.31). If the target function or the space spanned by the basis functions is such that the tolerance ε can in fact be met for the whole hyperspace, then the resulting estimation will lie within a region bounded by

f(H) − ε < f̂(H) < f(H) + ε  (5.35)

as shown in Figure 5.1. For such a bounded, continuous, estimation function,
|f(H₁) − f̂(H₁)| « ε  (5.36)

over much of the region, H.

Fig 5.1 A Bounded Estimation Function

Hence, f̂ produces a better estimate, in terms of the desired result, than could be obtained using the error detecting/correcting mechanism alone. A somewhat similar property, called "learning with a critic", has been described for an Automatically Adapted Threshold Logic Element (Adaline) [54]. In that experiment, the Adaline is taught the optimal strategy for playing the game of blackjack strictly on the basis of whether it wins or loses a game. That is, despite the fact that the error detecting mechanism determines that the estimation function is in error if, and only if, a game is lost, the Adaline learns to generate the function which optimizes the strategy for winning games.

VI SYSTEM PERFORMANCE

6.1 General Considerations

The previous chapter has derived an algorithm which minimizes the interference of learning arbitrary functions, when using an iterative, point-by-point algorithm, in a cerebellum-like system. This chapter will demonstrate the effectiveness of the algorithm.

The derivation of the learning algorithm is independent of the function-space which is defined by the basis set. Thus, the algorithm guarantees optimal performance (for that space) for every linearly independent set of functions {φ} which satisfy (5.19). The choice of which basis set to use is a matter of selecting, subject to implementation constraints, that function-space and basis set which are most likely to match the function or functions to be estimated. Multinomials are a reasonable choice for a large number of applications, and are the functions which will be used in the examples presented in this chapter. That is, the terms to be used are those generated by

(1 + Σᵢ₌₁ⁿ xᵢ)ᵐ.  (6.1)
Application of Lemma 4 permits the binomial coefficients to be dropped from each term, yielding

Πᵀ = (1, x₁, x₂, ..., xₙ, x₁², ..., xₙᵐ)  (6.2)

and

φ = (∫_H sΠΠᵀ dH)^(−½)Π.  (6.3)

Due to the absence of other information, s(H) will be set to a constant in the following examples.

Some typical, normalized, error correcting functions, Δe(H,H₁;φ), are shown in Figure 6.1. It can be seen that these functions bear a loose resemblance to normal probability density functions with mean of H₁. The figures also show that limitations imposed by using a system based on low-order polynomials result in error correcting functions whose maxima are often "skewed" or shifted from H₁.

In regard to {Π}, it should be noted that any number of the terms listed in (6.2) may be deleted, without jeopardizing system performance, if the coefficients of those terms, in the target functions, are known to be zero.

Another important parameter which was discussed in the previous chapter is k, which must be selected in relation to

b(H₁,H₂) = MAX[φᵀ(H₁)φ(H₂)]  (6.4)

as given by (5.31), (5.32), and (5.33). For any basis set {φ}, b will be maximum at some point, H_b, where

|φ(H_b)| ≥ |φ(Hⱼ)|  (6.5)

for all Hⱼ ≠ H_b, since

φᵀ(Hᵢ)φ(Hⱼ) = |φ(Hᵢ)||φ(Hⱼ)|cos θ  (6.6)

where θ is the "angle" between the vectors φ(Hᵢ) and φ(Hⱼ). Thus

b = φᵀ(H_b)φ(H_b) = |φ(H_b)|²cos 0 = |φ(H_b)|².  (6.7)

Table 6.1 Values of b = MAX[φᵀφ] for Various Values of m and n

n   m   b     Average*        n   m   b     Average*
1   2   4     2.0             2   3   26    6.9
1   3   9     3.1             2   4   70    12.3
1   4   16    4.2             2   5   155   20.0
1   5   25    5.3             3   4   190   32.3
1   6   36    6.4             3   5   553   67.0

*Average: approximate average value of φᵀφ.

The location of H_b and the value of b are properties of {φ}.
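Both b and the learning behaviour can be checked with a small sketch (my own toy re-implementation, with my own grid sizes, seed, and choice of k, not the thesis's program). For n = 1 and uniform s(H) on (0,1), the orthonormalized multinomial basis is a shifted Legendre set; for m = 4 the computed b agrees with Table 6.1 (b = 16). Using k = b, the error-correcting rule ΔW = Δfφ/k is then run on the target sin(2πx) used in the next section's experiments:

```python
import numpy as np

# Orthonormal basis phi = P^(-1/2) Pi for {1, x, x^2, x^3} on (0,1), uniform s(H)
m = 4
idx = np.arange(m)
P = 1.0 / (idx[:, None] + idx[None, :] + 1)   # exact integrals of x^(i+j) over (0,1)
evals, X = np.linalg.eigh(P)                  # P = X L X^T  (Lemma 3)
T = X @ np.diag(evals**-0.5) @ X.T            # P^(-1/2)

def phi(x):
    return T @ np.array([x**j for j in range(m)])

# b = MAX[phi^T phi], located here at the interval ends
grid = np.linspace(0.0, 1.0, 1001)
b = max(float(phi(x) @ phi(x)) for x in grid)
print(round(b))                               # 16, as in Table 6.1 for n=1, m=4

# Error-correcting training with Delta W = Delta f * phi / k, k = b
rng = np.random.default_rng(1)
k = b
W = np.zeros(m)
sq_errs = []
for _ in range(2000):
    x1 = rng.random()                         # training points with uniform density u(H)
    df = np.sin(2 * np.pi * x1) - W @ phi(x1) # estimation error (5.10)
    W += df * phi(x1) / k                     # learning rule (5.25)
    sq_errs.append(df * df)

erms_early = np.sqrt(np.mean(sq_errs[:50]))
erms_late = np.sqrt(np.mean(sq_errs[-200:]))
print(erms_early, erms_late)                  # the RMS error settles toward the optimal residual
```

Larger b forces larger k, and hence more training points, which is exactly the trade-off the text describes.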
Table 6.1 shows that, for multinomial-based functions, b grows rapidly as m and n increase. This in turn means that k, and hence the number of points required to obtain acceptable estimations, must be similarly increased.

6.2 Examples of Learning a Function of a Single Variable

In order to demonstrate the operation of the training algorithm, and its ability to estimate functions, the function

f = sin(2πx) over (0 < x < 1)  (6.8)

was "taught" to a number of computer-based models of the system. The target function is shown in Figure 6.2. The optimal estimation functions, in a least-square-error sense, for polynomials of various orders are shown in Figure 6.3. These functions were calculated using the optimizing formula derived as equation (5.9).

Fig 6.2 Target Function Sin(2πx) over (0<x<1)

Fig 6.3 Optimal Polynomial Approximations of Sin(2πx): c) m=6,7, RMS ERROR=0.0050, EMAX=0.016; d) m=8,9, RMS ERROR=0.00019, EMAX=0.00066

Chapter 5 suggests that, as learning progresses, the estimation function grows, then settles, toward the optimal estimation of the target function. This sequence of functions is shown in Figure 6.4. For large values of k, ε=0, and many training points, the learned estimation functions can closely approximate the optimal functions shown previously, as shown in Figure 6.5.

Fig 6.4 Sequences of Estimations of Sin(2πx) for m=4, k=8: c) 15 points; d) 40 points

Fig 6.5 Near-optimal Estimations of Sin(2πx)

Once the order of the estimation system has been selected, values of k and ε must be determined. The effects, on the learning system, of varying k and ε are shown by various calculated measures:

1. EMAX, the maximum estimation error, Δf(Hᵢ), which was observed in the course of processing the preceding J points,
2. CHANGES, the total number of times the weight set has required adjustment,
3. ERMS, an estimate of the RMS estimation error over the preceding J points,

ERMS = ((1/J)Σᵢ₌₁ᴶ Δf(Hᵢ)²)^½,  (6.9)

4. SF, a measure of the perturbations of the estimated function, which is calculated as the change, over the preceding J points, in the values of the elements of the weight set (SF = stability factor),

SF = ((1/L)(Wᵢ − Wᵢ₋ⱼ)ᵀ(Wᵢ − Wᵢ₋ⱼ))^½  (6.10)

where Wᵢ is the current weight set and Wᵢ₋ⱼ is the weight set prior to any adjustments resulting from applying the learning algorithm at the previous J points, and

5. POINTS, the total number of points which have been observed.

Table 6.2 The Learning Algorithm as Affected by k and ε
Estimating Sin(2πx) for 0<x<1, m=4, J=100

a) Effect of k (ε=0.0, u(H)=s(H)=uniform)

POINTS   k     EMAX   ERMS    SF
100      10    0.76   0.19    0.36
200            0.29   0.077   0.059
400            0.25   0.087   0.036
600            0.34   0.093   0.035
100      50    0.86   0.36    0.30
200            0.19   0.080   0.042
400            0.21   0.070   0.0086
600            0.27   0.077   0.012
100      100   0.91   0.46    0.21
200            0.37   0.19    0.084
400            0.17   0.074   0.0012
600            0.23   0.074   0.0063
800            0.22   0.069   0.0045

b) Effect of ε (k=100, u(H)=s(H)=uniform)

POINTS   ε      CHANGES   EMAX   ERMS    SF
200      0.10   153       0.37   0.19    0.080
400             224       0.17   0.079   0.015
600             253       0.15   0.077   0.0090
800             262       0.11   0.074   0.0012
200      0.15   133       0.37   0.20    0.074
400             178       0.20   0.095   0.0093
600             184       0.16   0.098   0.0043
800             184       0.15   0.088   0.0

These measures are used since they show important properties of the learning system, are relatively easy to calculate in conjunction with applying the learning algorithm, and automatically take into account the effect of non-uniformly distributed training points. It should be noted, though, that due to the training points being randomly generated, the measures tend to oscillate rather than decrease monotonically.

Table 6.2 demonstrates the following trade-offs in relation to k and ε:

1. increasing k increases the number of training points required before the system begins to settle toward the optimal estimate,
2. once the estimation approaches the optimum, larger values of k reduce the magnitude of perturbations, as indicated by smaller values of SF,
3. smaller values of ε produce estimation functions which tend to have less RMS error, and
4. larger values of ε tend to reduce the maximum estimation errors at the cost of larger RMS errors.

Chapter 5 also predicts that, although its performance will be sub-optimal, the learning algorithm will continue to function in cases where the point density is not equivalent to the error cost density which was used to generate the basis set, as required by (5.26). Although Table 6.3 does support this prediction, it also shows a disadvantage of this approach, since many more training points (1500 versus 1000 training points) are required before the estimated function approaches the optimum.

Table 6.3 Learning Sequences for Cases Where u(H) ≠ s(H)
Estimating Sin(2πx) for 0<x<1, ε=0.0, k=100, J=100, u(H)=uniform

a) s(H) is gaussian (mean=0.5, variance=1.0)

POINTS   EMAX   ERMS    SF
200      0.66   0.39    0.12
400      0.34   0.19    0.056
600      0.18   0.11    0.026
800      0.19   0.074   0.0095
1000     0.17   0.069   0.0058

b) s(H) is gaussian (mean=0.5, variance=0.5)

POINTS   EMAX   ERMS    SF
200      0.81   0.51    0.12
400      0.58   0.34    0.077
600      0.40   0.25    0.053
800      0.29   0.16    0.033
1300     0.15   0.081   0.012
1500     0.13   0.067   0.0062

6.3 Soft Failure

An interesting, and potentially useful, factor to be considered when assessing the value of this learning algorithm is its property of soft-failure. That is, should one or more elements of {φ} or {W} become inoperative, the system will still be capable of representing arbitrary
Multinomial-based at the point where systems can be arranged so that few ( l i k e l y no more than n) elements o f {0} have values o f zero at the same point and hence systems based on these sets w i l l exhibit s o f t f a i l u r e . In terms of a physical device which implements the algorithm, t h i s means that the device may be u s e f u l , possibly over a reduced range, despite the f a i l u r e of some of i t s components. When considering systems which employ splines or harmonic s e r i e s , i t i s seen that at a number of points i n the region of d e f i n i t i o n , a l l but a limited number of the elements of the basis function set are zero. Should f a u l t y components cause a l l these non-zero elements to become inoperative, the system's output w i l l be a f i x e d , erroneous, value. Thus, the present system of multinomial-based functions has a s i g n i f i c a n t , p r a c t i c a l advantage. 6.4 Learning Functions of Several Variables In order learn to demonstrate the a b i l i t y o f the learning algorithm to and generate several applied to a more complicated functions o f several variables, i t was model. The model used i s based upon the geometry o f a human arm as shown i n Figure 6.6 [26], The system, whose model was programmed on a d i g i t a l computer, i s best described by the block diagram of Figure 6.7. In t h i s f i g u r e , i t can be seen that a desired wrist p o s i t i o n , i s used as the input to 55 Fig 6.6 The Geometry of the Model Arm (after [26]) the estimation (learning) system. Values of {(J)(HTJ} are then generated and used to compute each j o i n t angle according to fjJHi) = (DT(Hi)Wi f ( H i ) = 0 (Hi)W2 T 2 f ( H i ) = (D (Hi)W3 T 3 f ( H i ) = (D (Hi)W4 T 4 where i s the weight set associated with f j . . 
(6.11)

Fig 6.7 Block Diagram of the Arm Model Learning System (learning system, modeled arm, error correcting system, threshold sensor, and target position generator)

These joint angles are then used to compute the resultant wrist position. This position is compared with the desired one, and if the error is greater than ε, the angular corrections are computed (modeling a sensory feedback loop), the system output, f̂, is corrected with Δf to obtain the desired wrist position, and the learning algorithm is applied. It is particularly interesting to note the similarities between Figure 6.7 and the cerebellar system shown in Figure 2.1.

The model was presented with randomly generated points which have a uniform distribution inside a box which is located in the region

5 < x < 13, −5 < y < 12, −9 < z < 0

where all dimensions are in inches. Table 6.4 demonstrates the rapid learning rate of this model and shows that the model learns to position the wrist with good accuracy after learning at approximately 3000 points. The effect of varying ε is also demonstrated. That is, larger values of ε tend to reduce the maximum error while increasing the RMS error.

Table 6.4 Learning Sequences for the Arm Model
m=5, k=500, J=500, arm flap (alpha)=40 degrees, u(H)=s(H)=uniform; ERMS and EMAX are in inches

a) ε=0.0

POINTS   CHANGES   ERMS   EMAX   SF
1000     1000      6.70   18.3   0.045
2000     2000      0.99   2.52   0.0066
3000     3000      0.19   0.73   0.0010
4000     4000      0.13   0.48   0.0004
5000     5000      0.14   0.84   0.0005

b) ε = 0.4
4 POINTS CHANGES 1000 1000 6.70 18.3 0.045 2000 1934 0.99 2.52 0.0065 3000 2150 0.33 0.68 0.0003 4000 2171 0.32 0.46 0.0001 5000 2188 0.32 0.64 0.0003 59 VII •7.1 DISCUSSION AND CONCLUSION Physiological Implications The algorithm which has been developed i n t h i s thesis i s based upon the known anatomy and physiology therefore reasonable s i m i l a r to those o f the mammalian cerebellum. It i s to predict that cerebellar operations may be very of the learning algorithm herein described. That i s , the mathematics o f modifiable synapses, and the functions o f cerebellar c e l l s may well be those given by equations (4.1), (4.3), (5.30), and (5.31) which specify the learning system derived by t h i s t h e s i s . An important property o f t h i s system i s that a l l inputs are treated i d e n t i c a l l y . That i s , there i s no need to d i f f e r e n t i a t e between Mossy are fiber inputs which related to sensory or those peripheral information and those which are related to "context" or commands. This permits both sensory and command parameters to be treated as continuous variables so that commands inherently contain rate factors. Thus "walk", "run", and "sprint" may be the same command, "move", at various i n t e n s i t i e s . The lack o f s p e c i f i c i t y also means that there for an exact specific target mapping Granule c e l l ) area of Mossy fibers to s p e c i f i c i n the cerebellar cortex. i s a l l that i s required, thus i s no need points Rather, reducing (or to a a general the amount o f information which must be stored by cerebellum-related genes. Both of the above properties represent significant improvements over existing cerebellar models. 
In particular, this model resolves the deficiencies of Albus' theory [2] by presenting a feasible mapping of Mossy fiber activity to Purkinje cell inputs, and by treating all activity data, both commands and peripheral data, identically as continuous variables.

The model has predicted that corrections (Δf) to Purkinje cell activity are not normalized. That is, Climbing fiber activity causes the synaptic weights of the target Purkinje cell to change, resulting in a change in the frequency of that Purkinje cell's action potentials which is not exactly equivalent to Δf. To permit the proposed cerebellar system to function in this manner, it is necessary that Subcortical Nuclear cells perform a "sample-and-hold" function, thus generating an output from the cerebellum (f) which is corrected exactly. The weight adjustments (see equation 5.30), which correspond to learning, are not normalized for several reasons:

1. to permit weight adjustments to be computed strictly as a function of the pre- and post-synaptic activities at the synapse whose weight is being adjusted,

2. to permit the optimal weight set to be computed as the local approximation of an integral as given by equation (5.29),

3. to speed convergence in the manner of convergence gain factors in numerical optimization and root finding techniques,

4. to aid system stability by reducing the effects of correcting errors at infrequent points.

If physiological experiments should disprove this un-normalized operation, the learning algorithm may be modified so that iterations at any single learning point act to compute weight adjustments which result in changing f by an amount equal to the exact error at that point. That is, continuous re-applications of the weight adjustment algorithm will reduce the estimation error to zero, computing the weight adjustment as

ΔW = Δf Φ(H_i) / [k Φ^T(H_i) Φ(H_i)]    (7.1)

which is really the equation given previously as (5.21).

Future experiments may also show that Parallel fiber activity is not constant as implied in Chapter 5. That is, it may be, as suggested by Eccles [14], that Parallel fiber activity is normally insignificant, rising during cerebellar computations, then returning to a near-zero level. In this case, Purkinje cell activity would also be transient, rising briefly, then returning to its normal, spontaneous rate. To account for this modification of the nature of cerebellar operation, the theory requires only a slight variation. Namely, phasic activity, rather than strictly tonic (or a combination of phasic and tonic) activity, would be used to form the orthonormal set to which the learning algorithm is applied.

7.2 The Learning Algorithm and Machine Intelligence

The learning algorithm described in this thesis may be applied directly to learning machines. The cerebellum, after which the system is modeled, is an extremely effective motion controller. This suggests that the current system may prove effective in a number of applications as an adaptive controller. Potential applications include control situations where complicated, possibly unknown, non-linear equations of several variables are found, such as in power system control [52], in industrial process control [30,49], and in robotics [5,6].

Another application of the system is in pattern recognition. The system has been shown to be capable of simultaneously learning to generate a number of functions of several variables.
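The convergence claim for the exact-correction update (7.1) can be checked numerically: each re-application at the same point changes f(H_i) by Δf/k, so the local error shrinks by the factor (1 - 1/k) per iteration. The sketch below uses an arbitrary basis vector and a gain of k = 4, both of which are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
phi = rng.normal(size=8)   # Phi(H_i): basis values at one training point (arbitrary)
w = np.zeros(8)            # weight set for a single output function
target, k = 5.0, 4.0       # desired f(H_i); k is the convergence gain factor

errors = []
for _ in range(50):
    delta_f = target - w @ phi            # estimation error at H_i
    errors.append(abs(delta_f))
    # Update (7.1): each pass moves f(H_i) by delta_f / k, so the error
    # decays geometrically, as (1 - 1/k) per iteration, toward zero.
    w += delta_f * phi / (k * (phi @ phi))

print(errors[0], errors[-1])
```

Repeated presentation of the same point thus drives the estimation error there to zero, as the text asserts, without requiring any single adjustment to be an exact correction.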
If these variables are parameters derived from a family of patterns, and if the output functions are the probabilities that a given input pattern corresponds to each of a number of classes, then the device is really a pattern recognizer.

In these and other applications, a hierarchical network of learning machines [4,31], such as that shown in Figure 7.1, may prove effective. In this arrangement, successively higher levels control correspondingly higher operations. Each device performs its own control functions, directs processes in lower levels, and outputs information to be used by higher levels. The figure shows a conceptual arrangement. In practice, the large numbers of inputs which can be handled by a cerebellum-like learning machine permit a single physical device to act as several levels of the hierarchy simply by using computed (output) parameters as feedback input variables. This provides such a system with the capability to compute and/or control complicated functions. It also poses interesting problems regarding programming or teaching strategies for the system.

The learning algorithm may also be considered as a refinement of the Polynomial Discriminant Functions [29,50,51] discussed in Chapter 5. The advantage of the new approach is to remove the requirement of using a fixed number of training points. With the new algorithm, there is no upper limit on the number of points which may be used for training purposes.

Fig 7.1 A Hierarchical System of Learning Machines

Finally, the parallel processing properties of this learning system are important. It should be remembered that the same basis functions are used to compute all the output functions, so that any number of output functions and error corrections may be calculated simultaneously.
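This simultaneity falls out of the matrix form: one evaluation of Φ(H_i) serves every output, and each output's weight row is adjusted independently. In the sketch below, the sizes, the gain k, and the error threshold ε are illustrative assumptions; the threshold mimics selective adjustment, in which only outputs with large errors are corrected.

```python
import numpy as np

rng = np.random.default_rng(2)
phi = rng.normal(size=10)            # one shared evaluation of Phi(H_i)
W = np.zeros((6, 10))                # one weight row per output function
targets = rng.normal(size=6)         # desired values of all six outputs at H_i
k, eps = 500.0, 0.5                  # convergence gain and error threshold

f = W @ phi                          # all outputs from the same basis values
delta_f = targets - f                # all error corrections at once

# Update (7.2), applied only to outputs whose error exceeds eps; rows of W
# are independent, so adjusting one output never disturbs the others.
mask = np.abs(delta_f) > eps
W[mask] += np.outer(delta_f[mask], phi) / k
```

Because the rows of W never interact, the outputs remain isolated from each other exactly as the text describes, and any subset of them can be adjusted in parallel.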
All output cells are isolated from each other, thus permitting selective adjustment of the estimating functions; only those functions whose error is excessively large require adjustment at any time.

7.3 Contributions of This Thesis

The major contribution of this thesis is the development of a system which has the capacity to learn to approximate arbitrary, high order functions of several variables. By representing estimated functions as weighted sums of continuous basis functions, and by employing an iterative solution rather than one which uses matrix inversion, the system requires relatively few variable elements (memory).

The thesis has also shown that, for orthonormal basis sets, interference is minimized if estimation errors are reduced by adjusting the weight set {W} according to the expression

ΔW = Δf(H_i) Φ(H_i) / k .    (7.2)

This procedure thus reduces the number of iterations which are required before arbitrary functions are approximated to a required accuracy. The procedure has also been shown to result in a learned weight set which produces an estimated function closely approximating the least-square-error estimation of the target function.

Since the system, which is based on the structure of the mammalian cerebellum, is a very efficient adaptive controller, it shows promise in a number of applications and as the basis of a new class of intelligent machines. A generalized method of constructing the orthonormal sets of functions required by the system has also been presented.

The second significant contribution of this thesis is to present an improved cerebellar model which, being consistent with current anatomical and physiological data, is more plausible than previous models. Unlike other cerebellar models, this one simulates activities as continuous variables, rather than as binary variables, throughout all its operations. This is important since, despite the "all or nothing" character of action potentials, neural information is generally thought to be transmitted as frequency coded values. Also, the model proposes an algorithm which adjusts synaptic weights strictly according to the activities of the pre- and post-synaptic neurons at that synapse. Referring to (7.2), the pre-synaptic activity is Φ(H_i) while the post-synaptic learning activity is Δf(H_i). This property of localized learning is of critical importance to a plausible cerebellar model, as the long, narrow, and widespread structure of Purkinje cell dendrites makes an algorithm which requires computations involving non-local activity exceedingly unlikely.

7.4 Areas of Further Research

This thesis has succeeded in developing an optimized learning algorithm which models the cerebellum. However, as in most research, while resolving some issues, it leaves many interesting questions to be answered. In terms of the cerebellum, a number of physiological investigations are indicated:

1. Determine the mathematical form of the mapping from Mossy fiber activity to Granule cell activity. In particular, determine whether the frequency of action potentials in Parallel fibers can be interpreted as a basis set of the form suggested by this thesis:

S(H) = A + Φ(H)    (7.3)

where {Φ(H)} is an orthonormal set, A is a vector of constants to ensure S(H) > 0, and {S(H)} is the set of actual Parallel fiber activity.

2. Determine whether the connectivity of Parallel fiber-Purkinje cell synapses is modifiable. If some form of plasticity is found, determine the mathematical relation which governs this plasticity.

3. Determine whether the cerebellum functions as a phasic or as a tonic device, and whether Subcortical Nuclear cells do in fact perform a "sample-and-hold" operation as proposed by this thesis.

Fig 7.2 One Dimensional Spline Functions: a) first order, b) second order, c) third order

There are also a number of mathematical questions, relating to learning machines, posed by this research. The following investigations could prove most interesting:

1. Several properties of spline functions, considered as basis functions for learning machines, appear promising. As shown in Figure 7.2, each such function has a limited region of non-zero support. In other words, when considering spline-based learning algorithms, error correcting functions would cause no weight adjustments, and similarly would produce no interference, outside of a small region of support. The major problem with splines is to generate a useful set of functions which produce continuous error correcting functions and do not require excessive numbers of elements [41]. Another possible problem, as discussed in Section 6.3, is the potential system failure which could result from the failure of only a few elements of the basis set generator.

2. The programs which demonstrate the effectiveness of this learning system use extended precision (64 bit), floating point, digital values in all computations. The effects of using analog, reduced precision, or noisy variables require further investigation.

3. The operational characteristics of the system in a realistic control system environment require testing.

4. There is much theoretical work required to determine strategies for training or "programming" hierarchical networks of learning systems.
5. The system is relatively expensive and slow when modeled on a general purpose, digital computer. To be most effective, the system should be constructed as a single, special purpose device.

BIBLIOGRAPHY

Cited Literature

[1] J.S. Albus, A Theory of Cerebellar Function, Math. Biosci., 10, pp25-61, Feb 1971.
[2] J.S. Albus, Theoretical and Experimental Aspects of a Cerebellar Model, Ph.D. Thesis, University of Maryland, Dec 1972.
[3] J.S. Albus, A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC), Trans. ASME Ser.G, 97, pp200-227, Sept 1975.
[4] J.S. Albus, Data Storage in the Cerebellar Model Articulation Controller (CMAC), Trans. ASME Ser.G, 97, pp228-233, Sept 1975.
[5] J.S. Albus and J.M. Evans, Jr., Robot Systems, Scientific American, 234, pp77-86b, Feb 1976.
[6] L.A. Alekseeva and Y.F. Golubev, An Adaptive Algorithm for Stabilization of Motion of an Automatic Walking Machine, Eng. Cybern., 14, No.5, pp51-59, 1976.
[7] C.C. Bell and R.S. Dow, Cerebellar Circuitry, Neurosci. Res. Prog. Bul., 5, No.2, pp121-222, 1967.
[8] S. Blomfield, Arithmetic Operations Performed by Nerve Cells, Brain Res., 69, pp115-124, 1974.
[9] V. Braitenberg and R.P. Atwood, Morphological Observations on the Cerebellar Cortex, J. Comp. Neurol., 109, pp1-27, 1958.
[10] R.J. Brown, Adaptive Multiple-Output Threshold Systems and Their Storage Capacities, Technical Report 6771-1, Stanford Electronics Laboratories (SU-SEL-64-018), June 1964, AD 444 110.
[11] T.W. Calvert and F. Meno, Neural Systems Modeling Applied to the Cerebellum, IEEE Trans. Syst., Man, & Cybern., SMC-2, pp363-374, July 1972.
[12] J.C. Eccles, M. Ito, J. Szentagothai, The Cerebellum as a Neuronal Machine, Springer-Verlag, New York, 1967.
[13] J.C. Eccles, The Dynamic Loop Control Hypothesis of Movement Control, in Information Processing in the Nervous System, K.N. Leibovic, ed., Springer-Verlag, New York, 1969.
[14] J.C. Eccles, The Cerebellum as a Computer: Patterns in Space and Time, J. Physiol. Lond., 229, No.1, pp1-32, 1973.
[15] B.R. Gaines, Stochastic Computing Systems, in Advances in Information Systems Science, J.T. Tou, ed., Plenum Press, New York, 1969.
[16] A.R. Gardner-Medwin, The Recall of Events through the Learning of Associations between Their Parts, Proc. R. Soc. Lond. Ser.B, 194, pp375-402, 1976.
[17] P.F.C. Gilbert, A Theory of Memory that Explains the Function and Structure of the Cerebellum, Brain Res., 70, pp1-18, 1974.
[18] G.H. Glasser and D.C. Higgins, Motor Stability, Stretch Responses and the Cerebellum, in Nobel Symposium I: Muscular Afferents and Motor Control, R. Granit, ed., Almquist and Wiksell, Stockholm, pp121-138, 1966.
[19] P.G. Guest, Numerical Methods of Curve Fitting, Cambridge University Press, London, 1961.
[20] M. Hassul and P.D. Daniels, Cerebellar Dynamics: The Mossy Fiber Input, IEEE Trans. Bio-Med. Eng., BME-24, pp449-456, Sept 1977.
[21] D.O. Hebb, The Organization of Behaviour: A Neuropsychological Theory, Wiley, New York, 1949.
[22] J.K.S. Jansen, K. Nicolaysen, T. Rudjord, Discharge Patterns of Neurons of the Dorsal Spinocerebellar Tract Activated by Static Extension of Primary Endings of Muscle Spindles, J. Neurophysiol., 29, pp1061-1086, 1966.
[23] J.K.S. Jansen, K. Nicolaysen, T. Rudjord, On the Firing Pattern of Spinal Neurons Activated from the Secondary Endings of Muscle Spindles, Acta Physiol. Scand., 70, pp183-193, 1967.
[24] H.H. Kornhuber, Motor Functions of Cerebellum and Basal Ganglia: The Cerebellocortical Saccadic (Ballistic) Clock, the Cerebellonuclear Hold Regulator, and the Basal Ganglia Ramp (Voluntary Speed Smooth Movement) Generator, Kybernetik, 8, No.4, pp157-162, 1971.
[25] Y. Kosugi and Y. Naito, An Associative Memory as a Model for the Cerebellar Cortex, IEEE Trans. Syst., Man, & Cybern., SMC-7, pp94-98, Feb 1977.
[26] P.D. Lawrence and W-C. Lin, Statistical Decision Making in the Real-Time Control of an Arm Aid for the Disabled, IEEE Trans. Syst., Man, & Cybern., SMC-2, pp35-42, Jan 1972.
[27] H.C. Longuet-Higgins, D.J. Willshaw, O.P. Buneman, Theories of Associative Recall, Q. Rev. Biophys., 3, No.2, pp223-244, 1970.
[28] D. Marr, A Theory of Cerebellar Cortex, J. Physiol. Lond., 202, pp437-470, 1969.
[29] W.S. Meisel, Potential Functions in Mathematical Pattern Recognition, IEEE Trans. Comput., C-18, pp911-918, Oct 1969.
[30] M.D. Mesarovic, The Control of Multivariable Systems, Technology Press and John Wiley and Sons, New York, 1960.
[31] M.D. Mesarovic, D. Macko, and Y. Takahara, Theory of Hierarchical, Multilevel Systems, Academic Press, New York, 1970.
[32] M. Minsky and S. Papert, Perceptrons, MIT Press, Cambridge, 1969.
[33] J.A. Mortimer, A Computer Model of Mammalian Cerebellar Cortex, Comput. Biol. & Med., 4, pp59-78, 1974.
[34] N.J. Nilsson, Learning Machines, McGraw-Hill, New York, 1965.
[35] E. Parzen, Modern Probability Theory and its Applications, John Wiley and Sons, New York, 1960.
[36] E. Parzen, On Estimation of a Probability Density Function and Mode, Ann. Math. Stat., 33, pp1065-1076, Sept 1962.
[37] R.G. Peddicord, A Computational Model of Cerebellar Cortex and Peripheral Muscle, Int. J. Bio-Med. Comput., 8, pp217-237, 1977.
[38] A. Pellionisz, Computer Simulation of the Pattern Transfer of Large Cerebellar Neuronal Fields, Acta Bioch. & Biophys. Hung., 5, No.1, pp71-79, 1970.
[39] A. Pellionisz and J. Szentagothai, Dynamic Single Unit Simulation of a Realistic Cerebellar Network Model, Brain Res., 49, pp83-99, 1973.
[40] A. Pellionisz and J. Szentagothai, Dynamic Single Unit Simulation of a Realistic Cerebellar Network Model. II. Purkinje Cell Activity within the Basic Circuit and Modified by Inhibitory Systems, Brain Res., 68, pp19-40, 1974.
[41] P.M. Prenter, Splines and Variational Methods, John Wiley and Sons, New York, 1975.
[42] M.H. Raibert, A Model for Sensorimotor Control and Learning, Biol. Cybern., 29, No.1, pp29-36, 1978.
[43] A. Rapoport, "Addition" and "Multiplication" Theorems for the Inputs of Two Neurons Converging on a Third, Bul. Math. Biophys., 13, pp179-188, 1951.
[44] F. Rosenblatt, The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain, Psychol. Rev., 65, No.6, pp386-408, 1958.
[45] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan Books, Washington, 1961.
[46] T.C. Ruch, Basal Ganglia and Cerebellum, in Medical Physiology and Biophysics, T.C. Ruch and J.F. Fulton, eds., Saunders, Philadelphia, 1960.
[47] N.H. Sabah and J.T. Murphy, Reliability of Computations in the Cerebellum, Biophys. J., 11, pp429-445, 1971.
[48] N.H. Sabah, Aspects of Cerebellar Computation, in Proceedings of the European Meeting on Cybernetics and System Research, Vienna, Transcripta, London, 1972, pp230-239.
[49] M. Simaan, Stackelberg Optimization of Two-Level Systems, IEEE Trans. Syst., Man, & Cybern., SMC-7, pp554-557, July 1977.
[50] D.F. Specht, Generation of Polynomial Discriminant Functions for Pattern Recognition, Technical Report No.6764-5, Stanford Electronics Laboratories (SU-SEL-66-029), May 1966, AD 487 537.
[51] D.F. Specht, A Practical Technique for Estimating General Regression Surfaces, Lockheed Palo Alto Research Laboratory, June 1968, AD 672 505.
[52] B. Stott, Power System Dynamic Response Calculations, Proc. IEEE, 67, pp219-241, Feb 1979.
[53] D.F. Stubbs, Frequency and the Brain, Life Sciences, 18, No.1, pp1-14, 1976.
[54] B. Widrow, N.K. Gupta, S. Maitra, Punish/Reward: Learning with a Critic in Adaptive Threshold Systems, IEEE Trans. Syst., Man, & Cybern., SMC-3, pp455-465, Sept 1973.

General References

[1] P. Cress, P. Dirksen, J.W. Graham, Fortran IV with Watfor and Watfiv, Prentice-Hall, Englewood Cliffs, New Jersey, 1970.
[2] H.W. Fowler and F.G. Fowler, eds., The Concise Oxford Dictionary, University Press, Oxford, 1964.
[3] A.C. Guyton, Textbook of Medical Physiology, W.B. Saunders, Philadelphia, 1976.
[4] P.H. Lindsay and D.A. Norman, Human Information Processing, Academic Press, New York, 1977.
[5] B. Noble, Applied Linear Algebra, Prentice-Hall, Englewood Cliffs, New Jersey, 1969.
[6] B. Rust, W.R. Burrus and C. Schneeberger, A Simple Algorithm for Computing the Generalized Inverse of a Matrix, Comm. ACM, 9, pp381-387, May 1966.
[7] S.M. Selby, ed., Standard Mathematical Tables, Chemical Rubber Co., Cleveland, 1968.