A CEREBELLUM-LIKE LEARNING MACHINE

by

ROBERT DUNCAN KLETT
B.A.Sc., University of British Columbia, 1975

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES (Department of Electrical Engineering)

We accept this thesis as conforming to the required standard.

THE UNIVERSITY OF BRITISH COLUMBIA
July, 1979
(c) Robert Duncan Klett, 1979

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical Engineering
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5

Date: July 19, 1979

ABSTRACT

This thesis derives a new learning system which is presented as both an improved cerebellar model and as a general purpose learning machine. It is based on a summary of recent publications concerning the operating characteristics and structure of the mammalian cerebellum and on standard interpolating and surface fitting techniques for functions of one and several variables. The system approximates functions as weighted sums of continuous basis functions. Learning, which takes place in an iterative manner, is accomplished by presenting the system with arbitrary training points (function input variables) and associated function values. The system is shown to be capable of minimizing the estimation error in the mean-square-error sense. The system is also shown to minimize the expectation of the interference, which results from learning at a single point, on all other points in the input space. In this sense, the system maximizes the rate at which arbitrary functions are learned.

TABLE OF CONTENTS

ABSTRACT .......... ii
LIST OF TABLES
LIST OF FIGURES NOTATION  Chapter  II  III  IV  vi vii  ACKNOWLEDGEMENT  I  V  ix  Page INTRODUCTION  1  CEREBELLAR STRUCTURE  3  2.1  The Mossy Fiber Pathway  4  2.2  The Climbing Fiber Pathway  8  MODELING THE CEREBELLUM  9  3.1  Recent Cerebellar Models  9  3.2  Requirements of an Improved Cerebellar Model  13  MODEL SIMPLIFICATIONS AND ASSUMPTIONS  15  4.1  Arithmetic Functions of Cerebellar Neurons  15  4.2  Cerebellar System Considerations  20  V MATHEMATICAL ANALYSIS OF THE CEREBELLAR SYSTEM  23  5.1  The Cerebellum and Estimation Functions  23  5.2  An Optimized Learning Algorithm  27  5.3  System Properties  38  iv TABLE OF CONTENTS (Continued)  Chapter VI  VII  Page SYSTEM PERFORMANCE  42  6.1 General Considerations  42  6.2  Examples of Learning a Function of a Single Variable  45  6.3  Soft Failure  53  6.4  Learning Functions of Several Variables  54  DISCUSSION AND CONCLUSION  59  7.1 Physiological Implications  59  7.2 The Learning Algorithm and Machine Intelligence  61  7.3 Contributions of This Thesis  64  7.4 Areas of Further Research  65  BIBLIOGRAPHY  70  Cited Literature  70  General References  73  V  LIST OF TABLES  Table  Page  2.1 Connectivity of Cerebellar Neurons in Cat 5.1 Number of Terms Generated by Full and Reduced Cartesian Products  7  27  T  6.1 Values of b=MAX[(D(D] for Various Values of m and n  45  6.2 The Learning Algorithm as Effected by k and C  51  6.3 Learning Sequences for Cases Where u(H)^s(H)  53  6.4 Learning Sequences for the Arm Model  58  vi LIST OF FIGURES  Figure  Page  2.1  Block Diagram of the Cerebellar System  3  2.2  Structure of the Cerebellar Cortex  5  3.1  A Perceptron  3.2  Albus  4.1  Model Granule Cell  19  5.1  A Bounded Estimation Function  40  6.1  Error Correcting Functions for a Single Variable  44  6.2  Target Function Sin (2TTx) over  46  6.3  Optimal Polynomial Approximations  6.4  Sequences of Estimations of Sin (2TTx)  1  10  Mossy Fiber Activity Pattern  11  (0<x<l) of Sin (2TTx)  For m=4, k=8  47  48  6.5  Near-optimal  Estimations of Sin (2ITx)  49  6.6  The Geometry of the Model Arm  55  6.7  Block Diagram of the Arm Model Learning System  56  7.1  A Hierarchical System of Learning Machines  63  7.2  One Dimensional Spline Functions  67  vii NOTATION  a)  Standard Forms a  = a scalar  A  = a vector  A  = a matrix  aj  = the i * -  a^j  = the element of A at co-ordinates (i,j)  A  b)  T  T  or A  n  element of A  = the transpose of A or A respectively  Definition of Variables Ae  = an error correcting function  G  = an arbitrary constant, G>0 used as the acceptable estimation error  f  = Purkinje c e l l activity or estimated output function  f  = system corrected output (Subcortical Nuclear c e l l output)  f(H) = target function Af  = f - f = the estimation error  G H  = Golgi feedback matrix = region of definition function  Hi  = a particular point in H  I k  = identity matrix = convergence gain factor  in  the  hyperspace  of  the  control  viii NOTATION (Continued)  L  = number of elements in the basis set {(*)} and in the weight set {TT}  m  = number of elements in a one-variable interpolation function  n  = number of independent input variables or arbitrary power  {0}  = the function-space basis set (Parallel fiber activity)  {(D(HTJ}  = the vector of values of {0} at the point  {TT} = expanded input set s ( H ) = a cost density function 0  = (I4G)  -1  = Granule-Golgi transfer matrix  also used as an arbitrary non-singular matrix u ( H ) = the training point density function v  = arbitrary scalar for 
optimization techniques  {W}  = the set of variable weights (effective synaptic weights)  X  = augmented matrix whose columns are eigenvectors  XJ  = the i h element of the input set  fc  ix ACKNOWLEDGEMENT  There are many people and institutions to whom I owe much gratitude on the completion of this thesis. My wife, Marian, has shown great patience and understanding while encouraging me in this project. It i s with much gratitude that I acknowledge the assistance of my supervisor, Dr. P.D. Lawrence. His suggestions and criticism have added much to this work while his encouragement has helped to bring i t to a successful conclusion. I wish to thank my co-reader, Dr. E.V. Bonn, and the other members of my  committee for their careful  inspection and  criticism of this  thesis. To my fellow students in Electrical Engineering at the University of British Columbia, I extend stimulating  thanks  for providing an enjoyable and  environment. In particular,  I thank Mr. Rolf Exner for  developing the program which typed this thesis. This research has  been supported  by  the Natural  Sciences  and  Engineering Research Council of Canada under a Postgraduate Scholarship to i t s author and grant No. A9341, and by the University of British Columbia in the form of a Teaching Assistanceship.  1 I  INTRODUCTION  A r t i f i c i a l intelligence and pattern recognition are currently areas of  great  interest.  This  thesis  analyses  a phylogenetically  lower  portion of the brain, the cerebellar cortex, in the belief that such an analysis will  assist  the development of a new type of intelligent  system. It i s further believed, since the higher regions of the brain are phylogenetically newer than the lower ones, that a study of the cerebellum  i s a logical  starting  point  in the study of intelligent  structures in general. The cerebellum  is a part of the brain which i s located in the rear  portion of the head, at the base of the skull. It is generally accepted that the cerebellum  is involved in maintaining posture and co-ordinating  motor activities of the body [7,12,28]. Although other  levels of the  brain can perform these functions, i t has been postulated cerebellum levels  that the  i s designed both to provide finer co-ordination than other  and  to  tasks [17,18,46].  relieve Studying  higher  levels  the cerebellar  of  most  cortex  of  the motor  i s particularly  appealing due to its extremely regular arrangemant of cells (which will be discussed in Chapter 2) and the fact that i t s inputs (only two types) and outputs (only one type) are so well differentiated. Although the following  analysis  i s based on the structure  and function  of the  cerebellum, i t should be established at this point that much additional research  i s required  to show i f the cerebellum  according to the theory to be developed.  in fact  operates  2  There are thus two main goals of this research: 1.  to present a model of the cerebellum which is consistent with current anatomical and physiological data; thus being an improvement over existing models, and  2.  to develop a learning machine, incorporating the improved cerebellar  model, which may  form  the  basis of  a  new  approach to designing intelligent machines. With these goals in mind, the thesis begins by describing the structure of the cerebellum. Chapter 3 presents recent cerebellar models, some of their deficiencies, and the requirements of an improved model. 
Next, Chapter 4  begins  to  analyse  the  nature  of  possible  cerebellar  computations, leading to Chapter 5 which ends with the development of an optimized learning algorithm which i s consistent with the cerebellar system. The performance and operating characteristics of examples of this system are presented  in Chapter 6. Finally, Chapter 7 discusses  implications of both the new cerebellar model and the learning system, ending with suggestions for further research.  3  II  CEREBELLAR STRUCTURE  There have been numerous publications dealing with the cerebellum. Unless otherwise indicated, the following information is derived from some of those which deal with the cat [7,12,48]. A block diagram of the interconnections of neurons in the cerebellum is shown in Figure 2.1 in  PURKINJE PARALLEL FIBERS  CELLS  h  7"% GRANULE  GOLGI  CELLS  CELLS  BASKET t, STELLATE CELLS  0  CEREBELLAR CORTEX  SUBCORTICAL NUCLEAR CELLS  KEY • EXCITATORY SYNAPSE - INHIBITORY SYNAPSE TO MOTOR CENTERS VIA VARIOUS ROUTES  MOSSY FIBERS  Fig 2.1  CLIMBING FIBERS  Block Diagram of the Cerebellar System  which the interconnections are marked to indicate whether they are excitatory or inhibitory. It can be seen that there are only six types o f neurons in the cerebellum. Of these, five are interneurons located  4 in  the  cerebellar cortex which have no  direct  effect  outside  the  cerebellum (Granule, Golgi, Basket, Stellate, and Purkinje c e l l s ) , while the sixth (Subcortical Nuclear cells) are the only c e l l s which do have a direct external effect. There are also only two types of input fibers (Mossy and Climbing fibers). The interconnections of the cerebellar neurons, particularly those in the cerebellar cortex, have been extensively studied, revealing the remarkably regular geometry of Figure 2.2.  2.1  The Mossy Fiber Pathway  The effects of Mossy fiber activity upon the cerebellum are very widespread.  Upon  entering  the  cerebellum,  each  Mossy  fiber  sends  branches to numerous folia throughout the cerebellum. Some collaterals terminate on Subcortical Nuclear cells, while the branches which enter the cortex branch profusely before terminating as Mossy Rosettes in the Granular layer of the cerebellar cortex. The Rosettes form the nucleus of an excitatory synapse between a Mossy fiber, dendrites from numerous Granule c e l l s , and dendrites from a few Golgi c e l l s . Granule cells, each of which are contacted by 2 to 7 Mossy Rosettes (average  4.2),  generate  a  single  cerebellar folia to the Molecular  axon  which  climbs  through  the  layer where i t forms a "T"-shaped  branch. Each branch of the axon, now classed as a Parallel fiber, runs longitudinally along the folia for a distance of up to 1.5 mm  (2.5 mm in  man [9]). Each Parallel fiber traverses a large number of characteristically flattened dendritic trees of Basket, Stellate, and Purkinje cells (up to  5  a) Perspective view of the cerebellar cortex (after [12]). BA=Basket c e l l , CL=Climbing fiber, GO=Golgi c e l l , GR=Granule c e l l , MO=Mossy fiber, P=Purkinje c e l l , ST=Stellate c e l l .  b) Diagramatic view of the Molecular layer of the cerebellar cortex showing the packing arrangement of Parallel fibers and Purkinje c e l l dendrites.  Fig 2.2  Structure of the Cerebellar Cortex  6 300 of the latter). 
Although there i s some uncertainty concerning the proportion of Parallel fiber-Purkinje c e l l intersections which in fact result in synapses, i t i s clear that the geometry maximizes the number of possible synapses. Those intersections which do result in synapses between Parallel fibers and Purkinje cells, Basket cells, or Stellate cells  are excitatory. The axons of Basket and Stellate cells  form  inhibitory synapses with the dendritic trees and pre-axon areas of nearby Purkinje cells. Purkinje axons terminate in inhibitory synapses on Subcortical Nuclear cells. As well as forming excitatory synapses with Basket, Stellate, and Purkinje c e l l s , Parallel fibers also form excitatory synapses with the dendritic trees of Golgi cells. Unlike the dendritic trees of the other cerebellar  interneurons,  Golgi  cell  dendrites  spread  throughout a  cylindrical volume whose base i s in the Granular layer and whose top i s in the Molecular  layer of the cerebellar cortex. Each tree i s divided  into two regions, the top one being bottom one by Mossy Rosettes.  excited by Parallel  fibers, the  Each Golgi c e l l generates axons which  inhibit a large fraction of the Granule cells which are in the volume enclosed by the Golgi dendritic tree. The capacity of the cerebellum to process (co-ordinate) is  often  discussed  information  in terms of the divergence and convergence of  information  carrying  neurons [13,14,28], Table 2.1 illustrates  properties.  Of particular note  i s the remarkably  large  these  number of  Parallel fibers (up to 200 000) which synapse with each Purkinje c e l l . This convergence factor i s a direct consequence of long, thin, Parallel fibers forming a lattice with fan-shaped dendritic trees of Purkinje cells in the Molecular layer of the cerebellar cortex.  7  Table 2.1 Connectivity of Cerebellar Neurons in Cat  a)  Mossy Fiber Pathway DIVERGENCE  CONVERGENCE  MOSSY FIBERS 460[14]-600[13]  4.2[14]  100[14]-300[13]  100 000[14]-200 000[13]  20 [14]-30[13]  20 000[14]  GRANULE CELLS PARALLEL FIBERS  PURKINJE CELLS  PARALLEL FIBERS  BASKET CELLS 8[14]-50[13]  20 [13]-50[14]  PURKINJE CELLS  b)  Climbing Fiber Pathway DIVERGENCE  CONVERGENCE  CLIMBING FIBERS 10[14]  1[14]  PURKINJE CELLS eg.  Each Mossy fiber makes synaptic contact with between 460 and 600 Granule cells while each Granule c e l l i s contacted by an average of 4.2 Mossy fibers.  8 2.2  The C l i m b i n g F i b e r Pathway  In  c o n t r a s t t o Mossy f i b e r s , the e f f e c t s o f C l i m b i n g f i b e r  a r e v e r y l o c a l i z e d . Although c o l l a t e r a l s which form with  S u b c o r t i c a l Nuclear  very  little  after  c e l l s have been found,  entering  the  cerebellum.  excitatory  Climbing  Each  activity synapses  fibers  Climbing  branch  fiber  which  e n t e r s the c e r e b e l l a r c o r t e x t y p i c a l l y forms a synapse w i t h o n l y one, with  a t most  dendrites excitatory. Basket  and  a  and  cell  Stellate The  understood.  Purkinje body  Climbing  Purkinje c e l l . well  few,  of  fibers cells  cells.  
also  which  relation  the  of  This  synapse,  Purkinje form  are  cell  excitatory  which is  very  synapses  i n c l o s e proximity to  Climbing  fibers  engulfs  the  strongly with  the  to G o l g i c e l l s  or  the  target i s less  9 Ill  3.1  MODELING THE CEREBELLUM  Recent Cerebellar Models  A number of theories have been proposed which explain some aspects of cerebellar function and theories  do  algorithm  not  which  Alternatively,  organization. Unfortunately,  acceptably  explain  must  the  those  be papers  the  basis  which  do  most of  mathematics of of  a  truly  describe  a  learning  valid  workable  the  model. learning  algorithms are not fully compatible with known cerebellar structure. In one  of  the  first  theories to utilize  current  knowledge of  cerebellar structure, Marr [28] proposes that the cerebellum  directs a  sequence of elemental movements to generate a desired action. That i s , the cerebellum of  Mossy  acts as a pattern recognition device, relating  fiber  activity  to  learned  outputs.  The  patterns  recognition  is  performed according to a "codon" (subset) of Parallel fibers which are active at a given time. Translating this into mathematical terminology, his proposal is that each Purkinje c e l l separates a binary hyperspace of Parallel fiber activity into linearly separable Purkinje  cell  is either active or  regions  inactive. The  in which the  orientation of  the  separating hyperplanes is learned by adjusting Parallel fiber-Purkinje c e l l synaptic strengths as a function of Climbing fiber activity. Albus [1,2] extended and modified Marr's theory by describing the cerebellum  in terms of Perceptron theory [32,34,44,45]. A  perceptron,  which typically consists of binary inputs, a combinatorial  network of  "association cells", adjustable weights, and a summing device is shown  10 in Figure 3.1.  Fig 3.1  In Albus'  A Perceptron  model, real-valued data,  such as a joint  angle,  is  converted to a set of binary signals in a manner similar to the mapping shown in Figure 3.2. along  He proposes that these  Mossy fibers before being  signals are  transmitted  recoded by the Granular  layer into  patterns of Parallel fiber activity which form an expanded set of binary signals. The  purpose of expansion recoding  is to map  a l l possible  patterns of Mossy fiber activity into sets of Parallel fiber activity which are  linearly  separable. Weights of  synapses between  Parallel  11  1  A  MF,  MF,  180  180  1 S  MF.  T  180  1-J MF.  1  180  J  MF.  180  1 -J  1-1 MF  MF  4  8  0.  0  JOINT  180  ANGLE  Fig 3.2  Albus  1  JOINT  180  ANGLE  Mossy Fiber Activity Pattern  fibers and Purkinje cells, Basket cells, and Stellate cells are adjusted under the influence of Climbing  fiber activity to obtain the desired  Purkinje c e l l output. The  effectiveness of  this  model  in performing  cerebellum-like  functions has been convincingly demonstrated by i t s a b i l i t y to learn to control a number of movements of a mechanical arm [2,3,4]. Unfortunately, Albus  1  model i s somewhat incompatible  with some  aspects of cerebellar structure and physiology. His expansion  recoding  scheme employs a hash-coding  function which seems to be beyond the  computational  Granular  powers of  the  layer.  Furthermore,  there i s  12 l i t t l e evidence to support his proposed mapping scheme of joint angle to Mossy fiber activity [22,23]. A model which i s somewhat similar to Albus  1  has been proposed by  Gilbert [17]. 
The model computes Purkinje c e l l activity as a weighted sum of Parallel fiber activity; the Parallel  fiber-Purkinje c e l l  and  Basket and Stellate cell-Purkinje c e l l synapses being modifiable under the influence of Climbing fiber activity. The theory i s consistent with cerebellar structure but does not describe the relation between the system's inputs, Mossy fiber activity, and Parallel fiber activity. In another theory, Kornhuber [24] emphasizes the capacity of the cerebellar cortex to act as a timing device. He  proposes that the  action potential velocity along long, thin, Parallel fibers provides a timing mechanism for the control of b a l l i s t i c motions such as saccadic eye movement. His model, although not rejecting i t , does not propose a mechanism by which learning could take place. Calvert and  Meno [11] use spatiotemporal  interconnections found  in the cerebellum may  analysis to show that cause i t to act as an  anisotropic spatial f i l t e r which enhances both the spatial and  temporal  details of Mossy fiber activity patterns. Their analysis assumes that a l l synaptic weights are determined  s t r i c t l y according to the type of  pre and post-synaptic neuron. Hence, the model does not account for learning. Hassul and Daniels [20] present a theory, along with supporting experimental cerebellum  evidence,  that at least  is to act as a higher  part of  order  the  function of  the  lead-lag compensator. They  further propose that this compensation provides stability for reflex actions. The  model does not  compensation might be learned.  propose a  scheme whereby the correct  13 A number of computer simulations of activity in a cerebellum-like network have been presented  [33,38,39,40]. A common prediction of these  models is that lateral inhibition and Golgi c e l l feedback should cause the surface of the cerebellum to contain long, narrow, bands of active Parallel  fibers. That  i s , Mossy fiber  activity  i s "focussed"  into  regions of Parallel fiber activity. A similar study includes peripheral feedback and muscle fibers in the model [37]. A common problem with assume a l l Parallel  these  computer simulations  fiber-Purkinje c e l l  i s that  they  synaptic weights are equal.  That i s , the models do not consider the effects of unequal synaptic weights which might result from the application of a learning process.  3.2 Requirements of an Improved Cerebellar Model  The above cerebellar theories form a good starting point to begin understanding  the cerebellum.  Unfortunately, each theory i s deficient  in at least one aspect. Furthermore, the theories tend to be mutually exclusive so that an a l l encompassing one is not easily synthesized as a combination of the strong points of each proposal. A common shortcoming of those models which do propose a learning scheme is their treatment of Parallel fiber activity as a binary quantity. In these models, learning operates to form linearly separable regions in a binary hyperspace of Parallel fiber activity. Although each action potential i s undoubtedly binary, the information carried by a nerve is generally accepted to be a function of the frequency of action potentials [53], not the presence or absence of one. An improved model must therefore deal with signals which are interpreted as real, not binary, variables.  14  To be truly valid, the operations of a cerebellar model must be consistent with those operations which are feasible to implement using the known structure of the cerebellum. 
In particular, the mapping of Mossy fiber activity to Parallel fiber activity must be consistent with the arrangement of cerebellar neurons, and the memory requirements of the learning algorithm must be consistent with the number of modifiable synapses in the cerebellar cortex.  15 IV  4.1  MODEL SIMPLIFICATIONS AND ASSUMPTIONS  Arithmetic Functions of Cerebellar Neurons  The basic assumption to be used in the following discussion is that cerebellar information processing may mathematical operators  be  interpreted as a number of  acting upon real-valued data.  That  i s , data  which is translated into a sequence of action potentials whose average frequency  i s a function of some biologically significant variable, is  processed, according to common mathematical operations, by cerebellar neurons. It has frequency  been shown that neurons and  neural networks, using  coded data, can perform basic arithmetic operations  subtract, multiply, and frequencies [8,43].  divide)  The  upon pre-synaptic  operation  performed  by  a  (add,  action potential particular  neuron  depends upon a number of factors including the location of the synapses relative to the post-synaptic neuron's c e l l body, the nature of each synapse (excitatory or inhibitory), the level of inherent inhibition or excitation  of  the  post-synaptic  neuron  (ie. the  level  of resting  activity or inhibitory threshold), and  the relative activity of each  pre-synaptic  presented  numbers to  neuron. The represent  data  model to be and  will  a selection of the  thus use  real  above arithmetic  operations to represent the functions of relevant neurons. The  frequency  at  which  action  potentials are  transmitted  is  obviously a non-negative number. This restriction upon the values to be represented  will  be  relaxed as  i t is equally obvious that a  fixed  positive bias can be superimposed upon any bounded negative variable in  16 order to guarantee a net positive value. Similarly, although the nature of a particular synapse (excitatory or inhibitory) is fixed, either a bias, or an arrangement of interneurons can be established to permit a synapse  to  have  a  net  effect  which  is  either  positive  or  negative [2,17], It i s useful to regard the cerebellar cortex system as a form of associative memory [16,25,27], In such an approach, a set of memories (the firing frequencies of the Purkinje cells) is associated with a set of inputs (the activities of the Mossy and/or Climbing fibers). Due to their remarkable specificity to the output cells, i t seems most likely that Climbing fibers are equivalent to "data" lines while Mossy fibers are equivalent to "address" lines. The only other possible arrangement, Mossy fibers providing data and Climbing fibers providing the address, is exceedingly unlikely due to the dispersion of information between Mossy fibers and the system's output, Purkinje cells. The structure of the Mossy fiber pathway of the cerebellum suggests a  number of possible formulations  for the value  of  Purkinje  cell  activity (system output) as a function of Mossy fiber input. These are: a sum of sums, a sum of products, a product of products, and a product of sums. A moment's reflection will indicate that a sum of products i s the most likely form for representing non-linear functions of several variables since both a product of sums and a product of products are always zero whenever any one of the primary terms is zero (either due to coincidence or to a faulty transmission line). 
Similarly, both a sum of sums and a product of products would not justify the two-stage structure of the system since a single neuron could perform the same computation. Furthermore, Taylor series expansions of functions of several variables show that any function can be adequately expressed as a sum of products,  17 providing  enough terms are used. The  choice of  formulating  system  output as a weighted sum of products is not new as i t has been suggested by other workers, and i t seems highly plausible, that each Purkinje c e l l effectively computes a weighted sum of the activity of Parallel fibers which synapse with i t [2,17,28,32]. That i s ,  f = £>i(Di  (4.1)  where f =Purkinje c e l l activity w  . i t h synaptic weight =  (Inactivity of the i  t  n  Parallel fiber.  Learning is assumed to take place by adjusting the synaptic weight set {W} as a function of Climbing fiber activity [2,17,21,28], The function of cerebellar interneurons (Granule, Basket, Stellate, and Golgi cells) in the model must now be considered. This model will take  the same approach  as a number of previous workers  who  have  suggested that Basket and Stellate c e l l s act to permit the net effect of some W J to be negative and to permit both increasing and decreasing wj [2,17,28]. (The nature of a synapse, excitatory or inhibitory, i s fixed.) A simple and reasonable model is that Basket and Stellate cells compute an unweighted sum of the activities of the Parallel fibers with which they synapse. Since Basket and Stellate  cells are  inhibitory,  this sum is then subtracted from Purkinje c e l l activity. That i s ,  (4.2)  c = effective (constant) synaptic connectivity between Parallel fibers and Purkinje c e l l s via the Basket and Stellate c e l l route.  18 Re-arranging (4.2) yields  f =5 (Pi-c)Oi.  (4.3)  Letting  w  i = Pi  _ c  4  '  4  (-)  will result in wj which may be positive or negative, depending upon the value of p j . This approach does, however, place a lower limit the value of every  (-c)  on  w^.  It should be noted that the dendritic tree of each Basket and Stellate c e l l  is less widespread than that of a Purkinje c e l l . This  arrangement reduces the volume of cortical cells by permitting a single Basket or Stellate c e l l to influence a number of Purkinje cells, despite the fact that each Purkinje c e l l has synapses with a slightly different subset of Parallel fibers. As for Granule cells, i t has been suggested above that (4.1) must represent a sum of products. This implies that each Granule c e l l forms a term which is the product of the activity of the Mossy fibers which make contact with  i t s dendrites. To  account  for Golgi  inhibition,  Granule cells will be modeled as the two-part cells shown in Figure 4.1. The f i r s t part computes the product of the activity of relevant Mossy fibers while  the  second  part  subtracts Golgi  cell  inhibition). The maximum number of different products  activity  (Golgi  is given by the  number of a l l possible combinations of the set of input variables. The set of those products which are actually used in (4.1) determined general,  in following  sections)  will  be  represented  (a set to be by  {TT}.  In  {TT} contains more elements than the number of Mossy fibers  which enter  the cerebellar cortex. The  set w i l l  therefore also  be  19  PARALLEL MOSSY  FIBER  FIBERS MULTIPLIER GOLGI  Fig 4.1  FEEDBACK  Model Granule Cell  referred to as the "Granule expanded  input set". 
Such an expansion i s  consistent with the large Mossy fiber-Granule  cell  divergence shown  previously in Table 2.1. Since Golgi cells have two distinct dendritic trees, and since the upper tree i s much more dense [12], i t would seem that Parallel fiber synapses are more significant to the Golgi system and that Mossy fiber synapses are merely an improvement, possibly to reduce inherent delays  of the  system.  This  assumption  does  tend  to  time  simplify the  following analysis but is not an essential ingredient of i t s basic principles. In any case, the Parallel fiber-Golgi c e l l system w i l l be  20 modeled as a negative feedback network of the form  $ = TT - G(D  (4.5)  where G=Golgi feedback matrix Tf=<5ranule expanded input set.  Once a steady-state condition has been reached, Parallel fiber activity can be expressed as (p = ( I + G ) - i r .  (4.6)  1,  This can be re-written as 0  = 9T  (4.7)  where 9=(I+G)- . 1  (4.8)  Substituting (4.7) into (4.1) and re-writing using vector notation, the system can be expressed as T  f = W(D =  4.2  WT0Tr.  (4.10)  Cerebellar System Considerations  The using  (4.9)  cerebellum  functions as  visual, tactile, and  other  a  peripheral controller.  forms of  levels of the brain "teach" the cerebellum  sensory  That i s ,  feedback,  higher  to control desired motions.  This procedure proves advantageous by relieving higher levels of these control tasks once the motions have been learned. To be most efficient,  21 the cerebellar system should be designed to learn the control parameters as they are being generated by the higher levels. In such a scheme, whenever the estimation error is deemed to be excessive, higher levels simultaneously correct the body position and adjust the weight set so as to reduce the estimation error for a subsequent estimation at that point in the control hyperspace. The existence of synapses between Climbing fibers and Purkinje cells, and between Climbing fibers and Subcortical Nuclear  cells adds support  to this proposal  proposed that, during training, Climbing  as i t may  be further  fiber activity overrides that  of Purkinje cells in such a way as to f i r s t correct the body position and  to then reduce the Purkinje cell's estimation error. Also, the  cerebellum  should  be most  motions, thus minimizing controlling  effective  in controlling  the most  usual  the amount of time spent by higher levels in  body motions. That  i s , since  (4.1) really  estimates  a  control function, the estimation error should be least in those regions which are used most frequently. Referring to equations  (4.1) and  (4.4) and the figures showing  cerebellar structure, i t i s apparent that the only variable elements in this model of the cerebellum  are the weights of synapses between  Parallel fibers and Purkinje cells. (Basket and Stellate cells may also have modifiable  synapses without  significantly changing  the model.)  These variable weights are considered by this model as the set {W}. From a different viewpoint,  this means that the only stored  (memory) of the system are the current weights. This fact prevents  the use of standard  matrix  inversion techniques  values  completely in solving  (4.1). It is therefore apparent that the system must learn by the use of an error correcting algorithm inversion in an iterative manner.  which effectively  performs  matrix  22 A well known property of animals, particularly mammals, i s their ability to adapt to changes in their size, strength, and environment. 
Therefore, the mammalian motion controller, the cerebellum, must be able to  modify  i t s control  parameters  development. When considering  at any stage  the cerebellum  in the animal's  as a learning machine,  this means that the system must be able to learn any number of training points or patterns which may be presented at any time. A final consideration in any mathematical model of the cerebellum is the number of terms required in the summation given by (4.1). Motion co-ordination would seem to involve non-linear independent  variables [42]  (sets  of  current  functions of numerous positions,  current  velocities, desired positions, desired velocities, etc). Each Purkinje cell must therefore generate a function of several variables; the larger the number of variables, the better the co-ordination.  Unfortunately,  increasing the number of variables rapidly increases the number of terms required. As the model must not require more terms than a Purkinje c e l l is capable of adding, an upper limit of approximately 200 000 i s imposed upon the number of terms in each estimation  function as this i s the  largest estimate of the number of Parallel fibers which synapse with a single Purkinje c e l l .  23 V  MATHEMATICAL ANALYSIS OF THE CEREBELLAR SYSTEM  5.1 The Cerebellum and Estimation Functions  The  nature of the cerebellar system  i s suggestive of several  properties of those mathematical procedures which will be referred to here as "estimation  functions". These procedures include the related  theories  fitting,  of  curve  theories are discussed  regression,  and  interpolation. These  in numerous books and papers, including [19],  [35,36], and [41] respectively. Estimation procedures typically operate to minimize an expression of the error between a target function and an estimated function. The estimated function i s generally expressed as a weighted sum of basis functions  f=^w (D (H) i  (5.1)  i  where H = a vector of one or more independent variables  which is essentially the same equation as (4.1). There are two significantly different approaches to finding f which are reflected in the form of {(])(H)} in (5.1): those in which {$} i s a set of functions which are continuous (and whose derivatives are a l l continuous) over the hyperspace of definition, and those in which {(*)} is a set of functions which are piecewise continuous (or whose derivatives are piecewise continuous). An advantage of the latter approach i s that it  simplifies  the computations,  inversion, required  to generate  which  invariably  involve  {W}. Piecewise continuous  matrix  functions,  24 such as splines, can be arranged so that the matrix is banded, thereby making  the  inversion  process  simpler  and  less  prone  to  numerical  (round-off) errors [41]. The  only  restriction  independent set of  upon (0)}  functions  is that  i f the  i t must be  a  resulting weight set  linearly i s to  be  unique. That i s , the Granule expanded input set {TT} forms a basis set 1  which spans a function space. If the elements of {H} independent, some of them may  are not linearly  be deleted, resulting in a reduced set  which spans the same function space and is linearly independent. On the other hand, the practical advantages of improved system accuracy and r e l i a b i l i t y may  result from a system in which Parallel fiber activity  forms a set of functions which are not  linearly independent [15,47].  
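To make the distinction concrete, the following sketch evaluates an estimate of the form (5.1) in one variable using both kinds of basis set: a continuous polynomial basis and a piecewise-continuous "hat" (linear spline) basis. The bases, knots, and weights are illustrative assumptions only; in the system under study the weights are what learning adjusts.

```python
def f_hat(weights, basis, x):
    """Estimate of the form (5.1): f_hat(x) = sum_i w_i * phi_i(x)."""
    return sum(w * phi(x) for w, phi in zip(weights, basis))

# (a) A continuous basis: low-order polynomials in one variable.
poly_basis = [lambda x: 1.0, lambda x: x, lambda x: x ** 2, lambda x: x ** 3]

# (b) A piecewise-continuous basis: "hat" (linear spline) functions on five knots in [0, 1].
def hat(center, width):
    return lambda x: max(0.0, 1.0 - abs(x - center) / width)

spline_basis = [hat(c, 0.25) for c in (0.0, 0.25, 0.5, 0.75, 1.0)]

# Illustrative weight sets only.
w_poly = [0.1, 11.6, -33.7, 22.5]
w_spline = [0.0, 1.0, 0.0, -1.0, 0.0]

for x in (0.1, 0.4, 0.8):
    print(f"x={x:.1f}  polynomial: {f_hat(w_poly, poly_basis, x):7.3f}"
          f"   spline: {f_hat(w_spline, spline_basis, x):7.3f}")
```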
Although the following derivations will assume linear independence, this condition may be relaxed in some cases. The  choice  of  the  form of  {($}  (ie. polynomials,  trigonometric  functions, exponentials, etc.) and of the number of elements in the set is a matter of judgement, dependent upon any known properties of  the  function which is to be estimated. The  above  developed either  mentioned  estimation  for functions  of  techniques  have  usually  a single variable or  functions of several variables. Extension to non-linear  been  for linear functions of  several variables is often not as straight-forward as one might expect. (See  Chapter 5  of  Prenter [41]  for  a  discussion  of  interpolation  functions of several variables.) A particularly severe problem is to ensure the continuity of functions which are interpolated with piecewise continuous interpolation functions. The usual solution to this problem is to form {(J)(H)} as the Cartesian product of interpolation functions of single variables. As an example, for interpolating a function of  two  25 variables, one could use <Dk(x,y) = 0i(x) -(Dj(y)  (5.2)  where k = j+m(i-l) m = the number of elements in {(D(x)} . An  important disadvantage of this  approach  i s that  the number of  elements in {0(H)} grows rapidly as the numbers of independent variables and elements in {0(x)} increase. That i s , n  L = m  (5.3) where L = the number of elements in {(J)(H)} n = the number of independent variables.  Fortunately, the extension, to several variables, of continuous interpolating functions i s not similarly restricted. The space spanned by a continuous set {0(x)} may be extended to {(|)(H)} as a "reduced Cartesian product" in which only those terms with a total order which i s less than or equal to some maximum value are retained. For example, the set  {(p(x)} = {1, sin(x) , sin(2x)}  may be extended to two dimensions as a f u l l Cartesian product requiring 9 terms,  {(|)(x,y)} = {1, sin(x), sin(2x) , sin(y) , sin(2y) , sin(x)sin(y) , sin(x)sin(2y), sin(2x)sin(y), sin(2x)sin(2y)}  26 or as a reduced Cartesian product requiring only 6 terms. (0(x,y)} = {1, sin(x), sin(2x) , sin(y) , sin(2y), sin-(x) sin(y)} For a reduced Cartesian product, the number of terms required i s given by L = (m-l+n)1 (m-1)!n!  (5.4)  Where m = the number of elements in the single variable set which i s usually system order + 1. The effectiveness of using an extension based on the reduced Cartesian product i s clearly demonstrated by Table 5.1 which compares the number of terms generated by f u l l and reduced Cartesian product expansions for various values of m and n. It i s particularly interesting to note the values of n and m which generate approximately 175 000 terms, a number which corresponds to the number of Parallel fibers which synapse with each Purkinje c e l l . A somewhat different approach to estimating variables  functions of several  has been taken by Specht [50]. Based  discriminant  functions [36],  he  develops  on an analysis of  polynomial  discriminant  functions which can be trained to distinguish between a number of patterns. This theory has been extended to include general  regression  surfaces [29,51] which can be interpreted, with a suitable change in normalization, Unfortunately,  as  generalized  the training  functions  of  procedure requires  arbitrary "smoothness" factor and pre-determining  several  variables.  the selection of an the fixed number of  points in the training set. 
Also, since the technique i s based upon approximating a Gaussian function at each point in the hyperspace of definition,  a large  number  of terms must  be used  throughout the  27 procedure.  Table 5.1 Number of Terms Generated by Full and Reduced Cartesian Products  n  m  FULL PRODUCT  REDUCED PRODUCT  3  2  8  4  3  4  64  20  3  5  125  35  6  7  117 649  924  7  5  78 125  330  8  5  390 625  495  15  8  3.52-10  42  5  2.27-10  44  5  5.68-10  100  4  1.61-10  13  170 544  29  163 185  30  194 580  60  176 851  5.2 An Optimized Learning Algorithm  Previous sections have stated a number of constraints under which learning i s assumed to take place in a cerebellum-like system. These are: 1.  f = W (D,  2.  the system has no memory other than of the current values  T  of synaptic weights {W}, 3.  the system must operate in an error correcting mode,  (5.5)  28 4.  the order and number of training points neither needs to be fixed nor pre-determined, and  5.  the basis set, {$}, can be considered as a set of linearly independent functions of the form 0 = 0TT  (5.6)  where {TP} is a reduced Cartesian product expansion of the input set.  A  traditional  approach  to solving  estimation  problems  i s to  minimize the weighted mean-square estimation error over the domain of the input variables. J = J s(H) • (WTctKHJ-f (H) ) -dH H 2  (5.7)  where H i s the hyperspace of definition f i s the target function s(H) i s the error cost density function. Then  dJ = 2^ s(H)<D(H)'(d)T(H)W-f (H) )dH = 0 d-W "  (5.8)  H  yielding  W =  Since evaluating  T  s(H)(D(H)(t) (H)dH )~1 • $ s(H)0(H)f (H)dH H H  (5.9)  the cerebellar structure shows no apparent mechanism for (5.9) in a single operation,  the problem i s to select an  iterative scheme which i s both compatible with the given structure and converges  toward  the optimum  weight  set in a minimal  number of  iterations. If the system operates to correct errors, can only store  29 the  current  values  of the weight  s e t , and the order  and number of  training points i s not known, i t would seem reasonable that {W} be changed, thus reducing  the estimation  e f f e c t , o f the changed estimation space.  This  scheme w i l l  tend  should  error, so as to minimize the  function, on a l l other  to minimize  the number  points i n the of  iterations  required to obtain a good approximation of the target function. Let  A f ( H i ) = f(H!) - f(Hi)  (5.10)  be the estimation error at some point, H i . Then, adjusting  the weight  set so that  AwTdXHiJ = A f ( H i )  (5.11)  w i l l r e s u l t i n an error correcting function  Ae(H,H]_) = AW 0(H) T  (5.12)  which i s superimposed upon the function which existed before the change. In order  to minimize the e f f e c t s o f t h i s superimposed error correcting  function, use  J = J s(H) A e d H + v(AW (t)(H )-Af (Hi)) . H 2  T  1  Standard optimization techniques can be applied to (5.13).  dJ dAW dJ dv  T  = 2\s(W) AWdH  = ^ ( H i ) AW  + (D(Hi)v = 0  - Af(Hi) = 0  This can be re-written as a matrix  equation.  (5.13)  30 T  2^sO0)dH  d)(Hi)\  (P (Hi)  0j  T  /AW\  / 0  \  I Af(Hi) i  vj  \  In order to simplify the notation, let T  P = ( s<D(DdH. H  (5.15)  J  Applying row reduction techniques to (5.14) gives an expression for the optimal adjustment, AW.  AW = AftHiJP-^OKHi.)  (5.16)  ^(HiJP-idXHx)  At this point i t is useful to look at some of the properties of P. Lemma 1. P is symmetric. T  T  T  pT = ($s00 dH) = $s(dXD )TdH T  = jJs0(DdH = P.  QED.  Lemma 2. P is positive definite. 
Consider  T  Y PY  where Y is an arbitrary vector.  Then T  T  T  Y PY = Y (jJs(t)0 dH)Y T  = JsY%D YdH T  2  = Js(Y 0(H)) dH Since {0} is a set of functions which are linearly independent over H, the only vector for which YTcD = 0 for a l l H  31 Thus T  Y PY > 0 for a l l non-zero Y (s(H) is positive by definition) and P i s positive QED.  definite. •  11  Lemma 3. P  exists  for a l l real valued  n, i s unique, and i s  symmetric. It i s well known that the eigenvectors of a positive definite, real symmetric matrix are real and positive. Thus, let P = XLX  T  where X is an augmented matrix whose columns are the eigenvectors of P and L is a diagonal matrix whose entries are the eigenvalues of P. Then n  P° = XL X  T  (5.17)  and (P")T = (XL"xT)T n  T  n  = XL X = P . In particular this proves the existence of P~l.  Lemma 4.  QED.  The error correcting function, Ae(H,Hi) is invariant for non-singular linear combinations of the basis set and i s thus a unique function of the region  (H), the function  space spanned by {0}, and the estimation error  Af(Hj).  The error correcting function for the space spanned by {(J)} is given by Ae(H,Hi) = Af(Hi)0 (H )p- ($)(D(H) 1  T  1  0 ( H ) p - (0)0 (Hi) T  1  1  (5.18)  Consider Ae(H,H eO) lf  where 9 is any non-singular matrix Ae(H,H eq» = Af(Hi)-(D (H1)eTp-1(e(t))-ecD(H) T  lr  0  but  (Hi) -eTp-1 (0(D) 90 (Hi)  T  POO) = JseO(H)(D (H)e dH T  T  = ep((D)eT  and p-1(9(D) = (eT)-1?"1 (0)e-i.  Thus T  1  Ae(H,H 00) = Af0 (Hi)P" (0)0(H) lf  0 T (H 1 )P-1 (0)0 (Hi)  = Ae(H,H 0) . lf  Lemmas 3 and 4 suggest a useful simplification, i 0 = (( sTTTrTdH)-^  = P-£(TT)  Since 0(H) = ©IT (H) . Combining (5.19) and (5.20), one obtains P(0) = J( s9TrirT0 dH T  H  = 0P(T)0T  = p-?P( -C)T P  =l  Thus  AW = Af(H )0(H ) 1  1  OTfHnWH!)  m  33 and  Ae(H,H  T  1;  = Af(Hi)(D (H )(D(H)  (5.22)  1  T  0 (Hi)(J)(Hi) In this way, P Equation  _ 1  may be deleted from (5.16).  (5.21) results  estimation error  in a scheme which  tends to reduce the  for a given function. Further analysis of the set  generated by (D = P(Tr)-|'ir  (5.23)  indicates an even more efficient approach, employing the properties of orthonormal functions.  Lemma 5. The set of basis functions given by (5.23) is orthonormal over the weighted region. Consider  r j j = (Oi(H) 0j(H)) f  = J s(Di(H)(Dj(H)dH H Then, R, the matrix whose entries are r j j can be written R = ( s(txDdH T  = P.  It has been previously shown that P(0) = I for 0 given by (5.23).  34 Thus (0 i f i * j 1 (l if i = j  rij =  and  (5.24)  {0(H)} i s an orthonormal s e t over  As a f u r t h e r improvement o f  the weighted r e g i o n .  (5.21).  QED.  Consider  AW = A f (Hi )fl)(Hi)  (5.25)  k  where MAX[(D (D] «  1.  T  k  Let  the  density  weight  s e t be  adjusted  at  a  total  of  k  points  with  a  point  u(H). s(H)  The  sequence  u(H)  =7  of  AW's  (5.26)  Js(H)dH at  successive  weights a r e i n i t i a l l y z e r o , w i l l AWi  AW  =  2  points,  be  ffHiXMHi),  = f(H )0(H ) - f(H!)(J) ^2)0(^)0^2), 2  T  2  k AW  learning  k  2  = f ( H ) 0 ( H 3 ) - f(H )(P (H )(p(H2)(J)(H3) T  3  2  3  k  k  3  2  4- f ( H ) ( D T H ) ( D ( H ) ( D T ( H ) ( D ( H ) ( ] ) ( H ) , 1  k etc.  3  (  3  2  2  1  3  assuming  all  35 Yielding W =  5 AWi k  = _L  If(Hi)(D(Hi) k i=l k  --L_ k  2  If(H _ )(DT H )(D(H _ )(l)(H ) + ... i=2 i  1  (  i  i  1  (5.27)  i  Since the points have a density given by u(H), the f i r s t term of (5.27) is an approximation of the weighted integral  (5.28) - _ L l f (Hi)O(Hi) .  
(5.29)  k  Comparing  (5.29) with (5.9) and applying Lemma 5, i t can be seen  that, after k points, the weight set approximates the optimal set. Continuing the procedure with additional learning points w i l l  improve  the estimate as long as  Afi  - Afi-i-  Similarly, a sequence of learning steps at k points w i l l weight set so that  any  resultant  estimation  error  adjust the  i s approximately  orthogonal to a l l elements of {(*)}. That i s , the error is minimized and {W} approximates the optimum set. Unless the estimated function can be made to be identical to the target function and the training data is noiseless, the values of the elements of {W} will change at each point as learning is applied. That i s , {W} will not in general converge in the set.  usual sense, but will tend to fluctuate about the optimum weight The  sequence  of weight  sets w i l l ,  however, have a  point  of  36 accumulation at the location of the optimum set. It  i s important  to note  at this  point  denominator of equation (5.21) with a constant,  that  replacing the  k, results in error  corrections which are un-normalized. That i s , despite applying to  correct  an estimation  error,  an error  may  still  (5.21)  exist. This  variation of the learning algorithm, which w i l l be discussed further in the following chapters, requires that an additional element be added to the system equations. Specifically, the system equations become  T  f = W<t) f = f + Af-g(t) Af  (5.30)  = f-f  Aw = Af<D k where f = the estimated function (Purkinje c e l l output) Af=system error term f = system corrected output (Subcortical Nuclear c e l l output). g(t) =1 and g(t+At) = 0 That i s , the system requires an element which evaluates prior to any weight adjustments being  (f+Af)  made, holds this value,  then  permits i t to decay to the new value of f. Various procedures are available to permit the learning procedure to be halted. The most direct approach i s to require learning only i f the obtained  response (either the estimated function or some physical  response of the system) deviates sufficiently from the desired response. This in effect replaces the single valued target function with a band of acceptable  estimations.  37 That is if| Afl * e  0  (5.31)  AfQ i f I A f l > e !  k  The learning algorithm i s also effective even i f the distribution of points  i s not known a p r i o r i .  If possible, the expected  point  density can be estimated, or i f not, i t may be replaced by a constant (over  a closed  reqion)  when deriving the basis set. Each time the  weights are adjusted, an error correcting function i s superimposed upon the previous function. The learning algorithm ensures that interference effects at each point are minimized at a l l other points in the learning hyperspace. This approach will result in a sequence of weights which tends to converge presented and  toward  the optimal  set. Other  researchers  have  arguments showing that similar devices such as /Adalines [10]  Perceptrons [32] exhibit strong  convergence tendencies.  /Although  not optimized, these devices are similar to this one in that they also approximate functions as weighted sums. Regardless of the known and unknown parameters of the system, i t i s necessary to select k in (5.25). There are two opposing considerations: stability and learning rate. 
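Before weighing these considerations, the un-normalized error-correcting rule of (5.30) and (5.31) can be summarized in a brief sketch. It assumes the basis has already been orthonormalized as in (5.23); the function and variable names are illustrative rather than taken from the original simulations.

```python
import numpy as np

def learn_step(W, phi_H1, f_target, k, eps):
    """One error-correcting iteration, following (5.30)-(5.31).

    W        current weight vector {W}
    phi_H1   values of the orthonormal basis at the training point, phi(H1)
    f_target desired output f(H1) supplied by the teacher (Climbing fiber)
    k        convergence gain factor
    eps      acceptable estimation error
    """
    f_est = W @ phi_H1                  # Purkinje estimate, f = W^T phi
    df = f_target - f_est               # estimation error, Delta f
    if abs(df) > eps:                   # adjust only when the error is excessive
        W = W + (df / k) * phi_H1       # Delta W = Delta f * phi(H1) / k
    return W, df

# Pointwise stability (5.34) requires 0 < phi^T(H1) phi(H1) / k < 2 at every
# training point; with a bounded basis this can be checked once, in advance.
```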
Larger values of k reduce the magnitude of perturbations, due to points with low probability or high noise, of the estimation  function, while smaller values of k reduce the number of  points required before the estimation function begins to approximate the target function. Unless very rapid learning i s required, k should be determined to ensure that the fluctuations in f are acceptably small.  38 That i s , with re-arrangements to (5.25) MAX[Ae(H,H!)] = MAXfAf ( H ) C D ( H T J O ( H ) ] .  (5.32)  T  X  k  Thus, k must be sufficiently large so as to ensure that the effect, of an adjustment at a single point, i s less than €• over a l l points in the space.  T  k > MAX [ A f (Hi) (|) (Hi) (]) (Hi) ]  (5.33)  e To guarantee that continuous iterations at any single point will  be  stable and converge to generate f(Hi) exactly (pointwise convergence), 0 < (D (Hi)(D(Hi) < 2. T  (5.34)  k  The next chapter will present examples which demonstrate the effects, upon the learning system, of varying the values of k and €.  5.3  System Properties  The preceding section has developed a system which can learn to approximate, functions  as  with  minimum  linear  mean-square  combinations  of  estimation an  error, arbitrary  orthonormal  set of  basis  functions. The system uses an iterative error correcting strategy in  39 which the weights are adjusted at each stage according  to the expression  (repeated here f o r convenience)  AW = AfJHiJCKHTj . k  The factor k depends upon:  and  1.  the basis set {0} chosen,  2.  the required precision of the f i n a l estimation function,  i s a constant which may be computed i n advance and/or altered at any  time. Concerning property  the  i s that  system's  operating  the resultant  characteristics,  estimation  function  a  useful  i s capable o f  estimating a target function with an error which i s l e s s than the error correcting mechanism can detect. mechanism being  rather inexact  In other  words, despite  in i t s ability  error, the f i n a l estimation  function w i l l ,  precise  the  approximation  statement, corrected detectable)  consider whenever  a  of  target  learning  the estimation  tolerance, G, as given  error by  to detect or correct an  i n general, be a much more  function.  system,  the t r a i n i n g  To  the weights exceeds  an  support  this  o f which are acceptable  (5.31). If the function  (or space  spanned by the basis functions i s such that the tolerance of G can i n fact be met for the whole hyperspace, then the resulting estimation  will  l i e within a region bounded by  f(H) -G < f (H) < f(H) +G  as  shown  i n Figure 5.1. For such  a bounded, continuous,  (5.35)  estimation  function  |f(H!)  - f t f i T j  | «  G  (5.36)  40  f (H)  ^ -—  -  e  H  Fig 5.1  A Bounded Estimation Function  over much o f the region, H. Hence, f produces a better terms of the desired r e s u l t , than could detecting/correcting  mechanism  alone.  be obtained  A  somewhat  estimate, i n  using  the error  similar  property,  c a l l e d "learning with a c r i t i c " , has been described for an Automatically Adapted Threshold Logic Element (Adaline) Adeline,  i s taught  the optimal  [54]. In that experiment, the  strategy  blackjack s t r i c t l y on the basis of whether  f o r playing  the game of  i t wins or loses a game.  
That i s , despite the fact that the error detecting mechanism determines that the estimation  function i s i n error i f , and only  i f , a game i s  l o s t , the Adaline learns to generate the function which optimizes the  strategy for winning games.  42 VT  6.1  General  SYSTEM PERFORMANCE  Considerations  The previous chapter has derived an algorithm which minimizes the interference of learning a r b i t r a r y functions, when using an i t e r a t i v e , point-by-point  algorithm,  i n a cerebellum-like  system.  This  chapter  w i l l demonstrate the effectiveness of the algorithm. The  d e r i v a t i o n of the learning algorithm  i s independent  of the  function-space which i s defined by the basis set. Thus, the algorithm guarantees  optimal  performance  (for that  space)  independent set o f functions {<$}, which s a t i s f y which  basis  implementation most  likely  set  to  use  is a  matter  of  f o r every  linearly  (5.19). The choice o f selecting,  subject  to  constraints, that function-space and basis set which are to match  the function  or  functions  to be  estimated.  Multinomials are a reasonable choice f o r a large number of applications, and are the functions which w i l l be used i n the examples presented i n t h i s chapter. That i s , the terms to be used are those generated by n (1 + £ x i )m. i=l  (6.1)  Application of Lemma 4 permits the binomial c o e f f i c i e n t s to be dropped from each term, y i e l d i n g  2  T T = (l,x ,x ,...,x ,xi ,..,x m) 1  and  2  n  <D = (C sTTH^dH •'H  Due to the absence of other constant i n the following examples.  n  TT  information, s(H) w i l l  (6.2) (6.3)  be set to a  43 Some t y p i c a l ,  normalized, error  correcting  shown i n F i g u r e 6.1. I t c a n be seen t h a t resemblance The on  t o normal  probability density  f i g u r e s a l s o show t h a t low-order  polynomials  functions,  these  Ae(H,Hi,0) , are  functions  functions  with  l i m i t a t i o n s imposed by u s i n g result  i n error  correcting  bear  a  mean  loose of Hi.  a system based functions  whose  maxima a r e o f t e n "skewed" o r s h i f t e d from H]_. In regard listed  t o {Tf}, i t should  be noted  that  any number o f t h e terms  i n (6.2) may be d e l e t e d , w i t h o u t j e o p a r d i z i n g system performance,  i f t h e c o e f f i c i e n t s o f those terms, i n t h e t a r g e t  functions,  a r e known  to be z e r o . Another  important  parameter  which  was d i s c u s s e d  i n the previous  c h a p t e r i s k which must be s e l e c t e d i n r e l a t i o n t o  b ( H i , H ) = MAX[(D (H )0(H )] T  2  as g i v e n  by (5.31),  (5.32),  1  2  (5.33). F o r any b a s i s  (6.4)  s e t {(*)}, b w i l l  be  maximum a t some p o i n t , H5, where  | 0 ( H ) | > |<D(Hj)|  (6.5)  b  for  a l l Hj 7* H , b  since  <D (Hi)0)(Hj) = 10(Hi) I |<D(Hj) Icose T  (6.6)  where e i s t h e "angl e" between the v e c t o r s 0(Hj) <D(Hj) -  and  Thus  b = (D (H )(D(H ) = |(p(H ) | c o s 0 • T  2  b  b  b  2  = IO(H )I . b  (6.7)  44  45 Values of b=MAX[O 0] for Various Values of m and n T  Table 6.1  n  m  b  1  2  4  1  3  1  Average*  n  m  b  Average  2.0  2  3  26  6.9  9  3.1  2  4  70  12.3  4  16  4.2  2  5  155  20.0  1  5  25  5.3  3  4  190  32.3  1  6  36  6.4  3  5  553  67.0  •Average: approximate average value of  The  location of Hb and  <1)0 T  the value of b are properties of {(J)}. Table  6.1  shows that, for multinomial-based functions, b grows r a p i d l y as m and increase. 
6.2  Examples of Learning a Function of a Single Variable

In order to demonstrate the operation of the training algorithm, and its ability to estimate functions, the function

   f = sin(2πx)   (6.8)

over (0 < x < 1) was "taught" to a number of computer-based models of the system. The target function is shown in Figure 6.2. The estimation functions which are optimal, in a least-square-error sense, with respect to the target function are shown in Figure 6.3 for polynomials of various orders. These functions were calculated using the optimizing formula derived as equation (5.9).

Fig 6.2  Target Function Sin(2πx) over (0 < x < 1)

Fig 6.3  Optimal Polynomial Approximations of Sin(2πx)  (panels c: m=6,7, RMS error=0.0050, EMAX=0.016; d: m=8,9, RMS error=0.00019, EMAX=0.00066)

Chapter 5 suggests that, as learning progresses, the estimation function grows, then settles, toward the optimal estimation of the target function. This sequence of functions is shown in Figure 6.4. Figure 6.5 shows that, for large values of k, ε=0, and many training points, the learned estimation functions can closely approximate the optimal functions shown previously.

Fig 6.4  Sequences of Estimations of Sin(2πx) for m=4, k=8  (panels c: 15 points; d: 40 points)

Fig 6.5  Near-optimal Estimations of Sin(2πx)

Once the order of the estimation system has been selected, values of k and ε must be determined. The effects, on the learning system, of varying k and ε are shown by various calculated measures:
1. EMAX, the maximum estimation error, Δf(H_j), which was observed in the course of processing the preceding J points,
2. CHANGES, the total number of times the weight set has required adjustment,
3. ERMS, an estimate of the RMS estimation error over the preceding J points,

   ERMS = ( (1/J) Σ_{i=1}^{J} Δf(H_i)² )^(1/2),   (6.9)

4. SF (stability factor), a measure of the perturbations of the estimated function, which is calculated as the change, over the preceding J points, in the values of the elements of the weight set,

   SF = ( (1/L) (W_i − W_{i−J})^T (W_i − W_{i−J}) )^(1/2),   (6.10)

   where W_i is the current weight set and W_{i−J} is the weight set prior to any adjustments resulting from applying the learning algorithm at the previous J points, and
5. POINTS, the total number of points which have been observed.
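A small sketch of how these block-wise measures might be accumulated during training is given below; the variable names follow the definitions above, but the surrounding training loop and data structures are illustrative assumptions rather than the thesis's original program.

```python
import numpy as np

def run_block(phi, f_target, W, points, k, eps):
    """Process one block of J training points and return the updated
    weights together with the measures EMAX, CHANGES, ERMS and SF."""
    W_prev = W.copy()                      # weights before this block (W_{i-J})
    errs, changes = [], 0
    for H in points:
        p = phi(H)
        df = f_target(H) - W @ p           # estimation error Delta-f(H_i)
        errs.append(df)
        if abs(df) > eps:                  # adjust only detectable errors
            W = W + df * p / k
            changes += 1
    errs = np.array(errs)
    emax = np.max(np.abs(errs))                         # EMAX
    erms = np.sqrt(np.mean(errs ** 2))                  # ERMS, equation (6.9)
    sf = np.sqrt((W - W_prev) @ (W - W_prev) / W.size)  # SF, equation (6.10)
    return W, emax, changes, erms, sf
```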
Table 6.2  The Learning Algorithm as Affected by k and ε

Estimating Sin(2πx) for 0 < x < 1, m=4, J=100

a) Effect of k   (ε=0.0, u(H)=s(H)=uniform)

    POINTS    k     EMAX    ERMS      SF
      100    10     0.76    0.19     0.36
      200    10     0.29    0.077    0.059
      400    10     0.25    0.087    0.036
      600    10     0.34    0.093    0.035

      100    50     0.86    0.36     0.30
      200    50     0.19    0.080    0.042
      400    50     0.21    0.070    0.0086
      600    50     0.27    0.077    0.012

      100   100     0.91    0.46     0.21
      200   100     0.37    0.19     0.084
      400   100     0.17    0.074    0.0012
      600   100     0.23    0.074    0.0063
      800   100     0.22    0.069    0.0045

b) Effect of ε   (k=100, u(H)=s(H)=uniform)

    POINTS    ε     CHANGES   EMAX    ERMS      SF
      200    0.10     153     0.37    0.19     0.080
      400    0.10     224     0.17    0.079    0.015
      600    0.10     253     0.15    0.077    0.0090
      800    0.10     262     0.11    0.074    0.0012

      200    0.15     133     0.37    0.20     0.074
      400    0.15     178     0.20    0.095    0.0093
      600    0.15     184     0.16    0.098    0.0043
      800    0.15     184     0.15    0.088    0.0

These measures are used since they show important properties of the learning system, are relatively easy to calculate in conjunction with applying the learning algorithm, and automatically take into account the effect of non-uniformly distributed training points. It will be noted, though, that due to the training points being randomly generated, the measures tend to oscillate rather than decrease monotonically.

Table 6.2 demonstrates the following trade-offs in relation to k and ε:
1. increasing k increases the number of training points required before the system begins to settle toward the optimal estimate,
2. once the estimation approaches the optimum, larger values of k reduce the magnitude of perturbations, as indicated by smaller values of SF,
3. smaller values of ε produce estimation functions which tend to have less RMS error, and
4. larger values of ε tend to reduce the maximum estimation error at the cost of larger RMS errors.

Chapter 5 also predicts that, although its performance will be sub-optimal, the learning algorithm will continue to function in cases where the training point density is not equivalent to the error cost density which was used to generate the basis set, as required by (5.26). Although Table 6.3 does support this prediction, it also shows a disadvantage of this approach, since many more training points (1500 versus 1000 training points) are required before the estimated function approaches the optimum.

Table 6.3  Learning Sequences for Cases Where u(H) ≠ s(H)

Estimating Sin(2πx) for 0 < x < 1, ε=0.0, k=100, J=100, u(H)=uniform

a) s(H) is gaussian (mean=0.5, variance=1.0)

    POINTS   EMAX    ERMS      SF
      200    0.66    0.39     0.12
      400    0.34    0.19     0.056
      600    0.18    0.11     0.026
      800    0.19    0.074    0.0095
     1000    0.17    0.069    0.0058

b) s(H) is gaussian (mean=0.5, variance=0.5)

    POINTS   EMAX    ERMS      SF
      200    0.81    0.51     0.12
      400    0.58    0.34     0.077
      600    0.40    0.25     0.053
      800    0.29    0.16     0.033
     1300    0.15    0.081    0.012
     1500    0.13    0.067    0.0062
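The mismatch experiment of Table 6.3 can be reproduced in outline with the sketch below, which draws training points from a clipped gaussian density while the basis (and hence the cost density) remains uniform; the sampling parameters and the reuse of the earlier training loop are assumptions for illustration only.

```python
import numpy as np

def gaussian_training_points(count, mean=0.5, var=1.0, lo=0.0, hi=1.0, seed=0):
    """Draw training points H_i from a gaussian density, rejecting points
    outside the region of definition, so that u(H) differs from a uniform
    cost density used to build the basis."""
    rng = np.random.default_rng(seed)
    pts = []
    while len(pts) < count:
        x = rng.normal(mean, np.sqrt(var))
        if lo <= x <= hi:                 # keep only points inside H
            pts.append(x)
    return np.array(pts)

# The same iterative update is then applied to these non-uniform points;
# convergence is slower, as in Table 6.3, but the algorithm still settles.
points = gaussian_training_points(1500, mean=0.5, var=0.5)
```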
6.3  Soft Failure

An interesting, and potentially useful, factor to be considered when assessing the value of this learning algorithm is its property of soft-failure. That is, should one or more elements of {Φ}, or of {W}, become inoperative, the system will still be capable of representing arbitrary functions to any required accuracy, possibly after additional training, as long as the region of representation is permitted to be sufficiently small. To prove this conjecture, it is only necessary to show that at least one functional element of {Φ} is non-zero at the point where estimation is required. Multinomial-based systems can be arranged so that few (likely no more than n) elements of {Φ} have values of zero at the same point, and hence systems based on these sets will exhibit soft failure. In terms of a physical device which implements the algorithm, this means that the device may be useful, possibly over a reduced range, despite the failure of some of its components. When considering systems which employ splines or harmonic series, it is seen that at a number of points in the region of definition, all but a limited number of the elements of the basis function set are zero. Should faulty components cause all these non-zero elements to become inoperative, the system's output will be a fixed, erroneous, value. Thus, the present system of multinomial-based functions has a significant, practical advantage.

6.4  Learning Functions of Several Variables

In order to demonstrate the ability of the learning algorithm to learn and generate several functions of several variables, it was applied to a more complicated model. The model used is based upon the geometry of a human arm as shown in Figure 6.6 [26].

Fig 6.6  The Geometry of the Model Arm (after [26])

The system, whose model was programmed on a digital computer, is best described by the block diagram of Figure 6.7. In this figure, it can be seen that a desired wrist position is used as the input to the estimation (learning) system. Values of {Φ(H_i)} are then generated and used to compute each joint angle according to

   f_1(H_i) = Φ^T(H_i) W_1
   f_2(H_i) = Φ^T(H_i) W_2
   f_3(H_i) = Φ^T(H_i) W_3   (6.11)
   f_4(H_i) = Φ^T(H_i) W_4

where W_j is the weight set associated with f_j.

Fig 6.7  Block Diagram of the Arm Model Learning System  (the learning system drives the modeled arm; the calculated position is compared with the desired position from the target position generator, and the error correcting system, gated by a threshold sensor, corrects the output)

These joint angles are then used to compute the resultant wrist position. This position is compared with the desired one, and if the error is greater than ε, the angular corrections are computed (modeling a sensory feedback loop), the system output, f̂, is corrected with Δf to obtain the desired wrist position, and the learning algorithm is applied. It is particularly interesting to note the similarities between Figure 6.7 and the cerebellar system shown in Figure 2.1.

The model was presented with randomly generated points which have a uniform distribution inside a box which is located in the region

    5 ≤ x ≤ 13
   −5 ≤ y ≤ 12
    0 ≥ z ≥ −9

where all dimensions are in inches. Table 6.4 demonstrates the rapid learning rate of this model and shows that the model learns to position the wrist with good accuracy after learning at approximately 3000 points. The effect of varying ε is also demonstrated. That is, larger values of ε tend to reduce the maximum error while increasing the RMS error.
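A schematic version of this multiple-output arrangement is sketched below: the four joint-angle functions share one basis vector and differ only in their weight sets, and corrections are applied only when the positional error exceeds the threshold. The forward-kinematics and angle-correction functions are stand-ins passed in by the caller, since the thesis's arm geometry and program are not reproduced here.

```python
import numpy as np

def arm_learning_step(phi, forward_kinematics, angle_corrections,
                      weights, target_xyz, k, eps):
    """One pass of the Figure 6.7 loop: estimate the four joint angles from a
    shared basis vector, check the wrist-position error against the threshold,
    and apply the usual weight correction to each output when it is exceeded.
    `weights` is a list of four weight vectors W_1..W_4 (equation 6.11)."""
    p = phi(target_xyz)                           # Phi(H_i) for the desired position
    angles = np.array([W @ p for W in weights])   # f_j(H_i) = Phi^T(H_i) W_j
    error = target_xyz - forward_kinematics(angles)
    if np.linalg.norm(error) > eps:               # threshold sensor
        d_angles = angle_corrections(error, angles)  # modeled sensory feedback
        for W, df in zip(weights, d_angles):
            W += df * p / k                       # Delta-W_j = Delta-f_j * Phi / k
    return angles
```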
Table 6.4  Learning Sequences for the Arm Model

m=5, k=500, J=500, arm flap (alpha)=40 degrees, u(H)=s(H)=uniform
ERMS and EMAX are in inches

a) ε=0.0

    POINTS   CHANGES   ERMS    EMAX      SF
     1000      1000    6.70    18.3     0.045
     2000      2000    0.99    2.52     0.0066
     3000      3000    0.19    0.73     0.0010
     4000      4000    0.13    0.48     0.0004
     5000      5000    0.14    0.84     0.0005

b) ε=0.4

    POINTS   CHANGES   ERMS    EMAX      SF
     1000      1000    6.70    18.3     0.045
     2000      1934    0.99    2.52     0.0065
     3000      2150    0.33    0.68     0.0003
     4000      2171    0.32    0.46     0.0001
     5000      2188    0.32    0.64     0.0003

VII  DISCUSSION AND CONCLUSION

7.1  Physiological Implications

The algorithm which has been developed in this thesis is based upon the known anatomy and physiology of the mammalian cerebellum. It is therefore reasonable to predict that cerebellar operations may be very similar to those of the learning algorithm herein described. That is, the mathematics of modifiable synapses, and the functions of cerebellar cells, may well be those given by equations (4.1), (4.3), (5.30), and (5.31) which specify the learning system derived by this thesis.

An important property of this system is that all inputs are treated identically. That is, there is no need to differentiate between Mossy fiber inputs which are related to sensory or peripheral information and those which are related to "context" or commands. This permits both sensory and command parameters to be treated as continuous variables so that commands inherently contain rate factors. Thus "walk", "run", and "sprint" may be the same command, "move", at various intensities. The lack of specificity also means that there is no need for an exact mapping of Mossy fibers to specific points (or to a specific Granule cell) in the cerebellar cortex. Rather, a general target area is all that is required, thus reducing the amount of information which must be stored by cerebellum-related genes.

Both of the above properties represent significant improvements over existing cerebellar models. In particular, this model resolves the deficiencies of Albus' theory [2] by presenting a feasible mapping of Mossy fiber activity to Purkinje cell activity, and by treating all inputs, both commands and peripheral data, identically as continuous variables.

The model has predicted that corrections to Purkinje cell activity are not normalized. That is, Climbing fiber activity (Δf) causes the synaptic weights of the target Purkinje cell to change, resulting in a change in the frequency of that Purkinje cell's action potentials which is not exactly equivalent to Δf. To permit the proposed cerebellar system to perform exactly in this manner, it is necessary that Subcortical Nuclear cells perform a "sample-and-hold" function, thus generating an output from the cerebellum which is the corrected function, f̂ (see equation 5.30). The weight adjustments, which correspond to learning, are not normalized for several reasons:
1. to permit weight adjustments to be strictly local computations; a function of the pre and post-synaptic activities at the synapse whose weight is being adjusted,
2. to permit the optimal weight set to be computed as the approximation of an integral as given by equation (5.29),
3. to speed convergence in the manner of gain factors in numerical optimization and root finding techniques, and
4. to aid convergence and system stability by reducing the effects of correcting errors at infrequent points.

If further physiological experiments should disprove this un-normalized operation, the learning algorithm may be modified so that iterations at any single point act to compute weight adjustments which result in changing f̂ by an amount equal to the exact error at that point. That is, continuous re-applications of the weight adjustment algorithm will reduce the estimation error to zero, thus computing the weight adjustment as

   ΔW = Δf Φ(H_i) / (Φ^T(H_i) Φ(H_i))   (7.1)

which is really the equation given previously as (5.21).
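For comparison with the un-normalized rule ΔW = Δf Φ(H_i)/k, the normalized adjustment of (7.1) can be sketched as below; it drives the error at the training point to zero in a single step, at the cost of larger interference elsewhere. This is an illustrative reconstruction, not code from the thesis.

```python
import numpy as np

def normalized_update(W, p, df):
    """Exact pointwise correction (equation 7.1): after this adjustment the
    estimate W^T Phi(H_i) reproduces the training value at H_i exactly."""
    return W + df * p / (p @ p)

def unnormalized_update(W, p, df, k):
    """Gradual correction (equations 5.30 and 7.2) used throughout the thesis."""
    return W + df * p / k

# Quick check that the normalized rule zeroes the error at the point:
p = np.array([1.0, 0.3, 0.09])          # an illustrative basis vector Phi(H_i)
W = np.zeros(3)
target = 0.7
W = normalized_update(W, p, target - W @ p)
assert abs(target - W @ p) < 1e-12      # error at H_i is now (numerically) zero
```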
Future experiments may also show that Parallel fiber activity is not constant as implied in Chapter 5. That is, it may be, as suggested by Eccles [14], that Parallel fiber activity is normally insignificant, rising during cerebellar computations, then returning to a near-zero level. In this case, Purkinje cell activity would also be transient, rising briefly, then returning to its normal, spontaneous, rate. To account for this modification of the nature of cerebellar operation, the theory again requires only slight variation. Namely, phasic activity (or a combination of phasic and tonic activity), rather than strictly tonic activity, would be used to form the orthonormal set to which the learning algorithm is applied.

7.2  The Learning Algorithm and Machine Intelligence

The learning algorithm described in this thesis may be applied directly to learning machines. The cerebellum, after which the system is modeled, is an extremely effective motion controller. This suggests that the current system may prove effective in a number of applications as an adaptive controller. Potential applications include situations where complicated, possibly unknown, non-linear control equations of several variables are found, such as in power system control [52], in industrial process control [30,49], and in robotics [5,6].

Another application of the system is in pattern recognition. The system has been shown to be capable of simultaneously learning to generate a number of functions of several variables. If these variables are parameters derived from a family of patterns, and if the output functions are the probabilities that a given input pattern corresponds to each of a number of classes, then the device is really a pattern recognizer.

In these and other applications, a hierarchical network of learning machines [4,31] such as that shown in Figure 7.1 may prove effective. In this arrangement, successively higher levels control correspondingly higher operations. Each device performs its own control functions, processes information to be used by higher levels, and directs lower levels. The Figure shows a conceptual arrangement. In practice, the large numbers of inputs and outputs which can be handled by a cerebellum-like learning machine permit a single physical device to act as several levels of the hierarchy simply by using computed (output) parameters as input variables. This feedback provides such a system with the capability to compute and/or control complicated functions. It also poses interesting problems regarding programming or teaching strategies for the system.

The learning algorithm may also be considered as a refinement of the Polynomial Discriminant Functions [29,50,51] discussed in Chapter 5. The advantage of the new approach is to remove the requirement of using a fixed number of training points. With the new algorithm, there is no upper limit on the number of points which may be used for training purposes.

Fig 7.1  A Hierarchical System of Learning Machines

Finally, the parallel processing properties of this learning system are important. It should be remembered that the same basis functions are used to compute all the output functions, so that any number of output functions and error corrections may be calculated simultaneously. All output cells are isolated from each other, thus permitting selective adjustment of estimating functions; only those functions whose error is excessively large require adjustment at any time.

7.3  Contributions Of This Thesis

The major contribution of this thesis is the development of a system which has the capacity to learn to approximate arbitrary, high order functions of several variables. By representing estimated functions as weighted sums of continuous basis functions, and by employing an iterative solution rather than one which uses matrix inversion, the system requires relatively few variable elements (memory).

The thesis has also shown that, for orthonormal basis sets, learning interference is minimized if estimation errors are reduced by adjusting the weight set {W} according to the expression

   ΔW = Δf(H_i) Φ(H_i) / k.   (7.2)

This procedure thus reduces the number of iterations which are required before arbitrary functions are approximated to a required accuracy. The procedure has also been shown to result in a learned weight set which produces an estimated function closely approximating the least-square-error estimation of the target function.

Since the system is based on the structure of the mammalian cerebellum, which is a very efficient adaptive controller, it shows promise in a number of applications and as the basis of a new class of intelligent machines.

A generalized method of constructing the orthonormal sets of functions required by the system has also been presented.
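One plausible way to carry out such a construction numerically is a Gram-Schmidt procedure applied to the multinomial terms {π}, with inner products weighted by the cost density s(H). The discretized, single-variable sketch below is an assumption about how this could be done and is not taken from the thesis.

```python
import numpy as np

def orthonormal_basis(terms, s, xs):
    """Gram-Schmidt orthonormalization of the functions in `terms` with respect
    to the weighted inner product <f,g> = integral of s(x) f(x) g(x) dx,
    approximated on the grid `xs`.  Returns the basis sampled on the grid."""
    dx = xs[1] - xs[0]
    w = s(xs) * dx                            # quadrature weights from s(H)
    basis = []
    for t in terms:
        v = t(xs).astype(float)
        for b in basis:                       # remove components along the
            v -= (w * b * v).sum() * b        # previously accepted functions
        basis.append(v / np.sqrt((w * v * v).sum()))
    return np.array(basis)

# Example: orthonormalize 1, x, x^2, x^3 on 0 <= x <= 1 with uniform s(H);
# the result approximates the shifted Legendre functions.
xs = np.linspace(0.0, 1.0, 2001)
terms = [lambda x, j=j: x**j for j in range(4)]
phi = orthonormal_basis(terms, np.ones_like, xs)
w = np.ones_like(xs) * (xs[1] - xs[0])
print(np.round(phi @ (phi * w).T, 3))         # approximately the identity matrix
```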
The second significant contribution of this thesis is to present an improved cerebellar model which, being consistent with current anatomical and physiological data, is more plausible than previous models. Unlike other cerebellar models, this one simulates neural activities as continuous variables, rather than as binary variables, throughout all its operations. This is important since, despite the "all or nothing" character of action potentials, neural information is generally thought to be transmitted as frequency coded values. Also, the model proposes an algorithm which adjusts synaptic weights strictly according to the activities of the pre and post-synaptic neurons at that synapse. Referring to (7.2), the pre-synaptic activity is Φ(H_i) while the post-synaptic activity is Δf(H_i). This property of localized learning is of critical importance to a plausible cerebellar model, as the long, narrow, and widespread structure of Purkinje cell dendrites makes an optimized learning algorithm which requires computations involving non-local activity exceedingly unlikely.

7.4  Areas of Further Research

This thesis has succeeded in developing an optimized learning algorithm which models the cerebellum. However, as in most research, while resolving some issues, it leaves many interesting questions to be answered.

In terms of the cerebellum, a number of physiological investigations are indicated:
1. Determine the mathematical form of the mapping from Mossy fiber activity to Granule cell activity. In particular, determine whether the frequency of action potentials in Parallel fibers can be interpreted as a basis set of the form suggested by this thesis:

      S(H) = A + Φ(H)   (7.3)

   where {Φ(H)} is an orthonormal set, A is a vector of constants to ensure S(H) > 0, and {S(H)} is the set of actual Parallel fiber activity.
2. Determine whether the connectivity of Parallel fiber-Purkinje cell synapses is modifiable. If some form of plasticity is found, determine the mathematical relation which governs this plasticity.
3. Determine whether the cerebellum functions as a phasic or as a tonic device and whether Subcortical Nuclear cells do in fact perform a "sample-and-hold" operation as proposed by this thesis.

Fig 7.2  One Dimensional Spline Functions  (a: first order; b: second order; c: third order splines)

There are also a number of mathematical questions, relating to learning machines, posed by this research. The following investigations could prove most interesting:
1. There are several properties of spline functions, in terms of basis functions for learning machines, which appear promising. As shown in Figure 7.2, each such function has a limited region of non-zero support. When considering learning algorithms, this means that spline-based error correcting functions produce weight adjustments which have similarly limited support. In other words, error correcting functions would cause no interference outside of a small region. The major problem with splines is to generate a useful set of functions which produce continuous estimation functions and do not require excessive numbers of elements [41]. Another possible problem, as discussed in Section 6.3, is the potential system failure which could result from the failure of only a few elements of the basis set generator.
2.
The programs which demonstrate  the effectiveness of  this  learning system use extended p r e c i s i o n (64 b i t ) , f l o a t i n g point, d i g i t a l values i n a l l computations. The e f f e c t s of using analog, reduced p r e c i s i o n , or noisy v a r i a b l e s require further investigation. 3.  The  operational  c h a r a c t e r i s t i c s of  the  system  in  a  r e a l i s t i c control system environment require t e s t i n g . 4.  There  i s much  strategies  for  t h e o r e t i c a l work training  or  networks of learning systems.  required  "programming"  to  determine  hierarchical  69 5.  The system i s r e l a t i v e l y expensive and slow when modeled on a  general  purpose,  digital  e f f e c t i v e , the system should special purpose, device.  computer.  be constructed  To  be  as a  most single,  70 BIBLIOGRAPHY  Cited Literature  [I]  J.S. Albus, A Theory o f Cerebellar pp25-61, Feb 1971.  Function, Math. B i o s c i . , 10,  [2]  J.S. Albus, Theoretical and Experimental Aspects o f a Cerebellar Model, Ph.D. Thesis, University o f Maryland, Dec 1972.  [3]  J.S. Albus, A New Approach to Manipulator Control: The Cerebellar Model A r t i c u l a t i o n Controller (CMAC), Trans. ASME Ser.G., 97, pp200-227, Sept 1975.  [4]  J.S. Albus, Data Storage i n the Cerebellar Model A r t i c u l a t i o n Controller (CMAC), Trans. ASME Ser.G., 97, pp228-233, Sept 1975.  [5]  J.S. Albus and J.M. Evans,Jr., Robot Systems, S c i e n t i f i c 234, pp77-86b, Feb 1976.  [6]  L.A. Alekseeva and Y.F. Golubev, An Adaptive Algorithm for S t a b i l i z a t i o n of Motion of an Automatic Walking Machine, Eng. Cybern., 14, No.5, pp51-59, 1976.  [7]  C C . B e l l and R.S. Dow, Cerebellar C i r c u i t r y , Prog. Bui., 5, No.2, ppl21-222, 1967.  [8]  S. Blomfield, Arithmetic Operations Brain Res., 69, ppll5-124, 1974.  [9]  V. Braitenberg and R.P. Atwood, Morphological Observations on the Cerebellar Cortex, J . Comp. Neurol., 109, ppl-27, 1958.  [10]  R.J. Brown, Adaptive Multiple-Output Threshold Systems and Their Storage Capacities, Technical Report 6771-1, Stanford Electronics Laboratories (SU-SEL-64-018), June 1964, AD 444 110.  [II]  T.W. Calvert and F. Meno, Neural Systems Modeling Applied to the Cerebellum, IEEE Trans. Syst., Man, & Cybern., SMC-2, pp363-374, J u l y 1972.  [12]  J.C. Eccles, M. Ito, J . Szentagothai, The Cerebellum as a Neuronal Machine, Springer-Verlag, New York, 1967.  Performed  American,  Neurosci. Res.  by Nerve  Cells,  71 [13]  J.C. Eccles, The Dynamic Loop Control Hypothesis of Movement Control, i n Information Processing i n the Nervous System, K.N. Leibovic, ed., Springer-Verlag, New York, 1969.  [14]  J.C. Eccles, The Cerebellum as a Computer: Patterns i n Space and Time, J . Physiol. Lond., 229, No.l, ppl-32, 1973.  [15]  B.R.  Gaines, Stochastic Computing Systems, i n Advances in Information Systems Science, J.T. Tou, ed., Plenum Press, New York, 1969.  [16]  A.R.  Gardner-Medwin, The r e c a l l of events through the learning of associations between their parts, Proc. R. Soc. Lond. Ser.B, 194, pp375-402, 1976.  [17]  P.F.C. G i l b e r t , A Theory of Memory that Explains the Function and Structure of the Cerebellum, Brain Res., 70, ppl-18, 1974.  [18]  G.H.  Glasser and D.C. Higgins, Motor S t a b i l i t y , Stretch Responses and the Cerebellum, i n Nobel Symposium, I. Muscular Afferents and Motor Control, R. Granit, ed., Almquist and Wiksell, Stockholm, ppl21-138, 1966.  [19]  P.G.  
Guest, Numerical Methods of University Press, London, 1961.  [20]  M. Hassul and P.D. Daniels, Cerebellar Dynamics: The Mossy Fiber Input, IEEE Trans. Bio-Med, BME-24, pp449-456, Sept 1977.  [21]  D.O.  [22]  J.K.S. Jansen, K. Nicolaysen, T. Rudjord, Discharge Patterns of Neurons of the Dorsal Spinocerebellar Tract Activated by S t a t i c Extension of Primary Endings of Muscle Spindles, J . Neurophysiol., 29, ppl061-1086, 1966.  [23]  J.K.S. Jansen, K. Nicolaysen, T. Rudjord, On the F i r i n g Pattern of Spinal Neurons Activated from the Secondary Endings of Muscle Spindles, Acta Physiol. Scand., 70, ppl83-193, 1967.  [24]  H.H.  [25]  Y. Kosugi and Y. Naito, An Associative Memory as a Model for the Cerebellar Cortex, IEEE Trans. Syst., Man, & Cybern., SMC-7, pp94-98, Feb 1977.  [26]  P.D.  Curve  Hebb, The Organization of Behaviour: Theory, Wiley, New York, 1949.  Fitting,  A  Cambridge  Neuropsychological  Kornhuber, Motor Functions of Cerebellum and Basal Ganglia: The Cerebellocortical Saccadic (Ballistic) Clock, the Cerebellonuclear Hold Regulator, and the Basal Ganglia Ramp (Voluntary Speed Smooth Movement) Generator, Kybernetic, 8, No.4, ppl57-162, 1971.  Lawrence and W-C. L i n , S t a t i s t i c a l Decision Making i n the Real-Time Control of an Arm Aid for the Disabled, IEEE Trans. Syst., Man, & Cybern., SMC-2, pp35-42, Jan 1972.  72 [27]  H.C.  Longuet-Higgins, D.J. Willshaw, O.P. Buneman, Theories of Associative Recall, Q. Rev. Biophys., 3, No.2, pp223-244, 1970.  [28]  D. Marr, A Theory of Cerebellar Cortex, J . Physiol. Lond., pp437-470, 1969.  [29]  W.S.  Meisel, Potential Functions in Mathematical Pattern Recognition, IEEE Trans. Comput., C-18, pp911-918, Oct 1969.  [30]  M.D.  Mesarovic, The Control of Multivariable Systems, Technology Press and John Wiley and Sons, New York, 1960.  [31]  M.D.  Mesarovic, D. Macko, and Y. Takahara, Theory of H i e r a r c h i c a l , M u l t i l e v e l Systems, Academic Press, New York, 1970.  [32]  M. Minsky and S. Papert, Perceptrons, MIT Press, Cambridge,  [33]  J.A. Mortimer, A Computer Model of Mammalian Cerebellar Cortex, Comput. B i o l . & Med., 4, pp59-78, 1974.  [34]  N.J. Nilsson, Learning Machines, McGraw-Hill, New York, 1965.  [35]  E. Parzen, Modern P r o b a b i l i t y Theory and Wiley and Sons, New York, 1960.  [36]  E. Parzen, On Estimation of a P r o b a b i l i t y Density Function Mode, Ann. Math. Stat., 33, ppl065-1076, Sept 1962.  [37]  R.G. Peddicord, A Computational Model of Cerebellar Cortex and Peripheral Muscle, Int. J . Bio-Med. Comput., 8, pp217-237, 1977.  [38]  A. P e l l i o n i s z , Computer Simulation of Large Cerebellar Neuronal F i e l d s , Hung., 5, No.l, pp71-79, 1970.  [39]  A. P e l l i o n i s z and J . Szentagothai, Dynamic Single Unit Simulation of a R e a l i s t i c Cerebellar Network Model, Brain Res., 49, pp83-99, 1973.  [40]  A. P e l l i o n i s z and J . Szentagothai, Dynamic Single Unit Simulation of a R e a l i s t i c Cerebellar Network Model. I I . Purkinje C e l l A c t i v i t y within the Basic C i r c u i t and Modified by Inhibitory Systems, Brain Res., 68, ppl9-40, 1974.  [41]  P.M.  Prenter, Splines and Sons, New York, 1975.  [42]  M.H.  Raibert, a Model for Sensorimotor Control and Learning, B i o l . Cybern., 29, No.l, pp29-36, 1978.  [43]  A. Rapoport, "Addition" and " M u l t i p l i c a t i o n " Theorems for the Inputs of Two Neurons Converging on a Third, Bui. Math. Biophys., 13, ppl79-188, 1951.  Variational  202,  1969.  
i t s Applications,  John  and  the Pattern Transfer of Acta Bioch. & Biophys.  Methods,  John  Wiley  and  73 [44]  F. Rosenblatt, the Perceptron: A Probabilistic Model for Information Storage and Organization i n the Brain, Psycol. Rev., 65, No.6, pp386-408, 1958.  [45]  F. Rosenblatt, P r i n c i p l e s of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan Books, Washington, 1961.  [46]  T.C. Ruch, Basal Ganglia and Cerebellum, i n Medical Physiology and Biophysics, T.C. Ruch and J.F. Fulton, eds., Saunders, Philadelphia, 1960.  [47]  N.H. Sabah and J.T. Murphy, R e l i a b i l i t y o f Computations Cerebellum, Biophys J . , 11, pp429-445, 1971.  [48]  N.H. Sabah, Aspects of Cerebellar Computation, i n Proceedings of the European Meeting on Cybernetics and System Research, Vienna, Transcripta, London, 1972, pp230-239.  [49]  M. Simaan, Stackelberg Optimization o f Two-Level Systems, IEEE Trans. Syst., Man, & Cybern., SMC-7, pp554-557, J u l y 1977.  [50]  D.F. Specht, Generation of Polynomial Discriminant Functions f o r Pattern Recognition, Technical Report No.6764-5, Stanford Electronics Laboratories (SU-SEL-66-029), May 1966, AD 487 537.  [51]  D.F. Specht, A Practical Technique for Estimating General Regression Surfaces, Lockheed Palo Alto Research Laboratory, June 1968, AD 672 505.  [52]  B. Stott, Power System Dynamic Response Calculations, Proc. 67, pp219-241, Feb 1979.  IEEE,  [53]  D.F. Stubbs, Frequency and the Brain, L i f e ppl-14, 1976.  No.l,  [54]  B. Widrow, N.K. Gupta, S. Maitra, Punish/Reward: Learning with a C r i t i c i n Adaptive Threshold Systems, IEEE Trans. Syst., Man, & Cybern., SMC-3, pp455-465, Sept 1973.  General  [1]  i n the  Sciences, 18,  References  P. Cress, P. Dirksen, J.W. Graham, Fortran IV with Watfor and Watfiv, Prentice-Hall, Englewood C l i f f s , New Jersey, 1970.  74 [2]  H.W.  Fowler and F.G. Fowler, eds., The Concise Oxford Dictionary, University Press, Oxford, 1964.  [3]  A.C. Guyton, Textbook Philadelphia, 1976.  [4]  P.H.  [5]  B. Noble, Applied Linear Algebra, Prentice-Hall, Englewood C l i f f s , New Jersey, 1969.  [6]  B. Rust, W.R. Burrus and C. Schneeberger, A Simple Algorithm for Computing the Generalized Inverse of a Matrix, Comm. ACM, 9, pp381-387, May 1966.  [7]  S.M.  of  Medical  Physiology,  Lindsay and D.A. Norman, Human Academic Press, New York, 1977.  Selby, ed., Standard Co., Cleveland, 1968.  Mathematical  W.B.  Information  Tables,  Saunders,  Processing,  Chemical  Rubber  
