UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Neural networks for legal quantum prediction Terrett, Andrew J. 1994-12-31

You don't seem to have a PDF reader installed, try download the pdf

Item Metadata


ubc_1995-0026.pdf [ 9.05MB ]
JSON: 1.0077428.json
JSON-LD: 1.0077428+ld.json
RDF/XML (Pretty): 1.0077428.xml
RDF/JSON: 1.0077428+rdf.json
Turtle: 1.0077428+rdf-turtle.txt
N-Triples: 1.0077428+rdf-ntriples.txt
Original Record: 1.0077428 +original-record.json
Full Text

Full Text

NEURAL NETWORKS FOR LEGAL QUANTUM PREDICTION by Andrew J. Terrett B.A., University of Warwick, England, 1990 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF LAWS in THE FACULTY OF GRADUATE STUDIES (Faculty of Law)  We accept this thesis as conforming to thejrequired standard  THE UNIVERSITY OF BRITISH COLUMBIA October 1994 © AJ.Terrett, 1994  In  presenting this  degree at the  thesis in  University of  partial  fulfilment  of  of  department  this or  thesis for by  his  or  scholarly purposes may be her  representatives.  permission.  The University of British Columbia Vancouver, Canada  DE-6 (2/88)  for  an advanced  Library shall make it  agree that permission for extensive  It  publication of this thesis for financial gain shall not  Date  requirements  British Columbia, I agree that the  freely available for reference and study. I further copying  the  is  granted  by the  understood  that  head of copying  my or  be allowed without my written  ABSTRACT  This thesis argues that Artificial Neural Networks (ANN's) have applications within the domain of law and can be built using readily available Artificial Intelligence software for the Personal Computer. In order to demonstrate this, I have built working ANN's using data made available to me by the Faculty of Law Artificial Intelligence Research Project (FLAIR) and Windows™-based ANN software that is commercially available.  The  reasons for building a system are three-fold. First, a working system is the most graphic demonstration of my above assertion. Secondly, given my background as a practitioner in law, I am concerned to ensure the quick, efficient movement of law-related technology from research laboratory to marketplace. Thirdly, the building of a neural network avails me with the opportunity to analyze comparatively the performance of the ANN's with statistical and expert system models which used the same data and were also built at the FLAIR project.  The theoretical foundation for this thesis is the view that, although many legal decisions are often reducible to a set of doctrines, policies or sub-doctrinal rules, certain domains of legal decision-making evade analysis using a rule-based paradigm.  Thus, although  relational patterns between any given facts and the law exist, they cannot always be described. Therefore in order to build "intelligent" computer programs that can assist the lawyer in his or her work, we should explore the potential of and utilize those tools that can find relational patterns automatically.  Having done this, we should attempt to  combine the same with those software tools that we understand more fully, namely expert systems and traditional programming methods. The use of hybrid neural network/expert system programs is well developed in many other domains. Legal researchers, however, have yet to even thoroughly examine neural networks as an isolated technology. This thesis is an attempt to right this imbalance.  ii  TABLE OF CONTENTS  Abstract  ii  Table of Contents  iii  List of Figures  viii  A Note about the Software  ix  Acknowledgment  x  Introduction . Chapter 1  .  .  .  .  .  .  .  .  1  An Introduction to Neural Networks  1.1  Fundamental Definitions  .  .  .  .  .  4  1.2  Deduction-v-Induction  .  .  .  .  .  7  1.3  Conventional Computing Techniques  1.4  Artificial Intelligence (Al)  1.5  Expert Systems  1.6  Fuzzy Logic  1.7  Fuzziness and Legal Language  14  1.8  The Sciences of Complexity .  
17  1.9  Neural Networks  1.10  The Artificial Neuron .  1.11  Training a Neural Network  .  .  .  .  .  25  1.11.1 Back-Propogation  .  .  .  .  .  26  .  .  . .  .  .  .  8  .  .  8 11  .  .  .  . .  .  .  .  .  .  .  .  .  .  13  .  19  .  20  1.11.2 The Delta Rule  26  1.11.3 Supervised-v-Unsupervised Learning  28  1.12  Why Not Use Expert Systems?  1.13  Further Advantages of Neural Networks  30  1.14  Where Neural Networks Falter  32  iii  .  .  .  .  29  1.15  Minds and Machines .  .  .  .  .  1.16  The Biological Analogy - The Neuron  1.17  Neural Networks - A Brief Historical Analysis  .  33 34  .  .  35  .  .  36  1.17.1 McCulloch and Pitts . 1.17.2 The Hebbian Rule  38  1.17.3 The Perceptron  .  .  .  .  .  1.17.4 "The Dark Ages" - Minsky and Papert 1.18  The Re-Birth of Connectionism  1.19  Parallel Distributed Processing (PDP)  1.20  Real World Application of Neural Networks  Chapter 2  .  38 41  .  .  .  .  .  .  42 42 44  Issues of Jurisprudence  2.1  Introduction  .  2.2  Aland Law-From Where to Where...  .  .  .  47  2.3  Legal Reasoning and the Expert System  .  .  .  49  2.4  Logic-Based Approaches  2.5  Jurisprudential Theory and Artificial Models .  2.6  The Use of Rules  .  .  .  .  .  .  53  2.6.1  .  .  .  .  .  .  53  Positivism  .  .  .  .  .  .  .  46  .  .  50  .  .  52  2.6.2 Limitations of the Rule-Based Approach  2.7  2.8  2.6.3  Rule-Skepticism and Legal Realism  2.6.4  The Analytical Teleological Approach  2.6.5  Gardner  2.6.7  The Whiplash Expert System .  .  .  .  56  .  .  .  .  58 .  .  .  60 62 64  Case-Based Reasoning and the Importance of Analogy  64  2.7.1 Levi  65  .  .  Applications of CBR in Law: Ashley . iv  .  .  .  67  2.9  Susskind  2.10  Wahlgren  2.11  A Connectionist Theory of Law  2.12  Legal Reasoning As A Skill  2.13  Conclusion  Chapter 3  .  .  69 71  .  .  .  . .  .  .  .  .  .  .  .  74  .  .  78  .  .  81  .  83  Neural Networks and Law - A Literature Review  3.1  Introduction  .  .  .  .  3.2  Connectionism And Natural Language  .  .  .  83  3.3  Connectionism and Legal Open Texture  .  .  .  85  .  .  85  .  .  87  .  .  88  .  .  90  .  .  92  3.3.1  Bochereau et al.  .  3.3.2  Van Opdorp and Walker  3.3.3. Bench-Capon  .  .  . .  .  .  .  .  3.4  Raghupathi, Levine, Bapi and Schkade  3.5  The Triune Theory of the Brain  3.6  Philipps  3.7  Information Retrieval  3.8  Neural Networks for Information Retrieval  3.9  The Hybrid Approach .  3.10  Neural Networks for Document Assembly  3.11  Conclusion  Chapter 4  .  .  .  .  92  .  .  .  .  .  .  .  .  . .  .  .  .  . .  .  . .  .  94  .  96  .  98  .  99  .  101  The Whiplash Neural Network  4.1  Introduction  103  4.2  TheRoleoflCBC  103  4.3  What is Whiplash?  105  4.4  How Are Damages Assessed?. v  .  .  .  .  106  4.5  The Initial Assessment  .  106  4.6  The Statistical Quantum Advisor  .  107  4.7  Methodology  109  4.7.1  109  Test2.prg  4.7.2 Data Problems 4.7.3 4.8  .  .  .  .  112  Deceptive Data  113  The Software  115  4.8.1. Histograms  .  .  .  .  .  .  115  4.8.2  Training, Testing and Running  117  4.8.3  Severity Rating - An Explanation  118  4.8.4  The First Neural Network  119  4.8.5  The Next Eight Neural Networks  .  .  .  120  4.9  How Many Hidden Neurons - Part 1 .  .  4.10  Extreme Data Removal.  .  4.11  Tolerances  4.12  How Many Output Neurons .  130  4.12.1 Bands  132  4.12.2 Square Root  135  4.13  .  .  . .  . .  Changing Other Factors  .  .  123 . .  .  .  .  .  128 129  .  .  145  4.13.1 Inclusion Of Dates As Input Neurons.  
145  4.13.2 A Comparative Analysis Of Date Inputs  148  4.14  Learning Rates  4.15  How Many Hidden Neurons - Part 2 .  4.16  A Higher Number Of Data Runs?  4.17  A Comparison With The Statistical Quantum Advisor  4.18  Conclusions  .  .  .  .  .  .  .  .  150  .  .  .  vi  .  .  .  154  .  .  .  159 160  .  164  Chapter 5  Conclusion  5.1  Where Now?  166  5.2  Improving The Search Strategy  168  5.3  Future Research  .  .  .  .  .  .  168  5.3.1  Genetic Algorithms  168  5.3.2  Hybrids - Combining Expert Systems With Neural Networks 170  5.4  5.5  5.6  Practical Problems of Neural Networks  172  5.4.1  Finding A Source of Data  172  5.4.2  Technical - Externalizing The Internalities  5.4.3  Incomplete Data  .  .  .  . .  172 .  174  The Sociological Context  175  5.5.1  Would Companies Want These Machines  175  5.5.2  Would Society Want These Machines  176  5.5.3  The Nature Of The Negotiation Process  177  Al And The Media  .  .  178  Bibliography  181  Appendix A  Test2.Prg  192  Appendix B  List of Facts Used As Inputs  201  Appendix C  Test3.Prg (segment) .  .  .  .  .  204  Appendix D  Test4.Prg (segment) .  .  .  .  .  207  Appendix E  dBASE Coding For The "Fuzzification" Of Quantum  209  Appendix F  Glossary of Neural Network Terms  211  Appendix G  Glossary of Brainmaker Terms  215  Appendix H  Graphic Representation of Figure 4.13  217  vii  LIST O F FIGURES Chapter 1 Figure 1.1  An Artificial Neuron  .  .  .  Figure 1.2  The Sigmoid Activation Function  Figure 1.3  Common Neural Network Architecture  23  Figure 1.4  Inputs And Outputs Of A Very Simple Neural Network  24  Figure 1.5  A Simplified Biological Neuron  Figure 1.6  The Perceptron  39  Figure 1.7  Graphic Representation of Hypothetical Welfare Agency Rule  40  .  .  .  .  .  20  .  .  .  21  .  .  .  35  Chapter 4 Figure 4.1(a) & (b)  Network Progress Display Graph 1  .  116  Figure 4.2  Network Progress Display Graph 2  .  117  Figure 4.3  A Healthy Learning Neural Network .  121  Figure 4.4  A "Brain-Dead" Neural Network  Figure 4.5  Results of experiments with varying number of hidden neurons  125  Figure 4.6  Results of further experiments varying number of hidden neurons  126-7  Figure 4.7  Comparative Table of Outputs Using Square Root of Quantum  136-7  Figure 4.8  Graphical Representation of Fuzzy B anding of Quantum  139  Figure 4.9  Table of Outputs from Multi-Output Neural Network  141  Figure 4.10  Comparative Table of Outputs Using Different Date Configurations 147-8  Figure 4.11  Comparative Table of Outputs Using Varying Learning Rates  Figure 4.12  Comparative Table of Outputs Varying Number of Hidden Neurons 155-7  Figure 4.13  Comparative Table of Neural Networks and SQA Outputs .  viii  .  .  .  .  122  150-2  162.  A NOTE ABOUT THE SOFTWARE  I have used Brainmaker Neural Network Simulation Software (Versions 3.0 and .3.1 for Windows™) developed by California Scientific Software of Nevada City, CA., Borland dBASE™ Version III Plus as the database, Microsoft Excel Version IV and CorelDRAW Version 2.01 for graphing throughout this thesis. I cannot, in all honesty, endorse either Brainmaker or dBASE™. In my opinion, although dBASE™ is an reasonable product as a database, for my purposes, its programming language was "clunky", very "userunfriendly" and lacked the mathematical functionality necessary to build neural networks. Brainmaker although economically priced, also lacked certain functionality of other similar neural network software. 
Version 3.0 contained bugs which caused unnecessary delay to the progress of my research. However, the book that accompanies the software makes an excellent introduction to neural network technology (and is almost worth the money for this feature alone.) The actual software manual, however, is bewildering (at least initially) to someone who not skilled in the art of developing neural networks.  To someone  contemplating a similar project I would offer this advice; explore the neural network software available over the Internet. The most important factor in a neural network is the algorithm.  Everything else, such as graphs, user-friendliness, etc. is of secondary  significance (once the theory has been mastered.) Shop around, don't use thefirstthing you think looks appropriate and I wish you the very best of luck.  ix  ACKNOWLEDGMENT  Thanks are due to the following people:  1. To the FLAIR programmers, (Bruce Atherton, Keith MacCrimmon and John McClean,) for patiently answering an endless stream of questions, (more often, elementary; less often, interesting) regarding neural networks, the dBASE™ programming language and whiplash injuries in British Columbia.  2. To Professors J.C.Smith and Marilyn MacCrimmon for their time and input.  3. To my parents for their support, financial and otherwise in this venture.  4. To Kelly, who while providing constant distractions in her quests for caffeine also showed immense patience in attempting to understand this obscure technology and in listening to the daily ups and downs in writing this thesis (of which there were many.)  x  INTRODUCTION This thesis is about jurimetrics or the scientific investigation of legal problems. In 1  particular, it addresses the potential application of Connectionism or Neural Computing (also referred to as "brain- style" computing to a specific type of legal problem.) The task set is to consider what role, if any, neural computing might play in a program that can mimic the legal reasoning processes of a lawyer and in particular to what extent neural computing can mimic a legal actor, whether adjuster, lawyer or judge, in determining the likely level of damages or quantum for personal injury cases involving whiplash in British Columbia, Canada. This domain of legal knowledge was chosen for both practical and theoretical reasons; practically speaking, the main reason was that FLAIR has developed 2  and maintains a database of approximately 1700 cases involving whiplash injuries (the "Whiplash Database."). Thus there was a readily available source of (largely) numerical data with which to build a system. Theoretically speaking, the assessment of quantum is a legal reasoning process that occurs throughout the development of a case;  a lawyer  constantly assesses what an injury might be worth in court and also assesses how a given piece of evidence affects previous assumptions and estimates. Attempts have also been made by members of FLAIR to develop an expert system and a conventional computer program (in the C programming language,) to predict quantum using the same data. Thus  A term coined by Lee Loevinger in his article Jurimetrics - The Next Step Forward, Minnesota Law Review. 33, 1948, 455-493 at 483. Faculty of Law Artificial Intelligence Research Project, University of British Columbia. 1  2  1  useful comparisons could be made between the various models to see which was the more accurate and why.  
The impetus for this thesis comes from the fact that despite much rhetoric to the contrary, expert systems have failed to make the impact on the legal profession that many researchers believed possible. Richard Susskind, who is arguably the best qualified researcher on this area, commented in 1986 "there has yet been developed a fully operational expert system in law that is of utility to the legal profession. " At the time of 3  writing, expert systems have made little further impact on the profession.  It is arguable and indeed argued by many , that the revived interest in Connectionism is an 4  indication of a new emergent paradigm within (or parallel to) traditional Artificial Intelligence, an example of scientific revolution made known to us by Kuhn . It is also 5  arguable that when historians begin to compose the history of A l from the 1960's to the 1990's many will apply a subtitle entitled "The delusion of the successful first step in research. " 6  Supporters of Connectionism come in many shapes and forms. Equally  robust responses have followed from Connectionist skeptics, notable from Fodor and  Susskind, R.E., Expert Systems in Law: A Jurisprudential Approach to Artificial Intelligence and Legal Reasoning, Modern Law Review. 49, 1987, p. 168. See Schneider, W., Connectionism: Is it a paradigm shiftfor psychology? Behavior Research Methods. Instruments and Computers. 19, 73-83; See also Fodor, J.A., Why there STILL has to be a language of thought, Smolensky, P., Connectionism and the Foundations ofAI, Wilks, Y., Some comments on Smolensky and Fodor, Churchland,P.M., Representation and high-speed computation in neural networks, all in Partridge, D, & Wilks, Y., The Foundations of artificial intelligence - A sourcebook, Cambridge, Cambridge University Press, 1990. Kuhn, T.S., The Structure of Scientific Revolutions, Chicago, University of Chicago Press, 1970, 2nd edition, taken from the International Encyclopedia of Unified Sciences. Vol. 2, Number 2. Some authors have done just that. For instance see Graubard, S., (ed.) The Artificial Intelligence Debate: False starts, Real foundations, New York, Basic Books, 1988. 3  4  5  6  2  Pylyshyn . This thesis does not attempt to develop further the theoretical debates. They 7  are mentioned only in order to facilitate an broader understanding of the issues surrounding this area. The emphasis of this thesis is the practical; the time is right to assess the value of pure Connectionism using real-world data in the legal domain.  The first task of this thesis is to adequately describe the development, ancestry, functioning of and philosophical basis for neural network architectures and techniques. This is provided in Chapter 1. Chapter 2 details the various jurisprudential theories that may assist our analysis. In order to build valid models of law and legal operations using expert systems, one must have regard for legal theory. With Connectionism, one does not have to subscribe to a particular view of how the law functions, except to the extent that one must view the legal reasoning process as example-driven.  Given that neural computing is a relatively new phenomena, having re-arisen phoenix-like from the ashes of computational theory in the mid 1980's, there has been a surprisingly significant amount of material written on the application of neural networks in law and this will be reviewed and explored in Chapter 3. 
Finally, Chapter 4 describes the empirical basis of this thesis, describing the experiments run, the domain of whiplash injury quantum analysis, the software and the results obtained.  Fodor, J., and Pylyshyn, Z.W., Connectionism and cognitive architecture: A critical analysis, Cognition. 28, 3-71, 1988.  7  3  CHAPTER 1 AN INTRODUCTION TO NEURAL NETWORKS 1.1 F U N D A M E N T A L D E F I N I T I O N S Given that this thesis is primarily about artificial neural networks , it seems unnecessary 8  to provide a comprehensive history of the Artificial Intelligence discipline, in which neural networks ( at least to date) play only a minor supporting role. The past of this divided discipline is adequately catalogued elsewhere . Many commentators describe A l research 9  as divided into two competing paradigms - the symbolic and the sub-symbolic (or connectionist.) If one were to describe the history of A l to date in the form of a short fictional story, one could be forgiven for casting Connectionism (and indeed, previous emerging technologies) as a younger child within the A l family who brought great excitement at its birth, but was spurned at an early age, cast out into the wilderness during its adolescence only to returning hero-like, when the older and seemingly more successful and mature sibling failed to live up to expectations.  Whilst steering away from such melodramatic parodies, we cannot ignore the fact that the so-called symbolicist approach has gradually re-trenched, from bold schemes such as the "General Problem Solver"  10  and bold assertions  11  towards more humble goals . The 12  This thesis refers to artificial neural networks, ANN's and neural networks. These terms are used interchangeably throughout. See for instance, Dreyfus, H.L., - What Computers Can 'tDo, New York, Harper & Row, 1972, also Dreyfus, H.L., What Computers Still Can 'tDo, Cambridge, MA., MIT Press, 1992. The "General Problem Solver" was the impressive title given to a landmark computer project led by Newell and Simon, the culmination of their Cognitive Simulation (CS) approach. By 1957, computers, 8  9  1 0  4  irony of symbolic A l has been that the so-called "difficult" problems (i.e. those involving a high degree of specialized knowledge) have proved easy and visa versa. For example, two of the most famous expert systems are MYCIN and DENDRAL. MYCIN is an expert system built in 1972 which was used to diagnose and prescribe treatments for bacterial 13  infections of the blood. DENDRAL, built at Stanford University in 1964 was used to 14  determine the structure of organic molecules. We might conclude from these examples that expert systems are suited to the higher cognitive functions that humans possess and therefore possess high levels of intelligence. However, expert systems have failed to cope adequately with human functions that require native intelligence or what we call "common sense."  This includes abilities such as pattern recognition and the instantaneous  assessment of appropriate courses of action. Good examples of this type of skill include the ability to assess the speed of passing cars when crossing a road. There are no rules applied - we just know in the circumstances what would be the right thing to do - to walk or not to walk. Moreover, this type of ability reveals itself through an expert's articulation  acting as general symbol manipulators were seemingly providing the physical evidence of intelligence. 
Using the GPS, these researchers discovered a number of original proofs for theorems in Principia Mathematica. Such evidence seemed to suggest the correctness of the dominant reductionist worldview, i.e. the widely held belief that intelligence could be understood in terms of symbol manipulation "Within ten years a digital computer will be the world chess champion"; "within ten years a computer will discover and prove an important new mathematical theorem"; "within ten years, most theories in psychology will take the form of computer programs or of qualitative statements about the characteristics of computer programs." Simon, H.A., & Newell, A., Heuristic problem Solving: The Next Advance in Operations Research, Operations Research. Vol. 6, 1958. 1 1  For example, the words of Susskind, a leading advocate of expert systems in law; "Legal expert systems are computer programs that have been developed with the aid of human legal experts in particular and highly specialized domains of law...It must be emphasized that such an outline today amounts to no more than the research aspirations of workers in the field of expert systems in law..." Susskind, R.E., Expert Systems in Law: Theory and Practice, Cambridge Lectures, 1987. See Harmon, P., and King, D., Expert Systems, Artificial Intelligence in Business, New York, John Wiley and Sons, 1985, p.5. See also Waterman, D. A., A Guide to Expert Systems, Menlo Park, CA., Addison-Wesley Publishing Co., 1986. 1 2  1 3  1 4  5  of an area of expertise as well as in the mundane. Experts often find it easier to describe a domain by reference to examples rather than by reference to hard rules.  It would be wise to for us to avoid being drawn into the lengthy narrative that is the history of Al, or being drawn into equally lengthy and inconclusive debates regarding whether there are two competing paradigms within the discipline. The reality seems to be that Connectionism adopts certain aspects of the symbolic approach and visa versa. For instance, in the domain of law, one might construct the argument that formalizing an area of law into rules involves the recognition of patterns and that Connectionism is the repetitive application of a rule, albeit in algorithmic form. Obsessive theorizing in A l tends to obscure purpose. However, Connectionism may also be seen as an element of complexity theory (considered further below,) a scientific hypothesis relating to non-linear phenomena. Debates regarding competing paradigms are likely to continue for some time. It is more important in a thesis relating neural networks to the domain of law merely to mark the major developments in the movement known as Connectionism so that the reader is given a sense of historical perspective. Therefore, the second part of this chapter chronicles the development of neural networks from the 1940's to the present day and also describes some of the applications for which neural networks are being used.  The primary task of this chapter is to define the fundamental concepts and methodologies in building neural network software. Such analysis will be assisted by briefly describing the two computing techniques of deduction and induction to supplement our description  6  of the two apparent paradigms within Al.  Therefore, the first part of this chapter is  designed to describe the context of terms within which Connectionism is situated.  
1.2 D E D U C T I O N - v - I N D U C T I O N The power of deductive thinking has been understood and emphasized as a model of rationality by metaphysical thinkers since Parmenides and Plato . 15  Deduction is the  process by which a person or computer reasons in steps towards a conclusion based upon given premises.  Deduction requires the sequential provision of answers. Thus when  accountants fill out one's tax returns, they reason deductively. For instance, if one were asked for one's marital status on a form relating to one's personal income, the answer "single" is likely to lead to a different and separate set of calculations from the answer "married." Many statutes in law are structured in a manner that encourages deductive thinking. Hence, one can see the immediate appeal of the legal domain to computer programmers interested in modeling real world phenomena and the need for the skills of a logician in legal analysis. The history of Artificial Intelligence is dominated by attempts to use largely deductive and symbolic processing techniques to model and solve problems of the real world.  Induction is the process by which a computer or a person processes data at once and draws a conclusion. The recognition of patterns, symbols and numbers are all examples of induction - a process that appears to occurs instantaneously.  Pattern recognition  For a good introduction to the differences between metaphysical and scientific thinking, see Sayegh, N.S., Philosophy and Philosophers, Ottawa, Academy Books, 1988. 1 5  7  techniques in particular have eluded A l researchers for decades - "Pattern-matching has been the Achilles heel of computer scientists for several decades," writes David Hertz . 16  Neural networks, such as those built in conjunction with this thesis, use induction to draw their conclusions and in certain circumstances , appear to excel at such tasks. 17  1.3 C O N V E N T I O N A L C O M P U T I N G T E C H N I Q U E S Conventional computers use symbolic programming; programs work step-by-step through a problem. The right data must be found at the right time or the program will cease to function, a so-called "catastrophic degradation." A computer program written in a linear form is not able to cease the operation in progress and re-compute at short notice. It must run its commands to their conclusion. Whilst highly accomplished at performing mindboggling numerical calculations or re-arranging immense amounts of data quickly and without error, a conventional computer program cannot easily deal with the "fuzzy" data that surrounds us in the real world, governed and defined by the vague boundaries of natural language, which do not easily fit into precise mathematical sets.  1.4 A R T I F I C I A L I N T E L L I G E N C E (Al) A l is "the science of making machines do things that would require intelligence if done by man. " Such a simply stated goal has proved to be very costly in terms of time and 18  1 6 1 7  1 8  Hertz, D., Thinking Computers, The Miami Herald, 5 February, 1989. See Section 1-20.  Minsky, M . , Semantic Information Processing, Cambridge, MA., MTT Press , 1968.  8  money . Researchers seek to model human expertise and, in providing the user with 19  computer-generated conclusions, offer useful explanations.  A l evolved as a concept  during the 1950's when it was considered that computers would be able to perform functions to emulate the workings of the human brain in an "intelligent" manner. 
The word intelligence is derived from the Latin "legere" meaning to collect, gather or assemble. The word intelligence has come to mean to understand, to know or perceive , 20  hence a computer program that encompasses these functions could be described as artificially intelligent. However, would such a machine actually be intelligent? Would it have any conscious appreciation of what it is doing?  Instead of developing and analyzing the somewhat sterile debate of whether or not Connectionism represents a Kuhnian-style scientific revolution, we will accept at face value that A l consists of two significant paradigms - the symbolic and sub-symbolic. The symbolic view states that intelligence is derived from the explicit manipulation of symbols, usually through a sequential ordering. The sub-symbolic states that intelligence is "an emergent property of the inter-connection of a very large number of very simple processing elements, all operating in parallel and communicating only sub-symbolic information. " 21  Bart Kosko, a strong advocate of fuzzy thinking states "Something like $100 billion worldwide has gone into Al... and you can't point to a single A l product in the home, office, automobile or anywhere" (quoted in the forthcoming book "Fuzzy Logic" by Daniel McNeill and Paul Freiberger.) This may be overstating the position somewhat but it is undeniable that A l is perceived by the business community as a very expensive flop. Its full definition is "The ability to perceive logical relationships and use one's knowledge to solve problems and respond appropriately to novel situations." The New Lexicon Webster's Dictionary of the English Language, Lexicon Publications, Inc., New York, 1988. Rumelhart, D.E., & McClelland, J.L., Parallel Distributed Processing: Explorations in the Microstructures of Cognition, Cambridge, MA., 1986, Vol. 1, p.4. 1 9  2 0  2 1  9  What is the difference between these paradigms in practice? Smolensky writes; "Nowhere is the contrast between symbolic and subsymbolic approaches to cognition more dramatic than in learning. Learning a new concept in the symbolic approach entails creating something like a new schema. Because schemata are such large and complex knowledge structures, developing automatic procedures for generating them in original and flexible ways is extremely difficult. In the subsymbolic account by contrast, a new schema comes into being gradually, as the strength of the atoms slowly shift in response to environmental observation, and new groups of coherent atoms slowly gain important influence in the processing. During learning, there never need be any decision that "now is the time to create and store a new schema. "" 22  The connectionists are those A l researchers who believe that in order to build truly intelligent machines, researchers must draw, at least in part, upon the complex adaptive model that we know works extremely well, namely the brain. However, strict adherence to any particular model of the brain is to be avoided. Pagels  23  makes a useful analogy  with the legend of Daedalus and Icarus. A compulsive copying of how birds fly brought disaster, although the adaptation of certain elements may bring success. For example, modern planes, like birds, have wings and tails. Thus, say the connectionists, we should be informed by, but not chained to the biological analogy. Indeed, the software used in conjunction with this thesis bears little relation to the functioning of the brain as it is presently understood. 
The computationalists are those A l researchers who believe that the analogy of the brain as a model for building intelligent machines is an unnecessary diversion.  Smolensky, P., in Rumelhart and McClelland, ibid., p.261. Pagels, H., Dreams ofReason, New York, Simon & Schuster, 1988, p.114.  10  1.5 E X P E R T S Y S T E M S There is no accepted definition of an expert system. For convenience, we will adopt the generalized and accepted view that an expert system is a computer program that provides the user with the expertise of an expert through that program . Although expert systems 24  are not the main focus of this thesis, they are "predecessors-in-title" to neural networks within the A l discipline. This thesis attempts to make useful comparisons between the two types of software with reference to the domain of law.  Thus some explanation of expert  systems will assist our discussion.  Expert systems seek to capture both the raw data/information on any given domain and combine this with the rule-based procedures for deductive reasoning that are used by those actors skilled in, and knowledgeable of an area of expertise, i.e. those who might be loosely defined as "experts."  Having combined data with the replicated reasoning  procedures gleaned from the expert, the ideal expert system provides advice to the user as would an expert. At the same time, it should exhibit its internal reasoning process - the user can see the "what" and the "why" concurrently. This system transparency is a vital attribute and in the discipline of law, is of particular importance; few, if any, legal decisions are made without reference to the legal principles that underlie the decision.  There are more long winded definitions such as that adopted by the British Computer Society Special Interest Group on Expert Systems: 2 4  " An expert system is regarded as the embodiment within a computer of a knowledge-based component from an expert skill in such a form that the system can offer intelligent advice or make an intelligent decision about a processing function...The style adopted to attain these characteristics is rule-based programming." 11  An expert system consists of a knowledge base, an inference engine and a user interface. It may also be linked to a database. The knowledge contained in the knowledge base is captured in the form of if-then rules. Knowledge may be in the form of either facts or heuristics , (or a combination thereof.) The user interface acts as the intermediary 25  between the user and the knowledge-base/inference engine in order to elicit answers to a sequence of pre-determined questions. The inference engine reasons logically using the rules in the knowledge-base and the facts discerned. Unlike conventional programs, the path taken by the program (and consequently the advice provided) is entirely dependent upon the facts provided by the user.  In the mid to late 1980's, there was much talk that this new domain of applied A l would give rise to a new profession, that of the "knowledge engineer." This ill-suited term was used to describe a person who obtains the knowledge from the expert and converts it into a form that the system can use, including inputting it into the system.  In reality, the  knowledge engineering profession has been notable only by its absence (as have the number of working expert systems, both in legal and non-legal fields.) Why is this? 
The creation of any expert system will involve finding solutions to three inter-related problems; knowledge acquisition, representation and utilization. The problems with each of these steps with reference to the domain of law will be analyzed in Chapter 2. At this point it  Heuristics are rules of thumb that are perceived by experts in the domain to contribute to the overall solution of a given problem. Such solutions are based purely on experience. One often quoted example of a heuristic is in the game of chess; as a rule, one should try to control the centre of the board in the game. However, this does not in itself guarantee victory.  2 5  12  will suffice to acknowledge that each of the three steps were and remain more complex than they might appear at first.  1.6 F U Z Z Y L O G I C At what point does a person cease to be young and start to be old?  The graphical  representation of "young" will be a curve gradually decreasing over time and "old" will be represented graphically as a curve gradually increasing over time.  Fuzzy logic is a  recognition that everyday language is filled with terms that refuse to be bounded by classical mathematical sets and that traditional binary logic cannot fully incorporate "realworld" factors. This enlightened approach originally developed by Lotfi Zadeh, applying mathematically what had been recognized some years earlier by the eminent philosopher and linguist C.S.Peirce , attempts to overcome the black and 26  white qualities of  traditional logic approaches in representing knowledge. It also attempts to address the issue of commonsense knowledge. For instance, classical logic is defeated by the Cretan riddle, in which a Cretan asserts that all Cretans are liars. Both logical conclusions derived from this statement lead to contradiction. Classical logic cannot deal with such a paradox because the statement is both true and false. Fuzzy logic will say that the answer is 50% true and 50% false - i.e. on a scale of 0 to 1, truth value equals 0.5 and the falsity value  Pierce, C.S.(1839-1914.) There is a developing body of literature relating to Pierce and neural networks; for instance see the chapter on Semiotics and Connectionism in Aparicio, M . , & Levine, D., Neural Networks for Knowledge Representation, Hillsdale, NJ., Lawrence Erlbaum Associates, 1993. There is also an interesting body of literature that relates Pierce's work on Semiotics with the Critical Legal School; see for instance Kevelson, R., (edV Pierce and the Law: Issues in Pragmatism, Legal Realism and Semiotics, New York, Peter Lang, 1991. The relationships within this triad of theory makes an interesting theoretical under-pinning for a Connectionist theory of law, which is briefly addressed in Chapter 2. A fuller exposition is beyond the scope of this thesis. 2 6  13  also equals 0.5.  Similarly, one can fuzzify the age of a person from 26 to "old with a  fuzzy membership of 0.4" and "young with a membership of 0.4" on scales of 0 to 1.  The comp.ai.fuzzy Newsgroup on the Internet defines fuzzy logic as follows: "Fuzzy logic is an superset of conventional (Boolean) logic that has been extended to handle the concept of partial truth - truth values between "completely true" and "completely false. " 27  Zadeh argues that rather than regarding fuzzy theory as a single theory we should regard the process of "fuzzification" as a methodology to generalize any specific theory from a crisp (discrete) to a continuous (fuzzy) form. principle. 
" 28  This is known as the "extension  Thus the fuzzy representation of the number 7 might be as follows,  (although this is only one of numerous possible representations;)  FUZZY V A L U E 0 0.5 1 0.5 0  NUMBER 5 6 7 8 9  1.7 F U Z Z I N E S S A N D L E G A L L A N G U A G E The language of the law is filled not only with binary (true-false) and converse predicates (grandparent-grandchild), but also gradable and contradictory predicates (hot-cold and  2 7  Kantrowitz, M., Horstkolte, E., and Joslyn, C , Answers to Frequently Asked Questions about Fuzzy  Logic and Fuzzy Expert Systems, comp.ai.fuzzy, July ,1994, ftp.cs.cmu.edu, usr/ai/pubs/faqs/fuzzy/ fuzzy .faq. See Zadeh, L., Fuzzy Sets in Information and Control. 8, 1965, pp.338-353. 2 8  14  alive-dead (i.e. impossible for both to be true.)) Reed writes of this problem, "the main type of concept that a purely rule-based system cannot adequately represent is the "judgmental question." Judgmental questions are those where the so-called rule cannot work on binary data ("yes" or "no", "present" or "absent") but requires an assessment of one party's behaviour in relation to the norm. Questions such as "Did the Defendant take reasonable care?" are not really yes/no questions. " Zadeh writes of fuzzy logic "it is 29  widely agreed that an important problem in A l is the representation of commonsense knowledge. In general this knowledge may be regarded as a collection of such propositions as "snow is white," "icy roads are slippery," "most Frenchmen are not very tall," "Virginia is very intelligent"... " The reality of the real world is that people use 30  fuzzy language - we say " it is cold out today."  This may be used as a relative or an  absolute term i.e. it may mean that the temperature ranges from perhaps, minus to plus ten degrees Celsius. It may also depend on the context - a temperature of fifteen degrees Celsius on a July day where average temperatures are 23-28 degrees Celsius might also merit the above comment. Disorderliness of everyday language does not necessarily lead to incomprehension because language inevitably carries with it nuances and connotations created by past use and present context. Yet as Goodrich writes "The professional language and legal language  in particular, however,  strives to construct a unitary  language which stands above the conflicting usage and differently oriented accents of social dialogue. " If one describes the time of the day as dusk instead of night in casual 31  2 9  Reed, C , Computer Law, London, Blackstone, 1990.  3 0  Zadeh, L., Commonsense Knowledge Representation Based on Fuzzy Logic, Computer. Vol.16, No. 10,  1983. Goodrich, P., Reading the Law, Oxford, Basil Blackwell, 1986, p.188.  3 1  15  conversation then no legal consequences arise. This is in contrast with the legal definition of night which is the time for users of the road to switch on their headlights. Failure to do so leads to possible legal consequences. A dispute may arise between the law enforcement agency and the "offender" as to whether the time of day might properly be considered to be "night-time" when the offender was stopped. Judicial interpretation may be swayed by a wide variety of factors. Thus for jurists, the borders of meaning are more important than the centre - "The word starts out to free thought and ends by enslaving it," writes Levi, paraphrasing Cardozo . This "problem", which Hart describes as the "core" of certainty 32  and the "penumbra" of doubt is considered further in Chapter 2. 
Popper, on the other 33  hand warns against the seeking of precision. He writes with reference to the domain of the philosophy Of science: "..both precision and certainty are false ideals. They are impossible to attain, and therefore dangerously misleading if they are uncritically accepted as guides. The quest for precision is analogous to the quest for certainty, and both should be abandoned...it is always undesirable to make an effort to increase precision for its own sake - especially linguistic precision - since this usually leads to lack of clarity, and to a waste of time and effort on preliminaries which often turn out to be useless...one should never try to be more precise than the problem situation demands. " 34  Unfortunately legal problem situations can be very demanding. Zadeh has been criticized by some traditionalists as using mere probability under a smoke-screen of logic  35  and by  Levi, E., An Introduction to Legal Reasoning, Chicago, University of Chicago, 1949, p.8, paraphrasing Judge Cardozo in Berkey v. Third Ave Ry. Co.,, 244 N.Y. 84,94,155 N.E. 58,61 (1926). Hart, H.L.A., Positivism and The Separation of Law and Morals, 71 Harvard Law Review. 1958, pp.593-629, p.607. 3 2  3 3  3 4  Popper, K., Autobiography of Karl Popper, in Schilpp, P. A., The Philosophy of Karl Popper, La  Salle, IL., Open Court, 1974, p. 1-181. "Fuzziness is probability in disguise. I can design a controller with probability that could do the same thing that you could do with fuzzy logic." Professor Myron Tribus, quoted in Kosko, B., Fuzzy Thinking, New York, Hyperion, 1993, p.3. 3 5  16  others for developing a system that is not logic at all . Traditional logicians insist that 36  there are discreet values or variables which can be quantified. However, fuzzy logic should be likened to vagueness rather than probability. It claims that the language of probability lacks the necessary expressions to deal with the real world. Therefore values can be non-quantifiables i.e. variables such as words, as well as numbers. Thus, say fuzzy logic theorists, we can add together fuzzy variables such as "young" and "a few years older" to calibrate "middle-aged." Not surprisingly, fuzzy logic offers much appeal to the "real-world" modeller in overcoming the rigidities of classical mathematics. Why are fuzzy systems important to neural network modellers? Lawrence sums up its importance succinctly - "Neural networks and fuzzy systems are similar in that they both process inexact information inexactly. " 37  1.8 T H E S C I E N C E S O F  COMPLEXITY  Complexity theory is an emerging scientific movement, based on what scientists now see revealed in real-world phenomena. Examples of complex systems include the body, its internal organs, the weather, the market economy and population development. Indeed any phenomenon that has a non-linear aspect to it, might be described as complex. Given the emergent qualities of Complexity theory, its boundaries of application are not yet fixed. Pagels describes some of its key themes as;  "Fuzzy theory is wrong, wrong and pernicious. What we need is more logic not less. The danger of fuzzy thinking is that it will encourage the sort of imprecise thinking that has brought us so much trouble. Fuzzy thinking is the cocaine of science." Professor William Kahan, quoted in Kosko, B., Fuzzy Thinking, New York, Hyperion, 1993, p.3. Lawrence, J., An Introduction to Neural Networks, California Scientific Software Press, Nevada City, CA., 1992, p. 11. 
3 6  3 7  17  "the importance of biological organizing principles, the computational view of mathematics and physical processes, the emphasis on parallel networks, the importance of non-linear dynamics and selective systems, the new understanding of chaos, experimental mathematics, the connectionist's ideas, neural networks and parallel distributive processes. " 38  It is arguable that fuzzy logic is a notable omission from the above list. It is also arguable that certain domains of law exhibit characteristics of complexity . Using John Holland's 39  definition of complex adaptive systems, namely that they "form and use internal models to anticipate the future, basing current actions on expected outcomes " we can see that both 40  neural networks and the law qualify under such a heading. For instance, the entire body of Equity appears to act upon the body of the Common law in a non-linear fashion; contradictory cases co-exist side by side whilst new cases swing the dominant view from one perspective to another before eventual abandonment of a given legal understanding and the adoption of an entirely new paradigm to describe our understanding of a given legal concept.  However, this thesis is concerned merely with the connectionist's ideas and neural networks. Although Complexity theory is currently fashionable within the scientific community, we should be careful not to dismiss non-complex phenomena out of hand. It is likely that when we have understood the sciences of complexity a little further that we will look at the non-complex with a new perspective. Hence it is important not to lose sight of one's ancestry both within science and within the discipline of Al.  3 8  3 9  Pagels, H., ibid., p. 13.  See Braithwaite, M., Prolegomena to a Post-Modernist Theory ofLaw, LL.M. thesis, University of  British Columbia, October 1993, who develops a theory of law as a Complex Adaptive System. Holland, J., Complex Adaptive Systems, Daedalus. Vol.121, No.1, Winter 1992, pp.17-30 at 23. 4 0  18  1.9 N E U R A L N E T W O R K S An artificial neural network is a combination of artificial neurons, the connections between them and a learning rule  41  or algorithm.  Like people, neural networks learn from  experience, not from programming. They do not require the intervention of experts to describe the nature of their expertise in the form of an computer program/algorithm. Traditional non-parallel programs, whether of conventional or expert system variety, require the program builder to describe his/her domain. Ideally, neural networks require no description of the domain from the user; their only requirement should be large quantities of numerical data from which the neural network will induce the connections between functions.  Neural networks are good at pattern recognition, generalization and trend prediction. They require no rules or formulas with which to work and are tolerant of imperfect data. They are computational devices which, through the combined computing power of a very large number of artificial "neurons" (which deal with only very small chunks of information and carry out only very small computations by communicating along links with other neurons) can deal with large complex real-world problems. These so-called Connectionist machines make use of a parallel architecture of computing rather than traditional linear programmable code. However, and this point should be stressed, neural computing should be considered complimentary rather than in competition with  Lawrence, J., ibid., p.79. 
 19  conventional  computing  techniques.  Discussion  of  how  connectionist  and  computationalist architectures may be used in tandem is developed later in the thesis.  1.10  THE ARTIFICIAL NEURON  The most basic processing unit of a neural network is called a neurode or neuron which are combined into a "network" or structure.  The structure of an artificial neuron is  described in Figure 1.1.  Figure 1.1 An Artificial Neuron (x = name of connection, Wx = its associated weight) INPUTS  OUTPUT  At each neuron, every input has an associated weight, which modifies the strength of the input(s) connected to the neuron. The single neuron simply adds together the sum of all the inputs and calculates the output, through vector multiplication and addition. This is done by applying what is known as the "activation function" or "transfer function" to the  20  sum of the input values. The most popular "activation function" used by neural network modellers is called the Sigmoid activation function (see Figure 1.2.) 42  This function is  also known as the semi-linear or S-shaped function. Using the Sigmoid function, outputs vary continuously (but not in a linear manner) as inputs change. The most successful type  Figure 1.2 - The Sigmoid Activation Function  o  of function can change according to the purpose of the network. According to authority on this issue if the problem involves learning about average behaviour, Sigmoid transfer 43  functions work best. If the problem involves learning about deviations from the average, for example in areas such as bankruptcy prediction and stock picking, the Hyperbolic There are a variety of other functions such as the Linear Transfer Function, the Linear Threshold Function, the Step Transfer Function, and Guassian Transfer Function. Each of these have different qualities and are represented by a different curve. An examination of each is beyond the scope of this thesis. By default the Brainmaker software used in this thesis uses a Sigmoid Activation/Transfer Function. Klimasauskas, G , Applying Neural Networks. PC Al. May-June 1991, p.20- 24 4 2  4 3  21  Tangent function tends to work the best. (Earliest neural networks tended to use the socalled Step Transfer Function where the output was limited to two values (1 and 0.) It acted in a manner similar to a digital logic chip.) We are concerned with determining average behaviour, therefore the only activation function used throughout the experiments detailed in this thesis is the Sigmoid activation function. Under this function, output values are restricted to a value between zero and one. If the sum of the input values gives a negative value, the output value will equal zero i.e. the neuron will not fire and no learning will occur in this area. A large positive input will yield the maximum output of one. Between these, the output rises in a S-shaped curve to a maximum gain of 1. Upon activation, each neuron releases an electronic impulse as output which is fed forward to the next layer. The "neurons", which are the elementary processing units, are arranged in a particular manner. The most common network architecture is that of three layers, in which the neurons of each layer are connected to all of the neurons of the next layer (as shown in Figure 1.3.)  22  Figure 1.3 - Common Neural Network Architecture (showing 5 input, 5 hidden and 3 output neurons.)  This is only one of many possible topologies although, again according to authority on this area, it appears to be the most effective. 
Additional hidden layers may be added to the network although it appear that this is no guarantee of success if simpler three-layer networks failed. I have not questioned this approach and all experiments conducted in conjunction with this thesis have used the Sigmoid transfer function in a three layer network.  How does the neural network learn? Data is fed to the network at the input layer of neurons. (Each neuron at the input layer represents a real-world factor.) The input layer  23  neuron takes the numerical input through a connecting link and produces an output which in turn becomes the input for the hidden layer (or first hidden layer if there are more than one.) The middle layer is termed "hidden" because it is internal to the computer. The user sees only the input and the output. Describing what exactly happens in the hidden layer, beyond a form of vector matrix multiplication is one of the problems that neural network developers are still grappling with.  Neural networks are primarily pattern recognizers. They learn through repetitive exposure to relevant examples by developing associations between input and output data. These associations are stored in the form of weights on connections between neurons. Thus, when the network "sees" something that appears to be similar to a phenomenon it has come across before, it responds with similar output data. To describe fully how a neural network works, we will use a very simple example of a neural network that can recognize dogs based on colour, size and liveliness. The inputs and outputs for the network are described in Figure 1.4  Figure 1.4 - Inputs and Outputs of a very simple neural network REQUIRED OUTPUTS  INPUTS CHARACTERISTICS LARGE SMALL BLACK BROWN/TAN YELLOW/GOLDEN LIVELY SLOW  TYPES OF DOGS  PEKINESE LABRADOR ALSATIAN NEWFOUNDLAND WELSH TERRIER ST. BERNARD  24  An association that we might hold is that, of all the above, a small, lively, brown/tan dog is likely to be a Welsh terrier. Thus, the Welsh terrier could be represented in a database as follows; LARGE 0  SMALL 1  BLACK 0  INPUTS BROWN GOLD 1 0  LIVELY | SLOW 1 0  OUTPUT WELSH LA 1 0  Each of the other animals will be similarly represented in the database. For example, a Newfoundland might be characterized as large, black and slow. A medium sized dog might be represented as 0.5 in each of the "large" and "small" fields, indicating that the predicates "large" and "small" are only 50% true. One can immediately see the application of fuzzy logic theory here. We might wish to "fuzzify" the above data further so that the Welsh terrier is represented by a value of 0.2 in the "Large" database field and 0.8 in the "Small" database field and do likewise for all other inputs, implying that the dog is not small to an absolute degree.  1.11  TRAINING A NEURAL  NETWORK  Before the process of back-propogation is described, it should be mentioned that neural networks can be classified into two distinct categories; feed-forward and feedback. The difference is this; if signals/impulses go only one way and output is dependent only on signals from incoming neurons (i.e. there is no looping back,) then the network may be said to be feed-forward. In a feedback neural network, the output signals of neurons directly feed back into neurons in the same or preceding layer. That is to say, they become  25  inputs. It is important for us to recognize at this point that back-propogation is not a feedback "looping" network.  
1.11.1 B A C K - P R O P O G A T I O N Having decided what it is that we wish the neural network to do and collected sufficient amounts of data ( in a network-readable form) to allow the network to train, the next step is to train the network. Invented in 1974 , the most popular, successful and therefore 44  most common form of neural network training is called "back-propogation," (an abbreviation of "back-propogation of error") a form of supervised-learning algorithm which is a generalized form of the so-called Delta-Rule.  1.11.2 T H E D E L T A R U L E This is the simplest of all learning rules that states: •  Feed the input pattern into the network;  •  For every output, determine the error by subtracting the actual output from the desired output.  •  Change the weights of the connections proportionately to the product of the incoming signal;  •  Repeat until a sufficient level of accuracy has been achieved.  by Paul. J. Werbos, who at the time was a doctoral candidate at Harvard University. It was re-discovered in 1985 or thereabouts by David Rumelhart and David Parker independently of one another. 4 4  26  Thus each neuron processes the piece of data that it receives by finding the sum of all inputs multiplied by the weights matrix (using vector matrix mathematics of multiplication and addition.) The resultant values percolate through the system to the output layer. The actual output is compared with the expected output for a given set of training facts and that error is "propogated" back through the network. In the early stages of training, the difference/discrepancy between expected and actual outputs (the error value) will be very large as the weights on connections between neuron are initially set to random values. The error signal is fed back through the network, altering connection weights as it goes and (hopefully) preventing the same error from repeating itself. On the basis of the error value , the network retrains itself on the same set of examples by adjusting the 45  connection weights automatically until gradually the number of "good" results (those results which match or are sufficiently close to the expected output) outweigh the "bad." Correctness of output is judged not on an absolute criteria but as the "best set of conclusions" subject to a standard deviation. The process of weighting adjustment can take a matter of minutes, hours or days depending on the amount of training information, the size of the network, the processing speed of the computer and the required accuracy of the trained network. Once this process of weighting adjustments has occurred a sufficient number of times so that the network is providing intelligible and reasonable outputs, the network is then said to have "learnt."  We should note however, that the back-  propogation rule can only change the weights of connections - it cannot change the network topology in any way. Alternative network topologies may alter the learning paradigm significantly.  4 5  See Appendix F, Glossary of Neural Network Terms.  27  If we go back to our earlier example and imagine that the network is now training, for the facts "small", ""brown" and "lively", the network produces an output of 0.5 for "Welsh," and 0.5 for "Pekinese" and an output of 0 for all other types of dog. This is telling us that the network is as yet undecided as to which type of dog it is being asked to identify although it has managed to eliminate all but two. 
The network will automatically make adjustments to its internal weightings of the connections between neurons. It will then move on to the next fact (in this case, the St. Bernard) and present the facts for that type of dog. If the guess made by the network is incorrect, it will make adjustments and continue to the next fact. Once the network has been through all the facts, it will return to the top of the list and re-present the facts once again. It will continue to re-present all the facts until it gets the "right" answer for all inputs. In this application, an output of 0.9 for the correct type of dog might be acceptable if all other outputs are approximately 0.1. (The tolerance of error is changeable according to the type of application.) Thus a neural network might not actually come to a final conclusion; rather, it settles into a pattern that appears to satisfy all given inputs.

1.11.3 SUPERVISED -v- UNSUPERVISED LEARNING

It was mentioned above that back-propagation is a type of supervised learning. In supervised learning there is a human "trainer" (i.e. in this case, the author) who presents data in the form of pairs of inputs and outputs, either inputting the data manually or presenting the data in the form of a database. Thus we know what the results (outputs) should be for the inputs.

With a single-layer network, the trainer can actually monitor specific corrections to the weights. However, this is not possible with a multi-layer network. For instance, if we think the network is mistaken in developing significant relationships between a given input and output, it is difficult to correct the weighting of the connections in the hidden layers as we cannot see what is happening. Back-propagation with a single hidden layer of neurons is the primary form of learning used by the software in conjunction with this thesis.[46]

[46] However, there are a wide variety of other such supervised-learning models. Among other feed-forward models there are the Perceptron, Adaline, Back-Percolation and Adaptive Logic Networks, to name but a few. Feedback models include the Fuzzy Cognitive Map, Boltzmann Machine, Mean Field Annealing and Learning Vector Quantization models.

In the unsupervised learning model, there is no trainer. The network simply receives inputs. The network is required to come up with its own classification of those inputs and reflect it in the outputs. Such approaches are rarely used in real-world applications because, if we cannot provide outputs for the inputs, this means that we do not understand the domain. At present it seems that unsupervised learning neural network models are confined to the research laboratory.

1.12 WHY NOT USE EXPERT SYSTEMS?

Those scholars familiar with expert system applications might argue that a dog recognition system could as easily be implemented by inputting if-then rules into an expert system shell. For instance, using the Welsh terrier example, we might input these factors in the following form: IF the dog is brown/tan AND it is small AND it is lively THEN it is a Welsh terrier. The user interface would ask the user "Is the dog small? Yes or No", "Is it brown/tan? Yes or No" and so on, until the expert system deduced that the dog in question must be, of all the above possibilities, a Welsh terrier.

Admittedly one could use either technique in the above example.
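For contrast, the rule-based alternative just outlined might look like the following hand-written sketch. The rules are hypothetical and illustrative only; the point is that every relationship must be articulated in advance, and nothing is learned from examples.

```python
def identify_dog(large, small, black, brown_tan, yellow_golden, lively, slow):
    """Hand-written if-then rules for the same toy problem."""
    if small and brown_tan and lively:
        return "Welsh terrier"
    if large and black and slow:
        return "Newfoundland"
    # ...one or more rules per breed; any unforeseen combination gives no useful answer.
    return "unknown"

print(identify_dog(large=False, small=True, black=False, brown_tan=True,
                   yellow_golden=False, lively=True, slow=False))   # -> Welsh terrier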
However, it is one of the primary arguments of the so-called Connectionists that neural networks can excel in real-world domains where the relationship between the inputs and outputs is either not apparent or not describable to another human being. This argument will be tested. Neural networks can learn directly from facts or cases, rather than from an expert's articulation of a problem; this is one of their primary advantages over expert systems. Very often, an expert in a given domain may understand his/her area comprehensively. However, constraints of language prevent that intuitive power from being articulated to another unskilled in the domain. We consider this phenomenon further in the Conclusion.

1.13 FURTHER ADVANTAGES OF NEURAL NETWORKS

Neural networks require no programming - it is not necessary for a programmer to analyze the problem in sufficient depth in order to write down a set of instructions that the computer can follow. A further attribute of neural networks is that, because they can learn directly from data, they remove the interpretive element that accompanies articulation in the form of rules. Clearly, in interpretative disciplines such as law, this is an important factor. The ability to learn directly from examples means that in legal applications much less consideration need be given to the jurisprudential framework and implications of the program. This is an important feature for legal applications, for as we shall see in Chapter 2, many expert systems in law projects have faltered because of either an unquestioning adherence to Legal Positivism, leading to forms of mechanical jurisprudence, or excessive jurisprudential "navel gazing", where debates regarding the nature of law tend to dominate the empirical aim of producing workable expert systems.

A neural network can also adapt its internal configuration during training. Neural networks can (at least in theory) deal with incomplete and/or "fuzzy" data - in theory, they can continue to operate with only a partial picture. A traditional program would inform its programmer of an error in the program and cease to function until rectified, the so-called "catastrophic degradation." Failure of any part of a traditional program dooms it to failure. Neural networks adapt their output to accommodate inconsistency and suffer so-called "gradual degradation." Good examples of gradual degradation can be found in our own bodies. Kurzweil writes, "In animal vision the failure of any neuron to perform its task is irrelevant. Even substantial portions of the visual cortex could be defective with relatively little impact on the quality of the end result."[47] The network may still function, although output becomes less specific. Importantly for legal applications, neural networks also have the potential to generalize from data. Thus, until proved otherwise, it is fair to believe that the prospect of computerized legal prediction, a goal which is as yet unattained by other forms of AI, may be achievable using this connectionist architecture.

[47] Kurzweil, R., The Age of Intelligent Machines, Cambridge, MA, MIT Press, 1990, p.231.

1.14 WHERE NEURAL NETWORKS FALTER

A neural network is only as good as the set of examples that are fed to it. It can, in theory, learn the relationships between the number of times a person turns over in their sleep and its seeming effect on interest rate movements, if one is minded to present such data as inputs and outputs.
It is not intelligent in the sense that it will tell the user that clearly there is no possible relationship. Neural networks also cannot deal easily with classical logic problems, although one can see a superficial similarity between the functioning of a rule-based system and a neural network. For instance, rules are linked together starting with premises which lead to conclusions, e.g. "If V and W then X; if X and Y then Z." Such premises could be represented in a neural network by having V, W and Y represented by first-layer neurons. X would be a hidden-layer neuron and Z would be a final-layer neuron. V and W combined exert influence on X, and X and Y together exert influence on Z. However, ANN's are imprecise - if one trained a neural network in non-fuzzy mathematics and asked it for the sum of 3+3, the answer given would be roughly 6 (although, if enough examples were presented to the network, it might eventually provide an absolutely accurate answer). Since many humans, with their complex biological neural networks, have problems with certain tasks where traditional programming excels (such as balancing a check-book, performing logic functions and keeping track of fine details), expecting an ANN to be able to do likewise would be unwise. Furthermore, ANN's are "black boxes": the knowledge that is contained within the program is contained in the strength of the weighted links between neurons and in their topology. None of this is visible to the user. It may be possible to view the numeric weighting data that the network assigns to particular neurons, but even in the smallest networks this can be unintelligible. This is a serious shortcoming. The "programming", i.e. the continued internal weighting adjustments performed by the network, takes place unseen. Thus, although the user may discover an answer to his or her problem, the reasons why that particular answer is given may not be discernible. It is immediately apparent, therefore, that this may limit the value of using connectionist architecture in law. No judge hands down a judgment without copious references explaining his or her reasoning. If the output is one's only concern, then neural networks are likely to have utility; but even in an area such as quantum assessment for whiplash injuries, where the client is often heard to cry in exasperated tones "but what is it worth?", the next question is usually "why?" Therefore any belief that neural networks can be used to replace symbolic programming in its entirety should be dispelled at the earliest opportunity. This is particularly so in an area such as law, so rich in symbolism.

1.15 MINDS AND MACHINES

As the name implies, neural networks are modeled on the organizational processes believed to occur in the human brain. Although the human carbon-based version is many times slower than its silicon logic-gate counterpart, even the largest artificial neural networks in modern computer laboratories have about the same number of neurons as those contained in the "brains" of the lower-order species. A simple organism may have a few dozen neurons; an insect as simple as a fly has over one million, and a human brain has in the region of 100,000,000,000.[48] If we take this number as an estimate of the number of neurons needed to produce a self-conscious entity, then it is clear that we are a long way from building artificial brains.
Indeed, at the present rate of progress, it is estimated that humans will not be able to build an artificial brain analogous to that of a human being for 120 years.[49]

[48] Lawrence, ibid., p.164.
[49] Lawrence, ibid., p.154.

1.16 THE BIOLOGICAL ANALOGY - THE NEURON

Biological and artificial neural networks contain neurons, either real or simulated respectively. Biological neurons consist of a cell body plus one axon and many dendrites. These can be seen in Figure 1.5. The axon is the protuberance that delivers the neuron's output to connections with other neurons. Dendrites are the protuberances that provide surface area to allow contact with other axons, i.e. they maximize the inputs and minimize the outputs. The average biological neuron receives impulses from up to 10,000 other neurons. (In certain parts of the brain, such as the cerebellum, which is crucial for motor functions, a single neuron may receive impulses from as many as 100,000 axons.)

Figure 1.5 - A simplified biological neuron (diagram showing the dendrites, cell body and axon)

A given neuron appears to be dormant unless the collective influence of all the inputs reaches a threshold level. Whenever the threshold is reached, the neuron delivers a full-strength charge in the form of a pulse down the axon and into the axon branches. This is known as "firing." Because a neuron either fires or remains dormant, it is said to be an "all-or-nothing" device. The artificial neurons in computer software can function as all-or-nothing devices as well as continuous output devices. Each neuron (whether carbon or silicon) receives signals from a great many other neurons, which will either excite or inhibit the level of activation in the given neuron. The level of activation is the sum of the number of connections, their weighting/size, and the strength of the signals. Thus, whilst slower than a parallel computer, the human brain makes up for this deficiency through the sheer scale of its parallel network.

1.17 NEURAL NETWORKS - A BRIEF HISTORICAL ANALYSIS

Given the nature of this thesis, it is impossible to provide a comprehensive analysis of the history of neural networks from the 1940's onwards. Whilst attempting to avoid a lengthy descriptive narrative on the development of neural computing, we should, however, take note of a number of important dates in its development. It is accepted that the birth of AI as a discipline took place at the Dartmouth Conference of 1956. (Notable by his absence at that conference was the eminent computer scientist John von Neumann, an avid proponent of neuro-biologically inspired computer architecture.[50]) The conclusion of the Dartmouth Conference was that the neurobiological model was an unnecessary diversion, given the apparent rapid development using linear methodology. A good example of the attitude that prevailed at the time is the following quotation: "Whether logical reasoning is really the way the brain works is beside the point," McCarthy says. "This is AI and so we don't care if it's psychologically real."[51]

1.17.1 McCULLOCH AND PITTS

The first proponents of the neurobiologically-inspired computer were two mathematicians, W.S. McCulloch and W. Pitts, who made the first attempt to define an artificial neuron that could be used to provide a mathematical analysis of how networks behave. They were largely overshadowed by conventional machine computing from the 1960's onwards.
"A Logical Calculus of the Ideas Immanent in Nervous Activity ," is a technically 52  complex paper, written as much for the neurophysiologist as for the computer scientist and in this sense, is fairly representative of much of their work. McCulloch and Pitts  See for instance, Neumann, J. von, The Computer and the Brain, New Haven, Yale University Press, 1958, pp.66-82.  5 0  5 1 5 2  Quoted in Israel, D.S., The role of logic in knowledge representation, Computer. Vol. 16, No. 10, 1983. McCulloch, W.S., and Pitts, W., A Logical Calculus of the Ideas Immanent in Nervous Activity, 5  Bulletin of Mathematical Biophysics. 1943, pp.127 -147.  36  describe a computational device which they believe is an adequate representation of how the smallest particle in the biological nervous system might actually work, now known as the "McCulloch-Pitts neuron." This was a binary device which could have only one of two possible states (on or off), possessing a fixed threshold, receiving inputs from excitatory synapses that have identical weights or inhibitory synapses which act as a bar to any action on the part of the neuron (i.e. if the inhibitory synapse is active the neuron cannot fire.) They concluded that the structure of the network does not change with time. Their primary argument with reference to what we now call the discipline of computer science was that, if this type of neuron could be artificially implemented in an information processing machine, it could perform simple logic calculations. It followed from this, that first, any logical expression could be realized using this neuron and thus a network of neurons could perform computations of very great complexity and secondly, that the mind could be understood in computational terms.  (Although the Pitts and McCullough duo are usually described as the forefathers of neural computation, they were not the first to suggest that the activation of one brain cell by another increases the likelihood of such activation in the future. This honour is bestowed on William James and his chapter of a text entitled "Psychology" written in 1890 .) 53  James, W., Psychology (Briefer Course), New York, Holt, Chapter XVI "Association", pp.253-279.  37  1.17.2 T H E H E B B I A N R U L E Donald Hebb's article "The Organization of Behavior" was the first description of a learning rule for neural network modellers. (It is also the first known use of the term "Connectionism.")  Again, written more for the neurophysiologist than the computer  scientist, Hebb described what he believed happens to the nerve synapses when used; "when an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency as one of the cellsfiringB, is increased. " This pattern 54  of re-enforcement has come to be known as the "Hebbian rule" of "synaptic modification." Computer scientists were quick to extend this theory. They argued that it follows from this rule that the strength of connections between neurons that excite or inhibit one another should be increased or decreased respectively, proportionate to activation values.  1.17.3 T H E P E R C E P T R O N Envisaged by Rosenblatt, this one layer feed-forward model of a neural network which utilized the Delta rule mentioned above, created much interest in the community of A l scholars. In this model inputsfiredirectly into outputs as seen infigure1.7. 
In his highly scientific paper describing the Perceptron, Rosenblatt's enthusiasm for the project is evident: "...the system described is sufficient for pattern recognition, associative learning, and such cognitive sets as are necessary for selective attention and selective recall....with proper reinforcement it will be capable of trial and error learning, and can learn to emit ordered sequences of responses..."[55]

[55] Rosenblatt, F., The Perceptron: a probabilistic model for information storage and organization in the brain, Psychological Review 65: p.404.

Figure 1.6 - The Perceptron (diagram: a single layer of input neurons connected directly to the output)

However, Rosenblatt was not cognizant of the problems that the Perceptron presented. For the layperson, there is no better description of the problems Perceptrons develop in relation to the representation of the real world than the summary of their properties by Opdorp and Walker.[56] They use the example of an apartment without an elevator that is considered suitable for tenants dependent upon their age and degree of disability. If we use a basic premise of the rule, perhaps developed by Welfare Agencies, that after the age of 60, or with a disability in excess of 40%, the apartment is unsuitable for a tenant, we could represent these factors graphically as a straight line (see Figure 1.7, marked "A").

[56] Van Opdorp, G.J., and Walker, R.F., A Neural Network Approach to Open Texture, in Kasperson, H.W.K., & Oskamp, A., Amongst Friends in Computers and Law, Deventer, Kluwer Law and Taxation Publishers, 1990, pp.299-309.

Figure 1.7 - Graphic Representation of Hypothetical Welfare Agency Rule (graph plotting age against degree of disability, with the straight boundary marked "A" and the curved boundary marked "B")

If in addition we understand that the Welfare Agencies consider an apartment without an elevator unsuitable for a person of (for example) 35+ with a 10% disability, then this rule could be presented graphically as a curve (marked "B" above) rather than a line. An increase in disability adversely affects the age threshold, and vice versa. This curvature of the dividing line between suitable and unsuitable cannot be expressed by a perceptron, even though both age and disability are obviously inhibitory with respect to suitability; thus "whenever the contribution of one input neuron should depend on the input values of other receptors, the perceptron will not suffice."[57] In examples such as this, there must be either a loop between input neurons or at least one hidden layer of neurons to calculate the influence of the two factors upon each other. Such architectures were not envisaged by Rosenblatt.

[57] Van Opdorp, G.J., and Walker, R.F., ibid., p.

1.17.4 "THE DARK AGES" - MINSKY AND PAPERT

"In the popular history of neural networks, first came the classical period of the perceptron, when it seemed as if neural networks could do anything...Then came the onset of the dark ages, where, suddenly, research on neural networks was unloved, unwanted and most important, unfunded. A precipitating factor in this sharp decline was the publication of the book "Perceptrons" by Minsky and Papert."[58]

[58] Anderson, J.A., and Rosenfeld, E., Neurocomputing: Foundations of Research, Cambridge, MA, MIT Press, 1988, Chapter 13, p.161.

Such an analysis is quite dramatic and somewhat overstated. The role played by Minsky and Papert in casting neural networks into the wilderness of underfunded computer science research is not that of key villains. However, in their book they argued that the next logical step for neural networks, that of multi-layer networks of Perceptrons, would exhibit the same problems as the single-layer network described above. Such arguments were enough to kill off already flagging interest and funding.
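Returning for a moment to Opdorp and Walker's apartment example, the limitation that Minsky and Papert generalized can be sketched as follows. The weights and threshold are invented for illustration; the point is only that a single-layer unit always draws a straight dividing line.

```python
def unsuitable(age, disability, w_age=1.0, w_dis=1.5, threshold=60.0):
    """A Perceptron-style output unit: the apartment is judged unsuitable whenever the
    weighted sum of the two inputs reaches the threshold. Whatever weights are chosen,
    the dividing line in the (age, disability) plane is always straight."""
    return w_age * age + w_dis * disability >= threshold

print(unsuitable(age=61, disability=0))    # True  -- a straight rule like "A" can be captured
print(unsuitable(age=35, disability=10))   # False -- but the curved rule "B" treats this tenant
                                           # as unsuitable; no single straight line agrees with
                                           # "B" everywhere, hence the need for a hidden layer
                                           # or a loop between input neurons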
In retrospect, it seems that their conclusions are wrong. Rather than dwelling on their excellent but misconceived analysis of why neural networks in any shape or form would fail, we should merely mark this event as an important turning point in the development of neural networks. The result of this book was that rule-based or so-called symbolic AI was founded and flourished for nearly 30 years. Despite all of this, a "cottage industry" of researchers such as Kohonen, Anderson and Hinton continued to work on connectionist models, making notable improvements to the notion of the Perceptron. For instance, Anderson and Kohonen (independently of one another in 1972) were successful in developing an artificial neuron that could vary the frequency of its firing, rather than acting as a simple binary device.[59] Also, the notion of feedback (mentioned above) was developed by Hopfield in 1982.

[59] See Kohonen, T., Correlation Matrix Memories, IEEE Transactions on Computers, C-21: pp.353-359, and Anderson, J.A., A simple neural network generating an interactive memory, Mathematical Biosciences, 14: pp.197-220.

1.18 THE REBIRTH OF CONNECTIONISM

The First Annual International Conference on Neural Networks (ICNN), held in San Diego in June 1987, which attracted nearly 2000 participants, may be seen as the neural network equivalent of the Dartmouth Conference of 1956. Along with the work of Hopfield and the emergence of several commercial neural networks, most of the credit for the new-found impetus in the neural network research project must go to the PDP group.

1.19 PARALLEL DISTRIBUTED PROCESSING (PDP)

One of the main reasons neural computing has re-appeared as a force within (or, depending on one's perspective, in competition with) AI is the contribution made to the field by a group of researchers, notably McClelland, Rumelhart and Hinton, known as the Parallel Distributed Processing group.[60] They represent the most plausible of all current connectionists' ideas, and these ideas have been utilized by many manufacturers of neural network software. Brainmaker, the software used throughout this thesis (and most other commercially available neural network software), draws on the work performed by this group. Their fundamental view is that we humans are smarter than computers because we have developed a computational architecture more suited to the tasks that surround us (i.e. those that require common sense) and those that appear to require the simultaneous consideration and manipulation of many pieces of information and the drawing of conclusions by induction. A key element of their theory is that understanding is derived from the interplay of various sources of information. One of the examples that they use in this context is of a child's birthday party at a restaurant. They write, "We know things about birthday parties, and we know things about restaurants, but we would not want to assume that we have explicit knowledge about the conjunction of the two." However, from our experiences we are able to conceive and develop a likely scenario in such instances.
Thus, although traditional AI approaches can capture knowledge in the form of structures, whether scripts, frames or otherwise, "any theory that tries to account for human knowledge using script-like knowledge structures will have to allow them to interact with each other to capture the generative capacity of human understanding in novel situations."[61] They are of the belief that PDP can satisfy this requirement. Using the back-propagation model with a hidden layer, they are able to demonstrate that neural networks can develop many solutions to any given problem simultaneously, instead of the traditional form of computation where one answer leads to the next.

[60] See Rumelhart, D.E., and McClelland, J.L., ibid. This is now the standard reference work on Connectionism.
[61] Rumelhart, D.E., and McClelland, J.L., ibid., p.9.

1.20 REAL WORLD APPLICATIONS OF NEURAL NETWORKS

So what can neural networks now do? Neural network technology appears to be capturing the imagination of academics and business people world-wide. Not surprisingly, the first users of neural networks were electrical engineers in the large defense-related research universities of the United States. For instance, the ALVINN system ("Autonomous Land Vehicle In a Neural Net") developed by Carnegie-Mellon University learns to drive a vehicle along roads viewed through a television camera. It must first be trained on a particular road. Once this has been achieved, it can guide itself at speeds of up to 40 mph.[62] Another research project that has caught the imagination of the outside world is NetTalk, a back-propagation neural network that takes text and learns to turn it into speech. It reads a string of characters and turns them into phonemes. It then converts these into audible signals which are output via the computer's speaker. The two developers, Sejnowski and Rosenberg, trained networks first using 500 words from a Grade 1 reading-level book, and later with 20,000 words from a dictionary. Using a three-layer network but enormous computing power and many training hours, they were able to train a network that could produce phoneme output at a level of accuracy of 95%. Users of neural computing now include manufacturers,[63] investment companies,[64] direct marketers[65] and even makers of burglar alarms.[66]

[62] Winston, P.H., Artificial Intelligence, Reading, MA, Addison-Wesley Publishing Co., 1992, p.458.
[63] See, for instance, Kinoshita, J., Neural Nets at Work, Scientific American, Vol.259, No.5, pp.134-135.
[64] Swales, G.S., and Yoon, Applying Artificial Neural Networks to Investment Analysis, Financial Analysts Journal, Sept.-Oct. 1992, Vol.48, No.5, p.78.
[65] See, for instance, Classe, A., Little Grey Cells, Accountancy, May 1993, v.111 (n.1197), p.67.
[66] Somers, J., et al., Catching Intruders in a Neural Net, Security Management, Feb. 1994, v.38 (n.2), p.68.

Governments have also been quick to seize the initiative. For instance, the UK Government's Department of Trade and Industry announced a 6 million GBP (CAN$12 million) scheme to promote the use of neural networks in response to a similar initiative by Japan's Ministry of International Trade and Industry. The Canadian Imperial Bank of Commerce (CIBC) recently announced that they will be using neural networks (and in particular the Professional version of Brainmaker) in order to provide "better forecasts of Canada's Gross Domestic Product and inflation than the usual econometric models."[67]
[67] See unnamed article in Wired, September 1994, p.39.

The increase in the number of publications and products might suggest a fundamental paradigm shift within AI towards Connectionism. However, we have yet to conclude whether any of these products will make a more significant impact than traditional AI approaches upon business. At present we are riding the crest of a media-inspired wave of neural network "mania." It is difficult to draw rational conclusions regarding their potential in such an environment. The fundamental questions to be tested in this thesis are therefore: given the increasing use of neural computing applications in business and finance, does Connectionism have any role to play in the artificial modelling of law and, more particularly, in legal reasoning and case prediction, and if so, what role(s)? Before we can answer this question, we must explore further the domain of law and examine the problems that this discipline presents.

CHAPTER 2
ISSUES OF JURISPRUDENCE

2.1 INTRODUCTION

This chapter seeks to define the parameters of a sub-discipline of Jurimetrics known as Artificial Intelligence (AI) and Law - the marriage of a much-maligned science and a much-considered pseudo-science. In doing so, it reflects upon the two major trains of legal theory within jurisprudence - Positivism and Anti-Positivism - and seeks to set the application of artificial neural networks within this context. Scholars in this sub-discipline have been quick to note the bilateral nature of the relationship between the parent disciplines. First, they point out that AI applications, such as expert systems and other such software, present an excellent opportunity for the jurisprudential scholar to test theory in terms of practical application. Secondly, they describe features of the legal domain that make law an excellent representative model of the real world and thus ideal for computer modeling. This chapter focuses on the former at the expense of the latter.

Before we become overly concerned with what may be described as the predicament of AI applications in law, we should bear in mind that the vast majority of legal disputes, the so-called "easy cases", are concluded without recourse to the court. They are concluded satisfactorily based upon what both sides believe the law would say were they to ask. Despite the monumental production of legal material by the courts, the number of disputes heard in Courts of Law is minimal compared with the number of disputes that occur within Western societies. Nevertheless, the number of disputes heard by the courts is sufficient to merit an investigation of the viability of building programs that can potentially provide answers to the so-called "hard cases." This last statement also merits investigation. We should be asking ourselves: are there correct answers, or a range of correct answers, to these so-called "hard cases", or should the word "correct" have no place in the vocabulary of this discussion?

2.2 AI AND LAW - FROM WHERE TO WHERE...

The history of AI and law can be traced to an article by Buchanan and Headrick entitled "Some Speculation About Artificial Intelligence and Legal Reasoning."[68] In this article, the authors note that interdisciplinary work between the fields of computer science and law has foundered due to the misconceptions that each carries about the other.
It was  argued in that article that lawyers, in assessing the clients likelihood of success in litigation, tried to analyze the fact pattern in issue with those of previous cases. The lawyer required an ability to search through large amounts of data and needed to predict what could affect the potential of a favourable settlement. In all of the above, it was argued, computers could have significant implications. However, the authors pointed to other problems such as 'the search for programmable rules used by lawyers" and 'the disparity between the natural statement of the rule and the formal statement within the programming language. " For almost a decade these and other problems appeared to be 69  insurmountable.  Stanford Law Review. 23, November 1970, pp.40-61. ibid.,p.46.  68 69  47  Until the mid-1980's computer-based information technologies in law were little noted, outside of their immediate circle of users in legal practice.  Although heavily used for  data/information retrieval, they did not seek to assist, second guess or usurp the reasoning faculties of the legal practitioner, administrator or court official and were thus considered to be relatively unremarkable. The arrival of the expert system appeared to alter this perspective radically. They allowed the computer novice to manipulate natural language to formulate models of legal systems previously only comprehensible through the medium of text.  For legal academics the arrival of the expert system in law aroused much interest. It appeared that the modeling of law using expert systems could inform jurisprudential debate. For the Positivists, legal expert systems represented the physical embodiment of the view that law can be represented as a system of rules; in its purest form law could be objective, determinate and neutral.  However, it is argued that such a view is merely a further manifestation of the adage 'When the only tool you have is a hammer, every problem tends to resemble a nail. " It 70  was thought that A l might possibly assist in the so-called Teleological/Positivist debate, possibly explaining reasons for inconsistencies and/or developing methods to deal with this such issues.  Abraham Maslow, 1908- 1970, Psychologist.  48  2.3  L E G A L REASONING AND T H E E X P E R T S Y S T E M  Law is concerned with what is acceptable and unacceptable behaviour in a given society. In Western legal culture, there are two ways of communicating standards - either by rules or examples. Not surprisingly, researchers interested in emulating legal reasoning model use both of these forms. However, there are numerous questions to be answered such as what is the nature of a rule, what form does it take - (i.e. doctrine/principle/policy), how do lawyers use those rules or examples? Thus, the fundamental problem for creating working expert systems in law is founded in an antecedent problem of the creation of adequate legal theory. If we compare the problems of creating expert systems in law with those of creating expert systems in law's nearest professional relative, namely medicine, we see that the latter suffers few of the difficulties that law possesses. For instance, one does not have to consider the nature of medicine before one builds an expert system to perform medical diagnosis. Susskind sums up this problem for us; 'It is beyond argument, however, that all expert systems must conform to some jurisprudential theory because all expert systems in law necessarily make assumptions about the nature of law and legal reasoning. 
To be more specific all expert systems must embody a theory of structure and individuation of laws, a theory of legal norms, a theory of descriptive legal science, a theory of legal reasoning, a theory of logic and the law and a theory of legal systems as well as elements of a semantic theory, a sociology and psychology of law (theories that must all themselves rest on more basic philosophical foundations)."[71]

[71] Susskind, R.E., Expert Systems in Law - A Jurisprudential Inquiry, Oxford, Clarendon Press, 1987, p.20 (his emphasis).
[72] For instance, Dave Brown (of the University of Melbourne Law School) comments: "Few, if any of the papers at this conference questioned the basic assumptions of the field... A strong legal and philosophical input is needed in any major project in this area if its aims are to produce any worthwhile and useable applications." (Taken from Brown, D., "The Third International Conference of Artificial Intelligence and Law; A Report and Comments", Journal of Law and Information Science, Vol.2, No.2, 1991, pp.223-239, at p.238.)

This is no small task. Sadly, however, few researchers heed this advice. Unquestioning adherence to rule-based formalism appears to predominate.[72] Researchers tend to omit from their modeling the view that rules in law are not merely to be followed; order and predictability in law, although perhaps desirable, are not two of its primary attributes. Rules are to be extended, adapted and interpreted. The law remains permanently in a state of flux, constantly altering the boundaries between its concepts. A lawyer is obliged to present the most favorable interpretation of the rule, in pursuance of the goals of the client, thus shifting the boundaries of a given legal concept. Courtrooms rarely witness the direct application of a pre-existing rule; otherwise the matter would be settled long before trial. Thus the use of an objective formalization methodology severely underestimates the function of rules in the legal context.

2.4 LOGIC-BASED APPROACHES

Any system of logic is concerned with meaning, with propositions and their interaction. It will consist of notations and a set of rules. The most rigid form of rule-based formalism is that of classical logic. We should begin this brief review with what Posner describes as "the apt and famous syllogism,"[73] "All men are mortal; Socrates is a man; Therefore Socrates is mortal." The validity of the argument is undeniable. Legal reasoning/thinking appears to work, on occasions, in such a manner. One can point to large bodies of statutory law in domains such as labour and immigration law where, at least superficially, the law functions in such a manner. Thus, runs the theory, legal judgments can be viewed as logical deductions from accepted principles, abstract and divorced from experience. Yet what this approach fails to grasp is that logic and rational action need not necessarily equate.

[73] Posner, R.A., The Problems of Jurisprudence, Cambridge, MA, Harvard University Press, 1990.

The earliest forays into the area, by researchers such as Kowalski and Sergot et al.,[74] adopted a strict rule-based isomorphic approach. Whilst subscribing to the rule-based "worldview,"[75] it would appear that they were not close readers of Hart's work and his warnings against the use of deductive reasoning. Hart was very much alive to the folly of purely deductive reasoning in law.
He writes; "...deductive reasoning, which for generations has been cherished as the very perfection of human reasoning cannot serve as a model for what judges or indeed anyone should do in 76  bringing particular cases under general rules.  "  Deductive reasoning cannot be entirely dismissed however. The drafting of a legal document owes much to deduction. When drafting a document such as a contract, the lawyer is effectively writing an algorithmic program to be performed by humans. Where deduction fails to play a significant role is in the process of reasoning with precedents. Unfortunately, the deductive functioning of certain statutes proved too seductive to the team from Imperial College, London, who displayed proficiency as skilled logicians but naivete vis-a-vis jurisprudence.  Notable for their absence of lawyers, this team of  computer scientists used PROLOG,  a logic programming language, to implement the  British Nationalities Act 1981 in the form of an expert system. This approach has been  7 4  Sergot M.J., Sadri, F., Kowalski, R.A., Kriwaczek,F., Hammond, P., and Cory, H.T, The British  Nationality Act as a logic program, Communications of the ACM. 29, 370-383.  Isomorphism refers to an attempt to represent the letter or substance of the law, rather than an attempt to deal with the issues of open texture. Hart, H.L.A., Positivism and The Separation ofLaw andMorals, 71 Harvard Law Review. 1958, pp.593-629 75  76  51  widely criticized by legal researchers for a naive appreciation of the jurisprudential implications of their work. In their defense however, it should be pointed out that such applications were built 'to demonstrate and test developing techniques of logic programming "rather than to prove the validity of logic/rule-based approaches to legal 77  reasoning modeling.  2.5  JURISPRUDENTIAL T H E O R Y AND ARTIFICIAL M O D E L S  So much has been written on the topic of Western jurisprudence - a full survey would consume the majority of this thesis and would also deflect us from our main task. Since ourfirstglimpse of codified law provided by the Code of Hammurabi about 2000 B.C., we have pondered upon the nature and substance of law. Aristotle made the first attempts to distinguish the rule of law from the ruler and emphasized a unemotional exercise of human reasoning as the ultimate paragon. More recently, we have seen the emergence of a feminist jurisprudence, critiques from the post-modernist, Marxist, race-based and psycho-analytic perspectives (to name but a few.) As yet they appear to have made little impact upon the development of artificially intelligent models of legal reasoning. While acknowledging the myriad of critiques, we will focus upon those perspectives and critiques that have directly inspired researchers in our field. The task is to plot a course through the morass of words and ideas, to 'tear through the veil of pretentious verbiage " that surround theories of law to extract those aspects of each jurisprudential 78  theory that actually inform the theoretical basis for the building of a neural network in law.  77  Kowalski, R., and Sergot, M., The Use ofLogical Models in Legal Problem Solving, Ratio Juris. Vol.3,  No.2, July 1992, 201-218 at 201. Loevinger, L., Jurimetrics - The Next Step Forward, 33 Minnesota Law Review, p.455 at p.466. 78  52  In this regard, we must seek the substantive conclusions that have emerged from the debate between Positivism and the plethora of anti-Positivist views.  
2.6 THE USE OF RULES

2.6.1 POSITIVISM

Positivism has a remarkably resilient character that re-manufactures itself in new guises. With the benefit of hindsight, we can see that H.L.A. Hart's book "The Concept of Law" (owing much to the work of Wittgenstein and Russell) caused a significant paradigm shift in the nature of Legal Positivism. Prior to its publication in 1961, jurisprudence was dominated by the views promulgated by, inter alia, Bentham and Austin, who argued that, whilst the law should be divorced from morality, it should be seen as a set of commands or rules, the "positive" morality backed by a sanction, reducible to an empirical base. Hart made arguably the most important contribution to jurisprudential thinking this century by reformulating the views of Austin into the view that laws are legitimate unless there is a violation of some higher law.[79]

[79] Hart describes law as a unity of primary and secondary rules. Primary rules govern conduct. Secondary rules specify how, if necessary, primary rules should be modified, eliminated, used or adapted. Hart identifies as the most important secondary rule a "rule of recognition", which identifies some features that primary rules must exhibit in order to be considered valid. Thus, in Canada, the Charter of Rights serves as the rule of recognition.

Of most importance to the computer modeler is Hart's analysis of the nature of rules, which is dominated by the notion of "open texture" - the indeterminacy that overshadows both rules and precedents. We might wish to equate the term with "fuzziness" as described in Chapter 1. According to Hart, rules have a core of settled meaning and a penumbra of uncertainty, whose extent can vary from case to case. The example used by Hart to describe the problem of indeterminacy and "open texture" is a law that prohibits vehicles from a public park. There are certain vehicles that clearly fall within the core meaning of the rule, such as a motor car. Other vehicles might fall into the penumbra and become those borderline cases on which the judicial actors are required to adjudicate. However, Fuller rightly points out that "Professor Hart's...extended discussion of the core and penumbra is not just a complicated way of recognizing that some cases are hard, while others are easy,"[80] although this would seem to be a major theme of Legal Positivism in its various guises. (The distinction between hard and easy cases is for most lawyers meaningless. All cases that come across a lawyer's desk tend to be hard, else the issue would be settled. Thus any expert system that deals only with easy cases is worthless.) It is a question of determining the range of reference of a word - an analysis of the fuzzy set that contains the word in issue. Fuller remarks that words are rarely interpreted in isolation. When a judge does consider the penumbral case, she/he is forced into a "more creative role." Fuller also highlights the issue of the purposive or teleological function of law and, in doing so, an inadequacy of Hart's theory. He uses the example of a vagrancy law, which states "it shall be a misdemeanor...to sleep in a railway station," a law which most rational thinkers would argue is designed to prevent homeless people or travelers from sleeping in a railway station.

[80] Fuller, L., Positivism and Fidelity - A Reply to Professor Hart, 71 Harvard Law Review, 1958, pp.630-672.
Fuller then posits the situation where the first arrested offender is sitting upright but asleep and the second has brought a blanket and pillow with the intention of sleep, but  Fuller, L., Positivism and Fidelity-A Reply to Professor Hart, 71 Harvard Law Review 1958, pp.630672. 54  was still awake at the time of arrest. Fuller asks 'Which of these cases presents the 'Standard instance" of the sleep?" The problem here is teleological rather than semantic we know what sleep is and we also believe we appreciate the purpose of the law. He suggests that in cases such as these the purposive interpretation of the law must therefore override its literal interpretation.  This example is only one part of the so-called Hart-Fuller debate. We can see that, whilst highly informative, the debate can have no conclusion. Such a debate is false as the two parties were working from entirely different axioms. It has, however, acted an important springboard for further discussion on the nature of rules.  Hart's successor to the Chair of Jurisprudence at Oxford, the more contemporary jurisprudential figure of Ronald Dworkin is insistent that in law, as with literary interpretation, every case has a right answer and that discretion has no role to play. However, the right answer may not be found through the application of rules. A so-called 'Hghts theorist", Dworkin maintains that in hard cases legal premises consist of rules, backed by principles and policy. Thus when the rules fail to address the issue in hand, the judge must 'Weigh and balance " the competing principles and conflicting interests, 81  implying an objective test. The notion of principles should be distinguished from public policy, which Dworkin believes has little part to play.  Dworkin's writing uses the  Dworkin, R., Taking Rights Seriously, Cambridge, MA., Harvard University Press, 1977, p.22. This echoes Hart's view that "there are ...areas of conduct where much must be left to be developed by courts or officials striking a balance...between competing interests which vary in weight from case to case," Hart, H.L.A., The Concept of Law, Oxford, Clarendon Press, 1961, p. 24. 81  55  terminology of 'discovery" rather than 'creation" in this context. Thus, when the rules 'run out", a judge applies certain fundamental principles to 'discover" the law. An example of such a principle might be - a man shall not profit from his own wrong. However, upon what objective criteria does the judge perform this process of weighing and balancing? This is an unconvincing metaphor and potentially misleading as there can be no objective criteria. It would seem that such action relies largely upon the discretion of the judge.  2.6.2 L I M I T A T I O N S O F T H E R U L E - B A S E D A P P R O A C H It is easy to see the superficial similarity between the model of legal reasoning from the perspective of symbolic-based A l approaches to problem-solving. Positivism considers the majority of legal decisions to be bounded within the intellectually limited confines of doctrinal rules. The Dworkin theory outlined above, argues that, beyond the limit of rules, discretion (for want of a better title) is exercised by a weighing of factors. However, Coval and Smith show that this metaphor bears little relation to the predictable (and rulegoverned) fashion that they believe to be judicial method . Expert systems are rule-based 82  inference engines. 
Thus any excursion into the realm of discretion must sentence the system to failure, unless we can develop a method by which to analyze the seemingly unanalyzable.  Coval, S., & Smith,J.C., Law and its Presuppositions: Actions, Agents and Rules, London, Routledge and Kegan Paul, 1986.  56  The experience of many researchers in the field is summed up by Leith. He writes; 'My own history in the field (of expert systems and law) is of someone who began, relatively optimistically, to build such a system ...However, during this building, I grew more and more skeptical of being able to handle legal knowledge in a computer system. This skepticism can be reduced to the simple statement that real legal skill is a much more amorphous quantity than the builders of legal expert systems believe. " 83  Clearly mechanical jurisprudence, the so-called 'judicial slot-machine" theory, has little to offer and should be abandoned as an underlying methodology. However, undeniably rules do play a role in law. The question is at what level?  There are further problems with the rule-based paradigm generally. Expert systems tend to deal with highly specialized and often obscure domains. Given that it is necessary for the expert to articulate a theory of how the law works in this domain, why should the user of the system trust the interpretation of an expert, A, who has built an expert system, as opposed to the views held by expert B, who has no such technical prowess? We can use Hart's in support here; "Any honest description of the use of precedent in English law must allow a place for the following pairs of contrasting facts. First, there is no single method of determining the rule for which a given authoritative precedent is an authority. Notwithstanding this, in the vast majority of decided cases there is very little doubt. The head-note is usually correct enough. Secondly, there is no authoritative and uniquely correct formulation of any rule to be extractedfromcases. On the other hand, there is often very general agreement, when the bearing of a precedent on a later case is in issue, that a given formulation is adequate. " 84  Raz, however, finds few of these difficulties; 'In the main, however, the identification of the ratio of the case is reasonably straightforward. " Williams also sees few problems; 85  Leith, P., The Computerized Lawyer -A Guide to the use of computers in the legal profession, London, Springer-Verlag, 1991, p. 189-190. Hart, H.L.A., The Concept of Law, Oxford, Clarendon Press, 1961, p. 131, (my emphasis.) Raz, J., The Authority of Law, essays on law and morality, Oxford, Clarendon Press, 1979.p.79  84  85  57  'The Ratio Decedendi of a case can be defined as the material facts of a case plus a decision thereon. "The problem therefore, may not be extracting the rule from the case; 86  rather it is the problem of deciding whether the case in issue is sufficiently similar to the so-called 'precedent" case and if so, in what respects, so as to permit the application of the extracted rule.  2.6.3 R U L E S K E P T I C I S M & L E G A L R E A L I S M Again, it is unnecessary to explore the plethora of views held at the opposite end of the jurisprudential spectrum, unless they impact upon artificial legal reasoning modeling. 
Rule skeptics take a variety of forms from those who merely find difficulty with the view that 87  legal reasoning is predominantly rule-governed to those of a radically Critical orientation, who become almost nihilistic in their belief that law is entirely functional in upholding the capitalist hegemony - in essence, law is power. Rule-Skeptics question the purpose of engaging in legal reasoning. Rather than being a utilitarian method by which legal actors can predict the outcome of a given fact situation, they see it as playing a justificatory role, either obscuring or focusing attention upon existent dominant power relations. Legal realists insist that legal doctrine plays little or no role in the determination of disputes, rather it is a justificatory smoke screen for conclusions reached intuitively. Thus, they argue, it is more worthwhile to focus on the factual outcome of cases and their effects on wider society. The only real law say the Realists, is the decision of one particular judge in  Williams, G., Learning the Law, London, Stevens & Sons, 1978, p.62 Indeed, most lawyers who have ever had occasion to present a case before a judge tend to exhibit some form of rule-skepticism. 87  58  a given specific case. Even if there is a rule that would appear to apply to the case in issue, the lawyer is still well advised to consult the relevant precedents.  If any such  precedent reveals that the judge in that case erred from the accepted rule, the lawyer must analyze in what ways that case might be different and whether the case is issue exhibits any similar features.  Any discussion of the so-called 'Realist" movement must pay respect to the figure of Oliver Wendell Holmes, whose place in the development of American jurisprudence is unparalleled . An extreme member of the Pragmatic philosophical school, his work was 88  largely a reaction against rule formalism, the legal positivist obsession - 'The prophesies of what the court will do in fact and nothing more pretentious, are what I mean by the law. " 89  Posner writes; 'Holmes shows the absurdity of supposing, as did the nineteenth century formalists against whom he was writing, that legal doctrines were unchangeable formal concepts like the Pythagorean theorem. He enforced the lesson of ethical relativism, thereby turning law into dominant public opinion in much the same way that Nietzsche turned morality into public opinion. " 90  However, Holmes found agreement with his philosophical adversary, Hart, regarding the lawyers obligation to deal only with the law as it is, rather than as it ought to be and accepted the possibility of a scientific valuation in law. This is of little importance to our  Those theorists who build on Holmes' work include, inter alia, Cardozo, B., The Nature of the Judicial Process, New Haven, Yale University Press, 1921, and Llewellyn, K., Jurisprudence; realism in theory and practice, Chicago, University of Chicago Press, 1962. Holmes, O.W., The Path of the Law, in Collected Legal Paper, New York, Harcourt, Brace & Co., 1920, p. 173. Posner, R.A., ibid., p.240. 88  8 9  90  59  purpose as we seek to embody a descriptive rather than prescriptive view of the law. Unfortunately, from our point of view, Holmes was unable to articulate any such theory for the operation of legal reasoning.  
Thus, although highly critical of  traditional  jurisprudential approaches, and of great significance to those theorists concerned with law and its interaction with the social and policy contexts, Holmes (and Legal Realism as a jurisprudential movement) offer little other than vague cliches and platitudes for the Jurimetrician as to how the process of legal reasoning may be explained.  2.6.4 T H E A N A L Y T I C A L - T E L E O L O G I C A L A P P R O A C H We should be careful not to dismiss all rule-based approaches as completely without merit. Rather, they require a combination with a theory that takes account of the alternative discourses of economic, political and gender-based power relations existent within law. Critical scholarship is informative in this regard. The only theory to date that appears to take account  of the accepted importance of rules whilst also acknowledging the  importance of these alternative discourses is the analytic-teleological theory or 'deepstructure" theory of law as expounded by Professor J.C. Smith. Deep structure argues that beneath doctrine there exists a non-conceptualized rule-based structure that governs decision-making within law. (Doctrine has, at most, an indirect influence.) This accounts for how given fact situations relate to the social values that are reflected within existing legal discourse. Thus judges decide hard cases in a rule-based manner however these rules are more likely to be unconsciously developed meta-rules.  60  "According to deep structure  theory, social goals teleologically justify legal doctrine," writes Braithwaite . When reading cases on the chosen domain, the expert system developer is encouraged to use his/her analytical skills and possibly his/her intuition to rationalize the law based upon goals of the law and reason inductively by scanning all available cases. For Smith, law is explained in terms of actions and goals. For example, in the law of torts, there are conflicting interests between liberty of action, (formalized in the 'remoteness" doctrine) and the prevention of harm to a third party. The law will structure these goals in terms of importance. The goals are not so much 'balanced" as Dworkin would suggest, as ranked with one meta-rule dominating the second in a given case or in an entire domain. This goal-based analysis is then formalized in the form of 'fuzzy rules" where sub-doctrinal heuristics may be brought to bear. 'Once these patterns have been identified a broadly stated meta-rule or principle may be formulated which explains the general direction of the case law in the domain independently of the 'surface discourse" of law at the doctrinal level. " Whilst utilizing a rule-based theory, deep structure overcomes the inadequacies 92  of Positivist analyzes of law at a doctrinal level. It is both Critical in that it accepts the socio-economic context in which law operates, whilst also avoiding the tendency towards nihilism that rule-skepticism exhibits. This approach has also shown a considerable degree of success in practice . Therefore, in considering existing rule-based paradigms, we 93  Braithwaite, M.J., ibid., p. 10. Kowalski, A., Case-Based Reasoning and the Deep Structure Approach to Knowledge Representation, Proceedings of the Third International Conference of Artificial Intelligence and Law. A C M Press, Oxford, 1991,p21atp.22. For example, Kowalski's systems appear to have achieved over 90% accuracy in predicting outcomes in 10 out of 11 cases. 
See also Deedman,G.C, Building Rule-Based Expert Systems in Case-Based Law, LL.M. thesis, UBC Faculty of Law, Vancouver, B.C and MacCrimmon, M.T., Facts, Stories and the Hearsay Rule, Pre-Proceedings of the III International Conference on Logica. Informatica. Diretto. Florence, 1989. 91  9 2  93  61  should be careful not to dismiss every attempt to emulate the real-world in such a manner as ill-founded. What should be criticized is the naive philosophical presuppositions of some researchers in law and A l that law is no more than a discreet set of doctrinal rules, a unitary model unconnected from the economic and political values and prejudices prevalent in our society.  However, there are factors that the analytic-teleological approach cannot address. It is undeniable that there is still an interpretive element involved in the formalization of the case generalizations. Some of the subtleties of the legal argument may be lost in the attempt to force the legal domain into the narrow corrfines of the logic of an expert system shell. Although Andrej Kowalski found that 'the deep structure approach successfully explains and accounts for legal decisions in areas of case-based law generally considered to be unstructured and indeterminate ," some legal problems may be so complex as to 94  defeat ones analytic skills, by for example exhibiting an apparent randomness. However, to date the approach has shown considerable success and appears to be the only credible coherent rule-based technique sufficient to explain the actions of the law whilst also taking account of non-doctrinal factors.  2.6.5 G A R D N E R One of the most notable rule-based research efforts in A l and law was by Anne von der Lieth Gardner. It was not overly successful. She used LISP (list processing) language to  9 4  Kowalski, A., ibid., p.22.  62  build a program that was designed to reason as to whether a particular set of circumstances constituted an offer and acceptance under the law of contract. Her conclusion was that; 'Legal philosophy tells us, among other things, that legal rules do not dictate legal outcomes; there is more going on in the decision of a case than ordinary deduction. One extra element is that there is often room for choice about which, of several possible decisions, is to be preferred. Another is that the choice, once made, sets a precedent, which may change the space of choices available in later cases. " 95  It would appear that two elements defeat her research. The first is the fact that there comes a point where her software concludes that the judge has a choice or discretion. Sadly, she was not able to program her software to emulate the legal reasoning process in this regard. The second element is open texture. She states: "Underlying both the use of examples and the provision for expert disagreement is the idea that legal predicates have open texture. The phrase open texture is a catchall...For some predicates there may be a clear prototype case with many possible variations. For others there may be a definition that looks analytical but is in fact defeasible. For still others the concept may be so abstract that its range can be worked out only piecemeal. But these are all phenomena of natural language. " 96  Given such conclusions, perhaps research funding might be better spent exploring the problems of the structure of natural language, of which legal language is after all, only a sub-category.  
It would be beyond the mandate of this thesis to attempt to answer  questions regarding whether language reflects or shapes reality. Nevertheless, we would be wise to contemplate the wider context including the meaning of such questions when evaluating attempts to impose the models of empirical science upon non-empirical phenomena.  Gardner, A.v.d.L., An Artificial Intelligence Approach to Legal Reasoning, Cambridge, Mass., MTT Press, 1987, p. 190. ^Gardner, ibid., p. 190.  63  2.6.6 T H E W H I P L A S H E X P E R T S Y S T E M We cannot leave this area without making reference to an attempt to build a rule-based expert system to predict quantum for whiplash injuries during the summer of 1990, by Deena McLeod, (a law student at the University of British Columbia.) Unfortunately, the program is no longer working thus further dissection is precluded.  However, it is  understood that the expert system never functioned as expected and the project was abandoned. Why did it fail? It would seem that the researcher failed to discover the rulebased relationships between factors that appear to influence the decision making process. What does one do then, if one cannot discover the relationships between factors that are believed to unify the cases in the given domain? Expert systems cannot function in such domains. What if the cases offer no unifying structure, but instead exhibit an apparent randomness that evades analysis?  A further problem common to all expert systems,  including those based on the analytic-teleological approach is that the law is seldom static. Thus any problem solving model should ideally have a mechanism that can be easily modified.  2.7  CASE BASED REASONING AND T H E I M P O R T A N C E OF ANALOGY  'The life of the law has not been logic; it has been experience" writes Holmes . An 97  alternative to the rule-based approaches, whether based on doctrinal or sub-doctrinal rules is Case-Based Reasoning (CBR). If we revert to the medical analogy once again, a physician who diagnoses the illness of a new patient is reminded of the illness of a past  97  Holmes, O.W., ibid., pl73.  64  patient to see if the former diagnosis is of relevance.  In essence, the problem solver,  whether medical, legal or otherwise, uses a past experience to solve a new problem. Analogy is the appreciation of the resemblance of several characteristics between two cases, situations or states of affairs, a discovery of the community of language, an aspect of the principle of universalization, which argues that like cases should be treated alike. Connectionism, like CBR utilizes this principle of universalization and is a close relative of CBR in that they are both models of computer-based problem solving and human cognition.  2.7.1  LEVI  The jurisprudential foundation for this approach can be found in the work of Edward Levi  98  who may be seen as a writer largely within the Frank  99  (rule-skeptic) tradition.  Levi writes: 'The basic pattern of legal reasoning is reasoning by example. It is reasoning from case to case. It is a three-step process described by the doctrine of precedent in which a proposition descriptive of a first case is made into a rule of law and then applied to a next similar situation. The steps are these: similarity is seen between cases; next the rule of law inherent in the first case is announced; then the rule of law is made applicable to the second case. 
This is a method of reasoning necessary for the law, but it has characteristics which, under other circumstances, might be considered imperfections." 'Reasoning by example in the law is a key to many things. It indicates in part the hold which the law process has over the litigants. They have participated in the law-making. They are bound by something they helped to make. " 100  Levi, E., An Introduction to Legal Reasoning, Chicago, University of Chicago Press, 1949. Judge Jerome Frank, Courts on Trial, Princeton, Princeton University Press, 1949 and Law and the Modern Mind, Garden City, NY., Doubleday Press, 1963. Levi, E., ibid., p.5. 9 9  1 0 0  65  Cardozo offers a helpful analogy at this point. He describes the process of discovering analogy as a process of search and comparison. Of some judges, he argues that their notion of duty is to match the colours of the case in hand against the colours of many sample cases in front of the judge. The sample nearest in shade supplies the applicable rule. Levi augments this view. He does not dismiss the view that the law is a system of rules. However, he argues that the rules are unknown. Rather they evolve over time, in the process of determining similarity or difference with the precedent cases. However, there are problems with this approach; Lashbrooke writes of Levi, " ...the system (described by Levi) is imperfect unless there is some overall rule that provides that the finding of similarity is decisive. No such rule exists. " Again, the decision as to whether 101  similarity exists between two given cases is within the discretion of the judge. What then is similarity? Upon what basis does a judge decide that there is sufficient similarity so as to be guided by the precedent case? Are similar cases to be likened to certain legal concepts such as malice or pornography - very difficult to define but 'you know it when you see it'? Alternatively, there may be specific criteria which justify a particular form of categorization; if so what are they? There appears to be no objective process by which one performs this task.  Mital and Johnson conclude " it is problems such as the above (theories of similarity) which have made neural computing technology look attractive, both to developers of practical systems and to those who are primarily interested in clarifying issues in legal reasoning...there are certain features available with neural networks which help a machine  101  Lashbrooke, E.C., Legal Reasoning and Artificial Intelligence, Loyola Law Review. 34, 1988, p.287. 66  to go beyond the constraints of having to follow a pre-defined set of rules or a priori relevancies. "With these similarity problems in mind, we can review various attempts to 102  build Case-Based Reasoners in law.  2.8  A P P L I C A T I O N S O F C B R IN LAW:  The best example of CBR  103  ASHLEY  in law and one that avoids the empirical weighting of cases is  the HYPO system built by Ashley and Rissland . Having appreciated the inevitable 104  complexity of the judicial decision-making process, they shied away from case prediction as a goal for the system. Nevertheless, this system is one of the most impressive and efficient available. The underlying rational is that 'reasoning by analogy is the key method of legal argument. " 105  The system works as follows; a case is represented by a frame consisting of up to 13 dimensions each representing a factual situation, known as a factual predicate. 
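To make the frame-and-dimension idea more concrete, the short sketch below (written in Python) shows one loose way in which such a representation might be coded: each case carries a set of dimensions, and precedents are ranked by how many dimensions they share with the problem case. The dimension names and cases are invented for illustration, and the matching is far cruder than HYPO's actual most-on-point analysis; the sketch is offered only to fix ideas.

    # Hypothetical illustration only: the dimension names and cases are invented
    # and are not HYPO's actual trade-secrets dimensions.
    from dataclasses import dataclass, field

    @dataclass
    class Case:
        name: str
        outcome: str                                  # which side won
        dimensions: set = field(default_factory=set)  # factual predicates present

    precedents = [
        Case("Precedent A", "plaintiff", {"employee-switched-employer",
                                          "security-measures-taken",
                                          "disclosure-in-negotiations"}),
        Case("Precedent B", "defendant", {"employee-switched-employer",
                                          "info-reverse-engineerable"}),
    ]
    problem = Case("Problem case", "?", {"employee-switched-employer",
                                         "security-measures-taken"})

    def rank_precedents(problem, precedents):
        # Order precedents by the number of dimensions shared with the problem case.
        return sorted(precedents,
                      key=lambda c: len(c.dimensions & problem.dimensions),
                      reverse=True)

    for c in rank_precedents(problem, precedents):
        print(c.name, c.outcome, sorted(c.dimensions & problem.dimensions))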
The use of only 13 dimensions might suggest that the case analysis functions lack a necessary depth. However, the system appears to have been developed successfully with so few. Ashley describes dimensions as representing 'the features of a legal case that are important for the strength of a Plaintiffs position on a particular kind of claim. They represent the legal  Mital, V., and Johnson, L., Advanced Information Systems for Lawyers, London, Chapman & Hall, 1992, p.257. For an excellent review of the variety of CBR projects in law, readers are referred to Chapter 14 of Mital, V., and Johnson, L., ibid. Rissland, E., and Ashley, K., HYPO: A Precedent-Based Legal Reasoner, in Vandengurghe, G.P.V., Advanced Topics of Law and Information Technology, Hillsdale, NJ, Lawrence Erlbaum, 1989, p.327. Ashley, K., Reasoning by Analogy: A Survey of Selected Al Research with Implications For Legal Expert Systems, in Walter, C , Computing Power and Legal Reasoning, St.Paul, Minn., West Publishing Co., 1985. p. 105. 1 0 2  1 0 3  1 0 4  105  67  relationship between various clusters of operative facts and the legal conclusions that they support or undermine. " Examples of such dimensions might be 'an employee has 106  switched employer." Once all dimensions have been adequately described, the system searches through its internal database of cases and retrieves those cases which most closely match the dimensions of the case, including those which potentially favour the other side, and lists them hierarchically. (This approach closely bears certain similarity to the so-called 'Quadrant" feature of the FLEXICON system as developed by the FLAIR project under the guidance of J.C.Smith  107  although the latter system can be potentially  applied to any domain of the law as opposed to merely trade secrets law. (However, FLEXICON has no pretensions vis-a-vis Al. It is not a Case-Based Reasoner, but a tool to enhance legal information retrieval.) One of the most important functions of HYPO is its ability to identify the legal dimensions of any given case, including those which either favour the Plaintiff or Defendant. Hypotheticals can be created and examined by varying values in the problem case, thus extending the system's ability to create legal argument.  Although an important advance in the development of law-based CBR, Mital and Johnson point to a deficiency of the HYPO project, which is also mirrored by current Neural Networks techniques. They state 'the reasoning by which the decision in the case was reached - or supposed reasoning, considering that a common law court tends not to state  Ashley, K., Modelling Legal Argument: Reasoning with Cases and Hypotheticals, Ph.D thesis, Univ. of Mass., 1988. See Smith, J.C., & Gelbart, Daphne, Beyond Boolean Search: FLEXICON, A Legal Text-Based Intelligent System, Proceedings of the Third International Conference on Artificial Intelligence and Law. Oxford, A C M Press, 1991 1 0 6  107  68  definitively how a particular decision was arrived at - is not represented explicitly.  The  second way in which Neural Networks may be seen to mirror the HYPO approach is that the term 'dimension" appears to bear close resemblance to the connections between neurons in the neural network. The key difference between HYPO and neural networks is that, in theory the ANN can learn the important dimensions of a domain and also learn to ignore the trivial factors.  
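That last point can be illustrated with a very small sketch (again hypothetical, and not a description of HYPO or of the networks built later in this thesis, and using the NumPy numerical library). A single trainable unit is given three binary "dimensions", only two of which actually determine the outcome; after training, the weight attached to the irrelevant input is typically close to zero, which is the sense in which a network "learns to ignore" trivial factors.

    # A toy illustration, not any of the systems discussed here: a single trainable
    # unit is fitted to examples in which only the first two of three binary
    # "dimensions" determine the outcome. The weight on the third, irrelevant
    # input tends towards zero - the unit has "learned to ignore" it.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(200, 3)).astype(float)   # three binary dimensions
    y = np.logical_and(X[:, 0], X[:, 1]).astype(float)    # outcome ignores the third

    w, b, lr = np.zeros(3), 0.0, 0.5
    for _ in range(2000):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))            # sigmoid output
        grad = p - y                                      # gradient of the error
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()

    print("learned weights:", np.round(w, 2))             # third weight close to zero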
2.9  SUSSKIND  Susskind's book 'Expert Systems in Law ", (a book based on his doctoral thesis and 109  relating to his practical experience with the 'Oxford Project," a proto-type expert system dealing with grounds for divorce under Scottish matrimonial law) attempts to shed some much-needed light upon the jurisprudential debate in respect of A l and law. He attempts to develop a core of consensus between jurisprudential thinkers such as Dworkin, Raz, Ross and Hart, on key to the jurisprudential questions that relate to the problems of building expert systems in law. With regard to the question of the primary purpose of engaging in legal reasoning, whether justification, prediction or persuasion, (one of the primary points of divisions between Positivist and Rule-Skeptics,) Susskind cuts the Gordian knot by claiming that the legal knowledge engineer does not have to decide if the program is to assist in justification, prediction or persuasion. He argues that all parties share some fundamental assumptions. These include a shared acknowledgment of 'the possibility of legal reasoning, that is, an activity whereby legal consequences can be  108  Mital, V., and Johnson, L., ibid., p.233. Susskind, R.E., Expert Systems in Law A Jurisprudential Inquiry, Oxford, Clarendon Press, 1987.  1 0 9  69  attached to acts, events, and states of affairs of our world," that legal reasoning is guided, if not by principles of logic then by principles of rationality and finally that all three functions of law presuppose a more fundamental function "namely that of stating what is true or false within the universe of legal discourse...together with that of deriving the implications of such truth or falsity in respect of particular facts. " Consequently, runs 110  his contention, once derived some may justify it, others use it for prediction and others bring it to bear in their rhetorical deliverances. Such arguments are unlikely to convince the rule-skeptics. Nevertheless, this impressive intellectual 'footwork" neatly sidesteps many of the problems that tend to beset jurisprudential debate and permit Susskind to further develop his contentions.  He goes on to identify the important characteristics of a legal domain for the building of an expert system; the domain should be relatively autonomous, sources of law should be limited and well defined, there should be agreement between experts as to their scope and content and the use of so-called commonsense knowledge should not be necessary in the reasoning process. Speaking from experience, experts may agree on the interpretation of cases yet the weight to be assigned to a given precedent with regard to issues of substantive law is often in issue. Although the scope of the law is usually reasonably defined (in terms of which cases are relevant) the application of the laws/rules to the facts in issue are often disputed. Indeed, this should not surprise given that the adversarial process and the various legal institutions actually encourage disagreements of this nature to be aired.  1 1 0  Susskind, R.E., ibid., p.42.  70  The factors mentioned above would seem to severely limit the domains in which expert system in law may be applicable. Moreover, as time elapses and the limitations of the legal domain for computer modeling remain, it seem increasingly less likely that  we will  succeed in capturing the elusive qualities of legal reasoning through deontic logic. 
Seven years have elapsed since the publication of Susskind's book, yet workable expert systems have still failed to materialize in the legal office. Thus, in summary, his book not only reveals the inadequacies of expert systems in law based on traditional views within jurisprudence, but also reveals how little we know about how legal actors combine human judgment with the complexities of legal formalism.  2.10  WAHLGREN  We cannot leave the domain of A l and law without making mention of Peter Walhgren's influential book 'The Automation of Legal Reasoning " which must rank alongside the 111  works of Susskind, Ashley and Gardner in terms of the contribution made to understanding in this field.  It would be difficult to find a more comprehensive review of legal theory and legal reasoning than Walhgren's. However, the importance of this work lies not with the literature review but with the methodological approach. Wahlgren settles with the view that legal reasoning is primarily rule-based. The obvious criticism to direct at Wahlgren is  111  Wahlgren,P., The Automation ofLegal Reasoning -A study on Artificial Intelligence and Law,  Deventer, Boston, Kluwer Law and Taxation Publishers, 1992.  71  that the adoption of this view neglects the bahavioural analysis approach of the ruleskeptics outlined above. However, he is able to counter this criticism, by pointing to his coherent methodological approach. He argues that his purpose is to create a model of legal reasoning that can be represented on the computer, rather than attempting to capture all forms of legal phenomena in his programming. Thus the model does not need to reflect all competing theories. Models of law are simplifications; as Smith writes 'in creating models we select certain features and leave out others. We can circumvent the problem of not having a single theory of law, by selecting a particular theory and creating a model on the bases of the presuppositions of that theory. " 112  Thus the validity (or otherwise) of  legal positivism is no longer in issue. This is a novel and intelligent approach, given the morass of complexities and (to state the matter more crudely,) utter verbiage that characterizes a large degree of legal theory. However, the models created would be of no more than academic interest, as a result.  Although Wahlgren does not follow this argument through, his methodology suggests that it may be possible to adopt any theory of law, (i.e. Marxist, feminist, etc.,) select certain features of the domain and build an expert system model of that domain. The deficiencies of the adopted theory are of little relevance as one could argue that the expert system is merely a simplified model. Nevertheless, if the selected theory, for example feminist legal theory was successful in explaining the institutional bias against females in the domain of, for example child custody law, then one might argue that one had gone some way in  112  Smith, l.C, Review of The Automation ofLegal Reasoning, Stockholm, 5 Juridisk Tidskift. Nr.l,  1993-1994, p.247- 256 at 253.  72  providing empirical proof that validates the foundational theory. However, this argument could be countered by an argument (yet again) that such computer representation are merely models not real-world representations.  Wahlgren argues that the process of legal reasoning is highly complex. It may be divided, he says, into six sub-processes. 
First, the identification process whereby the legal actor defines the facts in terms of a legal description - defining the relevant facts, discovery of important missing facts and establishing uncertain or disputable facts. Secondly, the lawsearch process, or the discovery and inspection of relevant and possibly contradictory legal materials, at documentary, rule or concept levels.  Thirdly, the interpretation  process using context and analogy. Fourthly, the application of rules guided by methodological rules that explain the presuppositions for rule-application. Fifthly, the evaluation process, the process by which the legal actor determines the likely conclusion from the application of the chosen rule(s) and may involve a reformulation if the application indicates an outcome that is not in the best interest of the client. Finally, the process offormulation, the process by which the legal actor decides how best to describe the matter in issue. Speaking from the position of some experience in legal practice, commercial lawyers would rarely go through this process in its entirety . It seems that 113  for all our efforts, the development of a unified theory of legal reasoning may be a misconceived project.  The reasoning processes of a non-litigious commercial lawyer  dedicated to formalizing joint venture agreements would seem to have little relevance to  It is difficult to discern what thought processes occur. Moreover, there is no reason to believe that all lawyers reason in the same way.  73  the work of a tax administrator or judge of the Admiralty Court. The processes used by different types of legal actors must be distinguished.  It is hardly surprising that Wahlgren should adopt a Legal Positivist approach to legal reasoning, as this view tends to predominate European jurisprudential thinking. Nevertheless common law case-based reasoning plays a substantial role in law, at least in the Anglo-American legal context. To ignore these complexities, is to provide a shallow theory for the purposes of modeling the law of North America and the UK. Furthermore, Wahlgren's theoretical analysis of the process of legal reasoning, although comprehensive, appears complex and unwieldy and is likely to prove difficult in implementation by others.  2.11 A C O N N E C T I O N I S T T H E O R Y O F L A W ? Perhaps it is the case that prior modeling of law using Artificial Intelligence techniques has failed because of our insistence on the correctness of our view that legal reasoning is essentially a sequential process? Perhaps we are guilty of a blinkered vision; to re-use the adage; 'if the only tool you have is a hammer, then everything tends to resemble a nail." Is it the case that legal reasoning is, in fact, a parallel process?  The practical model described herein uses a parallel architecture, therefore we should posit the question - is there a parallel-processing/Connectionist theory of law? Before we attempt an answer to this, we  should ask an antecedent question, albeit rhetorically,  namely - do we need yet another theory of law to supplement the mire of existing  74  literature on the nature of law? In answering this latter question, we should remember that the purpose of examining the nature of law serves a very practical need in this subdiscipline. This is not merely an intellectually rigorous exercise performed for its own merit. Hence if a theory can be found, then it should be sought. 
We cannot (and should not) offer a definitive answer until we have workable models as empirical proof. Even then it is still arguable that the parallel processing model works only in chosen discreet domains of law and cannot be utilized in the wider body of sequentially-based law.  Certain authors, notably Warner, seem to have few doubts on the matter. Although the pattern of legal reasoning appears to be the 'decomposition of a problem into issues or unit problems, the resolution of each unit problem, and the synthesis of the unit solutions into a resolution of the whole problem," i.e. a seemingly sequential model of problemsolving, he argues that the "...resolution of the unit problem will impose a 'fctate change" on the entire problem domain. This state change will render invalid earlier resolutions of unit problems, requiring their recalculation, and alter the calculations required for the resolution of remaining unit problems. This approach to problem solving appears inherent in the reasoning process used by lawyers. " In support of this contention, Warner 114  directs us back to Levi ; "...it appears that the kind of reasoning involved in the legal process is one in which the classification changes as the classification is made. The rules change as the rules are applied. More importantly, the rules arise out of a process which,  Warner, D.R., A Neural Network-Based Law Machine: Initial Steps, 18 Rutgers Law and Technology Journal. 1992. p.52. 1 1 4  75  while comparing fact situations creates the rules and then applies them.  Such  conclusions cry out for empirical supporting evidence.  However, as stated above, it is important, if not vital, to distinguish the processes used in legal practice. Legal analysis does not necessarily equate with legal reasoning. By legal reasoning, we mean the processes by which judges decide cases, involving the use of precedent and thefindingof analogy. Legal analysis is defined by Meldman as 'the logical derivation of a legal conclusionfroma particular factual situation in the light of some body of legal doctrine " or the process by which lawyers draw conclusions from the 116  description of a given problem. (The term rational is to be preferred to logical in this context.) What Warner does not make clear is, in referring to the 'state change", he is referring to legal reasoning or legal analysis.  In any event, Warner is of the opinion that a sequential problem solving approach is insufficient. He writes 'The sequential problem solving model fails largely because it overlooks the dichotomy between doctrine and utility. It fails because law cannot be rigidly defined in most instances." One interpretation of this criticism would be to view this description of the dichotomous character of the law as no more than a rephrasing of Fuller's criticism of Hart - i.e. a denial of the purposive content of law and therefore, an inadequate analysis. Warner continues "...the legal system has been described as 'bpen textured" which means it is infected with varying degrees of indeterminacy. When lawyers  Levi, E., An Introduction to Legal Reasoning, Chicago, University of Chicago Press, 1949, pp.3-4. Meldman, J.A.,^4 Structural Model for Computer-Aided Legal Analysis, Rutgers Journal of Computers and Technology Law, 6,1977, p.30. 115 116  76  engage in the problem solving process, they do not strive to find the one answer to the problem, but rather their objective is the identification of a range of solutions. 
" Whilst 117  experience suggests that this argument has an element of truth, lawyers do tend to adopt one position. They pursue the best of the potential range of solutions to meet their clients' needs, while also attempting to leave their position unfettered in order to pursue an alternative should the first option fail to materialize. A good example, might be exploring the possibilities of a 'without prejudice" settlement while also serving pleadings/affidavits on the other side to show the opposing side that lawyer and client take the dispute seriously. Does this amount to parallel processing or is Warner trying to bend his analysis to suit his purposes?  Given that we have yet to build adequate models of law using a Connectionist architecture, it seems to be a little premature to define law using a Connectionist vocabulary. Instead of developing a theoretical foundation with which to justify his intuitive belief that Connectionist architecture is an adequate descriptive model of the legal reasoning processes, scientific method dictates that Warner should first develop working models, observe the results and then tune his theoretical arguments regarding the Connectionist attributes of law. However, we should also be minded of the irony that some of the most important development in the so-called logical, rational sciences were products of intuition. For instance, Einstein wrote of his discovery of the elementary laws  Warner, D.R., Towards a Simple Law Machine, Jurimetrics Journal. 29, 1989, p.460.  77  of physics 'to these elementary laws there leads no logical path, but only intuition, supported by being sympathetically in touch with experience. " 118  Once again, Warner's work implicitly calls for a sociological analysis of legal procedures; solid empirical research that shows where lawyers use analysis and where they use reasoning would greatly enhance the level of debate. Leith argues that our pre-occupation with the legal text may be deflecting our attempts at comprehension of the legal environment. He writes "...law is an interconnected body of practices, ideology, social attitudes and legal texts, the latter being in many ways the least important. " 119  Development of a hierarchy of importance of the above, although of significance, is nowhere near as important as an overall appreciation of the complex interaction between these factors.  2.12  L E G A L REASONING AS A SKILL  The difference between connectionist or parallel thinking and sequential thinking is evident in the area of learning. Law School provides the student an appreciation of no more than a skeletal outline of the law. Although legal academics explain to their students that the law operates within a distinct political and socio-economic environment, the lesson of how that environment operates is something best taught in the real-world, rather than in any law school. In addition legal practice teaches the novice lawyer when to use the law, when not to use it and how to use it. Quotation taken from Holton,G., Thematic Origins of Scientific Thought; Keplar to Einstein, Cambridge, Harvard University Press, 1973, p.357. Leith,P., ibid., p.213.  1 1 8  1 1 9  78  If we consider the practice of law to be no more than a developed skill, then we can make useful comparisons between learning law in practice and any other skill that a person might pick up. For instance, driving a car is a skill that most people learn by practice. 
After a few months, the skill develops from a deliberate conscious sequential set of actions to becoming natural and seemingly intuitive. brothers  120  Some thinkers, notably the Dreyfus  have concluded that in the process of mastering a skill, we progress from  sequential to parallel thinking. A sudden reflection upon what the person is doing is likely to bring with it a sudden degradation in performance. For example, although we all know that there are rules for the point at which a person should change gear in a standard/stickshift car, (in terms of RPM,) few drivers rely on their rev counter for an indication. They know when it 'feels right." They conducted research into how a person acquires a skill. They conclude that there are five  121  stages to skill acquisition, whether the playing of  chess, flying of planes, riding a bike or learning to read. These are novice, advanced beginner, competence, proficiency and expertise. During the first stage, they write, 'the novice learns to recognize various objective facts and features relevant to the skill and acquires rules for determining actions based upon those facts and features. " They 122  acquire 'context-free" facts and rules. For instance, in a legal context, the law student who wishes to practice in litigation learns 'context-free" facts and rules such as the court  Dreyfus, H.L., & Dreyfus, S.E., Mind Over Machine, New York, Free Press, 1986. For more on this, see also Dreyfus, S.E., and Dreyfus, H.L., Towards a reconcilation of phenomenology andAI, in Partridge, D., and Wilks, Y., The foundations of artificial intelligence -A sourcebook, Cambridge, Cambridge University Press, 1990. Whether there are four, five or even ten stages of learning is not important. Instead, we should be aware that different skills are employed according to our stage of development. Dreyfus.H.L., & Dreyfus, S.E., ibid, p.21. 1 2 0  121  122  79  procedures, the meaning of  'Without prejudice" negotiations, methods of cross-  examination, etc. During the second stage, the beginner starts to recognize context-free features by a 'perceived similarity with prior examples ."Rules become located within a 123  given situation. Using our legal example, an advanced beginner in legal practice might learn spot a problem with the client's evidence, having seen similar problems previously. Thus being able to spot the problem through experience becomes more important than being able to describe the problem in terms of rules. The key factor for Dreyfus and Dreyfus in the third stage is the sense of what is missing as well as what is there.  The fourth stage is proficiency. At this point, the proficient performer has a wealth of experience. Knowing what to say in a court room in any given situation that will appeal to a judge's understanding and beliefs is intuitive and can only by gained by experience -  it  is an holistic situation-based similarity recognition process - the sense of 'tleja vu" and recalling what worked last time. The lawyer will discriminate factors in any given situation as conspicuous against those to be ignored. 'The relative salience of features will change " with events. This is intuition. Dreyfus and Dreyfus write 'intuition or know124  how, as we understand it, is neither wild guessing nor super-natural inspiration, but the sort of ability we all use all of the time as we go about our everyday tasks... 
(The irony is not missed.)..an ability that out tradition has acknowledged only in women usually in interpersonal situations, and has adjudged inferior to masculine rationality ." Thus true 125  experts can act in a seemingly irrational manner and yet still succeed.  Dreyfus, H.L, & Dreyfus, S.E., ibid. p.22. Dreyfus, H.L, & Dreyfus, S.E., ibid. p.28. Dreyfus, H.L, & Dreyfus, S.E., ibid. p.29.  123 124 125  80  2.13  CONCLUSION  In jurisprudence as with other complex philosophical domains, one should not seek answers but merely attempt to clarify increasingly complex questions. Having accused other theorists of excessive 'navel gazing" regarding the issue of the jurisprudential foundations for the building of A l models in law, this thesis is also in danger of surrendering itself to such problems. In the war of attrition between the army of A l researchers aimed with their weapons of choice and the citadel that contains that nebulous phenomenon known as the complexity of law, the latter is standing firm against almost all offensives.  Speaking from my own experience of working in a law firm, although there may be a variance of opinion amongst lawyers as to what the law actually is in a given instance, there is usually an agreement over what the appropriate action in each case should be. The lawyers tend to be aware of the range of solutions. This is not so much an awareness of what the law will say in thefinalinstance but rather an agreement as to what would be the rational (but not necessarily logical) action for our client to take. Equally, there is usually agreement as to what action the opposing side should take and to prepare accordingly. Could we, therefore, build computer models of law based largely on legal intuition and ignore the formalistic complexities of the law. This would undoubtedly prove to be an interesting research topic.  81  In any event, from the review outlined in this Chapter, it is clear that with a few notable exceptions, this application of expert systems in law is beset with theoretical difficulties. The problems with symbolic representations of law are mirrored by what is now accepted as the failure of the traditional A l reductionist paradigm. Over ten years of research has been expended on A l in law. Given that the Connectionist paradigm is being exploited in other domains, it seems appropriate to now evaluate its potential with reference to the domain of law.  82  CHAPTER 3 NEURAL NETWORKS AND LAW - A LITERATURE REVIEW 3.1  INTRODUCTION  Open texture is a problem of everyday language, although its problems carry particular significance in law, as open texture often becomes become a matter of day-to-day debate in any Court of Law. Problems of open texture are not only reflected in law, they are also magnified. Therefore, before reviewing connectionist applications in legal language, we should ask if connectionist models can overcome the difficulties of natural language where symbolic approaches have largely failed. We will then look at the progress to date on topics within Connectionism and law and also look at the research on the use of connectionist architectures for alternative applications in the legal field such as for information retrieval.  3.2  CONNECTIONISM AND NATURAL  LANGUAGE  The view that until we can understand natural language we will toil in vain trying to comprehend legal language, although not often aired in public, is a fear that almost all research workers in this field must have felt at one time or another. 
Studies of natural language are particularly complex and controversial. We know that we are able to expand our vocabulary as we use language. We also know from a reading of Chomsky [126] that natural language cannot be described in simple linear terms. Similarly, it is arguable that we change legal language as we use it. Therefore, if connectionist approaches to natural language prove successful, they may also prove successful with reference to legal applications, which, as we saw earlier, are often concerned with meaning at the borders of language. Without being drawn into a review of all AI approaches to natural language, which would involve a substantial body of literature beyond the scope of this thesis, we should acknowledge the efforts of researchers such as Rumelhart, McClelland and the PDP research group [127], who represent efforts from the Connectionist perspective. McClelland maintains that "conventional symbolic approaches only partially characterize human language, because human language is not strictly compositional" [128] - i.e. a word does not make the same semantic contribution to the meaning of every expression in which it occurs. In short, the whole cannot be explained as the sum of the parts. The parts are useful insofar as they are clues to the meaning of the whole. Meaning is also determined by context; thus words exert an influence over one another. We can also examine the structure of the sentence to analyze its influence on meaning. McClelland has developed a technique for learning natural language by presenting constituents of sentences to a connectionist network one at a time, taken from a small universe of vocabulary. (Constituents of a sentence might include agent, action, patient (a noun), instrument, location, co-agent and recipient.) An example of the type of sentence that McClelland might feed to the network is "the boy ate the soup with haste." Such sentences are fairly simple and could not adequately incorporate the metaphors, slang and context dependency that we encounter every day. However, using a limited vocabulary and a back-propagation rule, McClelland has found a degree of success in developing his model. He stresses, though, that this neural computing application is likely to be most successful when used in conjunction and co-operation with traditional AI "compositional" methods. Early steps have proved promising in using a PDP approach to "fill in" gaps that are missed by conventional symbolic approaches to natural language. The obvious question to ask of this research is - can they take the next step in their research and expand the model? As yet we have no answer.

126 Chomsky, N., A Review of Verbal Behavior by B.F. Skinner, Language, 35, 1959, pp.26-58.
127 See note to Chapter 1.
128 McClelland, J.L., Can Connectionist Models Discover the Structure of Natural Language?, in Minds, Brains and Computers, Norwood, NJ, Ablex Publishing Co., 1992, Chapter 7, pp.168-189.

3.3 CONNECTIONISM AND LEGAL OPEN TEXTURE

At present North American researchers lag behind their European counterparts in this area. This part of the thesis briefly presents various attempts by French, Dutch and English researchers to analyze the nature of open texture using neural computing.

3.3.1 BOCHEREAU ET AL.

The French trio of Bochereau, Bourcier and Bourgine [129] address an aspect of the domain of French case-based administrative law using a training set of 331 bye-laws and 47 bye-laws as the test data.

129 Bochereau, L., Bourcier, D., and Bourgine, P., Extracting Legal Knowledge By Means of A Multi-Layered Neural Network Application to Municipal Jurisprudence, Proceedings of the Fourth International Conference on Artificial Intelligence and Law, Amsterdam, ACM Press, pp.288-296.
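The paragraphs that follow set out the architecture they used: 49 input variables, a hidden layer of only 4 neurons, two outputs and training by back-propagation. As a rough, hypothetical sketch of that general kind of experiment - synthetic binary data and a single output unit standing in for their two, not their actual coding of bye-laws and standards - a small network might be trained on one portion of the cases and tested on the remainder as follows (using the NumPy library):

    # A hypothetical sketch, not the authors' system: random binary features and an
    # invented stand-in rule replace their coding of bye-laws and standards.
    # 331 cases train the network, 47 are held out, echoing the split described above.
    # One sigmoid output unit stands in for their two (annul vs. confirm).
    import numpy as np

    rng = np.random.default_rng(1)
    n_in, n_hid = 49, 4                                      # 49 inputs, 4 hidden neurons
    X = rng.integers(0, 2, size=(378, n_in)).astype(float)
    y = (X[:, :5].sum(axis=1) > 2).astype(float)             # invented "legal rule"
    X_tr, y_tr, X_te, y_te = X[:331], y[:331], X[331:], y[331:]

    W1 = rng.normal(0, 0.5, (n_in, n_hid)); b1 = np.zeros(n_hid)
    W2 = rng.normal(0, 0.5, (n_hid, 1));    b2 = np.zeros(1)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))

    lr = 1.0
    for _ in range(5000):                                    # back-propagation of error
        H = sig(X_tr @ W1 + b1)                              # hidden-layer activations
        out = sig(H @ W2 + b2).ravel()                       # annul/confirm output
        d_out = (out - y_tr)[:, None]                        # error at the output layer
        d_hid = (d_out @ W2.T) * H * (1 - H)                 # error passed back to hidden layer
        W2 -= lr * H.T @ d_out / len(y_tr);   b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * X_tr.T @ d_hid / len(y_tr); b1 -= lr * d_hid.mean(axis=0)

    pred = sig(sig(X_te @ W1 + b1) @ W2 + b2).ravel() > 0.5
    print("success rate on the 47 held-out cases:", (pred == (y_te > 0.5)).mean())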
In this highly technical paper, their purpose is to examine controversial mayoral decisions, looking at the legal domain, the use of legal norms and factual events, to determine whether the Council of State (administrative court) decisions are correct in annulling or validating the municipal laws that come before them. Their research is made all the more interesting because they attempted to perform the same tasks using a series of expert systems, written in PROLOG.

In their system, they use 49 input variables, divided into 4 subsets of regulations, bye-laws, factual standards and normative standards. Factual standards are those aspects of a case which involve an appreciation of circumstances. Normative standards include aspects such as Public Order, Public Safety, Public Decency, etc. A mere 4 neurons were used in the hidden layer and there were 2 possible outputs - either annulment or confirmation of the bye-law. (Only 4 neurons were used in the hidden layer because of the few examples available.) Using the back-propagation method, they achieved a success rate of around 80% when presenting new cases.

They also achieved some success with extracting logical rules, although extracted rules such as those showing no relationship between the standard "breach of public order" and "mayoral decisions in the field of traffic policy" [130] would be dismissed as mere common-sense reasoning and expose a simplicity unlikely to be of value to policy makers and enforcers.

130 ibid., p.293.

3.3.2 VAN OPDORP AND WALKER

Aside from providing one of the best explanations of neural networks for the uninitiated, Van Opdorp and Walker [131] offer an examination of the problems that one faces with various types of neural network architectures. They make the very valid point that simply listing the facts of a case may not be sufficient - some facts are more important than others. Therefore, to work in open-textured domains, the system developer should involve an expert in the area who, although perhaps unable to describe his or her expertise in terms of rules, may possess a high degree of legal intuition and therefore know a bad example of the law when he or she sees it. Their example with regard to the issue of open texture is as follows: "An apartment without an elevator is considered suitable depending on the age and degree of possible disability of the tenant." Thus, an expert in assigning sheltered housing would know if a given apartment is suitable for the tenant in question. Equally, with our domain of whiplash injury quantum assessment, an expert (whether lawyer or adjuster for ICBC) would know whether a given case is representative of how the domain works even though he or she may not be able to rationalize it in the form of rules. However, pure connectionists might argue that this conclusion undermines the value of neural computing, because the great advantage of Connectionism over traditional AI was precisely that (ideally) one does not have to understand or articulate an understanding of the domain in order to build intelligent systems in that domain.
I have not used an expert in the development of any of my neural networks, except for a determination of which factors ought to be used as inputs and which should not be used at all.

131 Van Opdorp, G.J., and Walker, R.F., ibid.

3.3.3 BENCH-CAPON

Ironically, we also find that Trevor Bench-Capon, a long-time proponent of the rule-based paradigm, is one of the few other research workers rising to the challenge of this issue of open texture. He sets himself three inter-related questions: "can a net classify cases successfully; can an acceptable rationale be uncovered by an examination of the net; and can we derive rules describing the problem from an examination of the net?" [132] - i.e. can symbolic knowledge of open texture be acquired through a sub-symbolic methodology? Bench-Capon uses the fictional domain of welfare benefits paid to senior citizens to cover their costs of visiting close relations in a hospital. The domain envisaged, although showing some evidence of open texture, is governed by rules which could as easily be implemented using Boolean variables; e.g. a person should be of pensionable age (English law still discriminates between the sexes on the issue of pensionable age: for women the retirement age is 60, for men it is 65), should have paid contributions in four out of the five relevant years, should be the spouse of the patient, should have savings of no more than 3,000 GBP, and, if the spouse is an in-patient, should live X miles from the hospital or, if an out-patient, X + Y miles.

2,400 test fictional cases were generated from a LISP program, consisting of 1,200 cases where the randomly generated outcome satisfied all relevant conditions. Of the remaining 1,200, each case would fail on at least one of the six conditions. He justifies the use of fictional rather than real-world cases on the grounds that "a real world domain would not provide the best test since, if ex hypothesi the domain is not well understood, there is no way of evaluating the rationale arrived at by the network." [133] Bench-Capon constructed several neural networks, using one, two and three hidden layers. His results were that all three of the test networks converged, producing success rates of 99.25%, 98.90% and 98.75% respectively. (Success rate means that a network discovers and settles into connection patterns that satisfy the relationships between the input and output data.) He writes "This was a very encouraging level of performance...acceptable, even in a legal application." However, unlike Bochereau et al., Bench-Capon found that the "...analysis of the networks proved disappointing." [134] The networks did not appear to appreciate the effect of age for men and women on entitlement; another condition - the distance/in-patient status - seemed always to be assumed to be satisfied. He concludes "...an apparently acceptable level of performance can be achieved without any identification of some of the significant features of the problem." [135] Thus he tried a second time, using data where, of the 1,200 failing cases, only one condition failed. For the networks with one and two hidden layers, he achieved a convergence in excess of 95%.

132 Bench-Capon, T., Neural Networks and Open Texture, Proceedings of the Fourth International Conference on Artificial Intelligence and Law, Amsterdam, ACM Press, 1993, pp.292-297.
133 ibid., at p.293.
134 ibid., at p.294.
135 ibid., at p.295.
This time the network appreciated that there was a discrimination of 5 years between men and women with regard to pensionable age (although, for some inexplicable reason, it concluded that women become senior citizens at 45 and men at 50). The distance/patient status condition also appeared to be satisfied, providing a "picture...acceptably close to the truth and almost exact in the case of inpatients." What this experiment proves is that successful training is not necessarily indicative of the correctness of the underlying rationale. A neural network can come to the right answer for the wrong reasons. This is an important point for legal theorists; it confirms empirically what I argued earlier, namely that any belief that neural networks could somehow reveal the inherent "deep structure" of a given legal domain should be dispelled.

As to whether we can derive rules from sub-symbolic architectures, he found some encouraging results (albeit combined with some significant deviations). Thus Bench-Capon remains skeptical of our ability to infer rules from the results of a neural network, particularly if the domain involves complex combinations of factors that are perceived to have significance, as will undoubtedly be the case within the domain of law.

3.4 RAGHUPATHI, LEVINE, BAPI AND SCHKADE

This team of research workers (who are prolific advocates of Connectionism in law) [136] may be unfamiliar to many in the domain of computers and law. Their enthusiasm and belief in the task are refreshing; they write "In the long run, results from applications of neural processes to the legal domain will not only lead to a deeper understanding of fundamental legal decision processes but also enable study of the normative aspects of the legal system." [137] Recalling what we concluded earlier in Chapter 2 in relation to the work of Warner [138], regarding legal reasoning being to a large extent about the selection of one option from a range of many, Raghupathi et al. wholeheartedly embrace and develop this view. In support they refer to work by Thagard [139] on explanatory coherence and MacLean [140], who developed the so-called triune theory of the brain. So far no other researchers have applied these theories. Thagard's connectionist network, developed in LISP, allows the user to compare and explain alternative hypotheses in the pure sciences as well as in law. Applying his network to factual and evidentiary issues of criminal cases, Thagard built a network called ECHO in which nodes represented hypotheses about a given case based on newspaper reports of a given trial. (The examples he addresses are murder cases in which there were no witnesses and juries had to infer on the basis of circumstantial evidence as to what actually happened.)

136 See, for example, Raghupathi, W., Levine, D.S., Bapi, R.S., and Schkade, L.L., Towards Connectionist Representation of Legal Knowledge, in Levine, D.S., and Aparicio, M., Neural Networks for Knowledge Representation and Inference, Hillsdale, NJ, Lawrence Erlbaum, 1994, pp.267-282. See also Raghupathi, W., Schkade, L.L., Bapi, R.S., and Levine, D.S., Exploring Connectionist Approaches to Legal Decision Making, Behavioral Science, 36, 1991, pp.133-139.
137 Raghupathi, W., Schkade, L.L., Bapi, R.S., and Levine, D.S., Exploring Connectionist Approaches to Legal Decision Making, Behavioral Science, 36, 2, April 1991, p.134.
138 See note 114 in Chapter 2.
139 Thagard, P., Explanatory coherence, Behavioral and Brain Sciences, 12, 1989, pp.435-502.
140 MacLean, P., The triune brain, emotion and scientific bias, in F.O. Schmitt (ed.), The Neurosciences: Second Study Program, New York, Rockefeller University Press, 1970, pp.336-349.
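The style of network being described here is "localist": each node stands for a whole hypothesis or piece of evidence, excitatory links join propositions that explain one another, inhibitory links join rival hypotheses, and the whole is allowed to settle. The toy sketch below illustrates that settling process with invented nodes, weights and update rule; it is not Thagard's ECHO program, whose equations and parameters differ.

    # A toy "settling" network in the localist style described above: nodes stand
    # for whole hypotheses or evidence, positive links join propositions that
    # explain one another, negative links join rival hypotheses. The node names,
    # weights and update rule are invented; this is not Thagard's ECHO program.
    nodes = ["evidence", "guilty", "alternative-suspect"]
    links = {("evidence", "guilty"): 0.4,             # guilt hypothesis explains the evidence
             ("evidence", "alternative-suspect"): 0.2,
             ("guilty", "alternative-suspect"): -0.6} # rival hypotheses inhibit one another

    act = {n: 0.0 for n in nodes}
    act["evidence"] = 1.0                             # evidence is clamped "on"

    def weight(a, b):
        return links.get((a, b), links.get((b, a), 0.0))

    for _ in range(100):                              # repeat until activations settle
        for n in nodes:
            if n == "evidence":
                continue
            net = sum(weight(n, m) * act[m] for m in nodes if m != n)
            act[n] = max(-1.0, min(1.0, 0.9 * act[n] + 0.1 * net))

    print({n: round(a, 2) for n, a in act.items()})   # "guilty" settles at the higher activation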
His conclusion was that if there was an explanatory coherence between potential hypotheses, the nodes exhibited an excitement otherwise absent. Using such methodology, Thagard ran networks that could imply the guilt of a defendant in a given case. It should be mentioned that Thagard does not propose to introduce this technology in any form within the legal discipline, at least not in its present state. He was merely using law as an example of how humans reason in parallel using competing hypotheses, choosing between two or more explanatory versions of events. One can imagine an extension of this so-called localist connectionist network to actual legal texts involving civil law rather than merely newspaper reports. The ECHO program would appear to possess the ability to map out the analogical relationships that exist between a multitude of precedents and the case in issue.

3.5 THE TRIUNE THEORY OF THE BRAIN

MacLean's hypothesis is that the human brain may be divided into three layers, based largely on our evolutionary development. The first, "reptilian" brain focuses on the survival of the species through instinctive behaviour. The second, "old mammalian" brain is responsible for the survival of the individual through emotions such as fear, love and anger. The third, "new mammalian" brain is responsible for rational thoughts and strategies. The team of researchers led by Raghupathi maintain that this theory of the integrated instinctive, emotional and rational brain provides an appropriate conceptual framework for human thought processes. Thus reliance on pure rationality as a model to determine choice in legal reasoning, or in economic behaviour modeling, is too shallow. It follows from this that any AI application in law that ignores the relevance of the instinctive and intuitive character of humankind will inevitably prove inadequate. Although they are developing a firm theoretical base, Raghupathi et al. have yet to produce substantive models in support of their contentions with regard to Connectionism and law.

3.6 PHILIPPS

Another seemingly undaunted advocate of neural computing is Lothar Philipps [141]. He is of the belief that rather than supplying these decision-support tools to particular parties in a dispute, they should be looked upon as modern science's "tool for justice, an especially subtle set of scales." [142] The system that Philipps describes interestingly uses the domain of traffic accidents. However, he does not propose to develop a system designed to predict the value of quantum; rather, its purpose is to assist the legal official in determining the percentage of the fault to be attributed to each party to the accident. He rightly points out "this calculation is not easy as fault does not have to be proportionate to the damage. The smallest mistake may cause enormous damage. Thus the relationship between cause and effect is typically non-linear." [143]

141 See Philipps, L., Distribution of Damages in Car Accidents Through The Use of Neural Networks, 13 Cardozo Law Review, 1991, pp.987-1000, and Are Legal Decisions Based on the Application of Rules or Prototype Recognition? Legal Science on the Way to Neural Networks, 2 Pre-Proceedings of the Third International Conference on Logica, Informatica, Diritto, Florence, Martino, A.A. (ed.), 1989.
142 Philipps, L., Distribution of Damages in Car Accidents Through The Use of Neural Networks, 13 Cardozo Law Review, 1991, p.990.
143 Philipps, L., ibid., p.992.
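One way to picture the sort of non-linear mapping Philipps has in mind is sketched below: features describing the accident scenario feed a small network whose single output is squashed into the range -1 to +1, the coding described in the next paragraph (-1, B pays everything; +1, A pays everything). The feature names are drawn from the scenarios quoted below, but the weights are invented; in Philipps' system they would be learned from the catalogue examples by back-propagation.

    # Hypothetical sketch of a fault-apportionment network. The feature names are
    # taken from the scenarios quoted below; the weights are invented, where in
    # Philipps' system they would be learned from the catalogue examples.
    import numpy as np

    features = ["A_changed_lanes", "A_turned_into_parking_space",
                "B_driving_extremely_fast"]
    W1 = np.array([[ 1.2, -0.4],
                   [ 0.8,  0.3],
                   [-1.5,  0.6]])                # input layer -> two hidden units
    W2 = np.array([1.0, -0.7])                   # hidden units -> single output

    def apportion(scenario):
        x = np.array([float(scenario[f]) for f in features])
        h = np.tanh(x @ W1)
        return float(np.tanh(h @ W2))            # squashed into the range -1 to +1

    print(apportion({"A_changed_lanes": 1,
                     "A_turned_into_parking_space": 0,
                     "B_driving_extremely_fast": 1}))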
The data used is taken from a catalogue suggesting ratios for common types of accidents under German law. Some commentators accuse the judicial authors of these catalogues of being too rigid. Philipps argues that they can be made more flexible through the use of a neural network. Several networks are used, one for each type of accident. The inputs are ten different scenarios for traffic accidents involving A and B, such as "A changed lanes," "A turned into a parking space", "A changed lanes and B was driving extremely fast", etc. For each input, there is an output of a number between -1 and +1: -1 means that B pays everything; +1 means that A pays everything. Very often the statutory statements read in isolation appear to be contradictory. For instance, there may be a general law of the road, which we shall call A, which has no exceptions, caveats or "loopholes." In the neural network, as in the real world, a more specific rule, B, overrides such general rules. Why is this? Philipps argues that this is the way the law operates - "it is not a question of logic but of "psychology.""

Of most interest from a methodological point of view is Philipps' conclusion that the outputs varied according to the learning rate prescribed. The learning rate determines how large an adjustment the network algorithm will make to the connection strengths when it gets the output wrong. Philipps uses the useful metaphor of a game of golf: with a large learning rate you reach the target more quickly, but may miss it several times. At different learning rates, he discovered outputs of .87, .97 and .61 for a given input statement. I performed similar experiments. The results are set out later in Chapter 4. In his experiments, different topologies did not provide any noticeable improvement in performance.

3.7 INFORMATION RETRIEVAL

One of the most pressing needs of the legal worker is quick access to a manageable number of relevant legal documents, whether cases, statutes or other secondary legislation. There are a number of existing methods of retrieving information, using Boolean, vector space or probabilistic search methods. The most popular form of information retrieval uses keywords with Boolean connectors. This is used by Quicklaw in Canada, Lexis in the UK, Australia and New Zealand, and by West Publishing in the USA. However, there are serious deficiencies with full-text retrieval using single or complex Boolean search operators, such as AND, OR, NOT, etc. One of the most important studies performed in this area
The problem is succinctly summarized by Rau: "...words and lexical items in general are a poor approximation to meaning. Even with the addition of thesaurus or a phrasal lexicon, the approximation is not good...It is insufficient to treat words as indices irrespective of conceptual relationships between them. " 146  Given that detennining relevance is a gradable rather than binary operation and heavily reliant upon context, what are the alternatives? Systems which rely on manual extraction and coding of information are feasible up to a certain point. However, this approach would rapidly become uneconomic when dealing with large bodies of legal material. Thus, methods which rely on the automatic codification of text that has already been analyzed or is easily analyzable are to be preferred.  Neural networks have features that  could feasibly provide access to documentary information in new ways.  This approach  Blair,D.C, & Maron,M.E, An Evaluation ofRetrieval Effectiveness for a Full-Text Document Retrieval System, Comrnunications of the A C M . New York, 28,3,1985, pp.289-299. Blair,D.C, & Maron,M.E, ibid., p.293. Rau, L.F., Knowledge organization and access in a conceptual information system, Information Processing and Management 23, pp.269-283. 145  1 4 6  95  builds on views held by Bing  who argues that the words and lexical items by which  documents are indexed should be weighted. In "normal" full text retrieval, all words, whether keywords or merely "noise" words are weighted similarly. One theory is that the relative frequency of the occurrence of words bears some relation to the meaning of a given paragraph. Thus words should "carry" a weight that reflects their importance.  3.8  NEURAL NETWORKS FOR INFORMATION RETRIEVAL  Both Belew's ATR (adaptive information retrieval) and Fernhout  148  adopt this weighted-  word approach , the theory being that the number of times a given word appears reflects 149  its importance in giving the document meaning. Thus the number of times a word appears in any given document is reflected in the strength of the connections between neurons. Automatic text analysis provides the neural network with the requisite numbers, showing how many times a word appears in a document and also how many times it appears in the document collection as a whole. This means that no training is required and that the system is usable very quickly. In large legal databases this approach may not be feasible as the network would take a very long time to train. Moreover, every time a new document is added there are major difficulties. Either the system must be re-initialized entailing a loss of the system's previous "learning" and a dilution of the strengths of the links between a  Bing, J, The law of the books and the files: possibilities and problems of legal information retrieval, in G.P.V. Vandenburghe (ed.) Advanced Topics in Law and Information Technology, Hillsdale.NJ, Lawrence Erlbaum. 1990, pp. 151-182. Fernhout, F., Using a Parallel Distributed Processing Model As Part Of A Legal Expert System. 1 PreProceedings of the Third International Conference on Logica. Informatica. Diretto, Florence, Martino, A.A. (ed.), 1989, pp.255-268. Belew,R.K.,4 Connectionist approach to conceptual information retrieval, Proceedings of the First International Conference on Artificial Intelligence and Law. Boston, MA., 1987, pp. 116-126. 148  149  96  word and a document, or the links remain unchanged by the addition of a new document. Neither of these approaches appear satisfactory.  
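To make this weighted-word idea concrete, the sketch below derives a connection strength between each word and each document from the word's frequency in that document, discounted by how widely the word occurs across the collection, so that ubiquitous "noise" words carry little weight. It is written in Python purely for illustration and is not the precise weighting function used by Belew or Fernhout; the case texts are invented.

    import math
    from collections import Counter

    documents = {
        "case_A": "the plaintiff suffered a whiplash injury in a rear end collision",
        "case_B": "the defendant admitted liability for the collision but disputed quantum",
        "case_C": "the plaintiff claimed damages for pain and suffering after the accident",
    }

    # How often each word appears in each document.
    term_freq = {doc: Counter(text.split()) for doc, text in documents.items()}

    # In how many documents of the collection each word appears at all.
    doc_freq = Counter(word for counts in term_freq.values() for word in counts)

    n_docs = len(documents)

    # Connection strength between a word neuron and a document neuron:
    # frequent in the document and rare in the collection => a strong link.
    weights = {
        (word, doc): count * math.log(n_docs / doc_freq[word])
        for doc, counts in term_freq.items()
        for word, count in counts.items()
    }

    print(weights[("whiplash", "case_A")])   # comparatively strong link
    print(weights[("the", "case_A")])        # 0.0 -- occurs everywhere, carries no weight

In a network implementation these figures simply become the initial strengths of the links between word neurons and document neurons, which is why such a system can be queried without any separate training phase.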
An approach that seems to largely follow in the footsteps of the work of Belew, is that of Mital and Gedeon . These apparently fearless advocates of Connectionism find fault 150  with vector retrieval approaches because they allow for only one link between a word and a text unit. They argue "this (single link) does not deal with the indirect connectivities, such as are taken into account in a neural network based information retrieval system...In a neural network there can be many paths between a text unit and a word; some direct, some through other text units or words.  It is postulated that this extra connectivity can  discover and capture the semantic significance at a sub-symbolic level. " The system 151  they suggest is complicated. It involves the use of more than one link between any given word and document where that word appears in the document. The first type of links (known as Textual-Associative) are bi-directional and the weight assigned to the link is determined by a normalized word frequency. That is to say, extreme examples of this factor i.e. very high or very low word frequency measure are screened out as being of little value in discriminatory retrieval.  Gedeon, T.D., and Mital, V., Information Retrieval in Law Using a Neural Network Integrated With Hypertext Proceedings of the International Joint Conference on Neural Networks (ICJNN), Singapore, A C M Press, 1991, pp.1819-1824. Gedeon, T.D., and Mital, V., Information Retrieval in Law Using a Neural Network Integrated With Hypertext, Proceedings of the International Joint Conference on Neural Networks (ICJNN), Singapore, A C M Press, 1991, p. 1820. 151  97  In addition they suggest the use of a second type of link called a "latent link." These also exist between every word and every document where the word occurs. These links are assigned  an  initial  weight  of  zero.  The  strength  of  the  "latent  link"  is derived from training during actual use of the system, rather than learning from a training set of materials. The latent link strengths are modified during use based upon user responses supplied by query and answer, gleaned through a hypertext-based user interface. Degrees of relevancy are monitored accordingly and reflected in the strength of the latent links. Thus situations where a judge makes an exclusionary statement (i.e. states "what is not in issue here are issues of privacy,") over time the system will be able to assign negative relevance to the term "privacy" and the given document, even though it is mentioned. Although interesting, one major criticism of the system must be that unless operated on an extremely powerful computer, the response time might be quite slow on a large database. The value of the empirical study must be questioned as the researchers tested their system on a mere 19 documents.  3.9  T H E HYBRID APPROACH  In a later paper  152  Belew is joined by Daniel Rose to argue that both symbolic and sub-  symbolic knowledge is embedded in the law. In order to demonstrate this thesis, they describe their system entitled SCALIR (Symbolic and Connectionist Approach to Legal Information Retrieval), a legal information retrieval system that combines sub-symbolic  Rose, D.E., and Belew, K.B., Legal Information Retrieval: A Hybrid Approach, Proceedings of the  Second International Conference on Artificial Intelligence and Law. Vancouver, A C M Press, 1989, p. 138146.  98  and symbolic paradigms. 
In this system two types of links between neurons are used which allow for both logical and associative queries and inferences to be drawn. Thus the user can ask for cases "like A v.B" as well as cases that include required keywords. The first type of link are called C-links which embody sub-symbolic data that is apparent to the system without human intervention and interpretation. Examples of this might be the fact that two cases are often retrieved together or that certain words frequently occur in the document.  The second type of link (S-links) are labeled and unweighted symbolic  representations of factual relationships, showing for instance, the dependencies between different sections of a given statute. It is arguable that the system described above might also work equally well in litigation support contexts.  3.10 N E U R A L N E T W O R K S F O R D O C U M E N T A S S E M B L Y Only one group of researchers, (again Mital and Gedeon ) have been so bold as to 153  suggest that neural networks might also have a role to fulfill in the process of document assembly. Document assembly is the process by which lawyers develop documents from precedents with the aid of computer software. There are two main models of document assembly in use - template-based drafting and interactive assistance. The simplest form of template-based document assembly is the "merge" feature on any existing wordprocessing package, which can be used to formulate simple pleadings, tax forms and minor court forms. More advanced template-based drafting tools can create complex but  Mital V., and Gedeon, T.D., A Neural Network Integrated with Hypertext for Legal Document Assembly, Proceedings of the Twenty-Fifth Hawaii International Conference on Systems Sciences. Hawaii, IEEE Computer Society Press, 1992, p.533-539.  99  relatively commonplace legal documents using so-called "boilerplate" clauses that require variation to suit the current purpose. Computerized interactive assistance in document assembly is desirable where a significant  amount of legal knowledge is required.  Commercial programs such as Blankety Blank  154  and Expertext  155  have used logic-based  approaches to develop powerful tools for document assembly. Mital and Gedeons suggest that as an alternative to the use of templates or the use of logic-based tools, lawyers should leave the text in the precedent documents unaltered and instead the system should "simply allow the user to adapt the information contained in them to the current situation. " They then go on to outline how their Textual-Associative and Latent links 156  system could be used to augment hypertext to create a full-text retrieval system, the only difference being that the content of the database would be precedent documents as opposed to cases or statutes. This approach misses the point with regard to existing document assembly. The purpose of a document assembler is to lessen the amount of expertise and time taken in preparingfirstdrafts of complex legal documents. The neural network that they suggest should be implemented does no more than retrieve the precedent in its skeletal form. Much time and effort would still be required to apply the document to its context.  What Mital and Johnson suggest here is no more than a  document delivery system. As we have seen, more document assemblers are used to remove some of the hindrances in drafting complex legal documents. 
For instance, neural networks do not offer users interactive assistance to ensure global consistency in terms of  Produced by Softstream Technologies Inc., Hollywood, Florida, USA. Produced by Expertext of Toronto, Ontario, Canada. Mital V., and Gedeon, T.D., ibid., p.534.  100  structure and grammar of the document or allow younger, less experienced fee-earners to deal with matters that were previously the domain of more senior lawyers. Mital and Gedeon seem to be stretching the limits of the application of neural networks with this type of application.  3.11  CONCLUSION  We can summarize research to date on neural network applications in legal reasoning or legal "open texture" as follows; either researchers are skeptical of the ability of neural networks to rise to the tasks within the domain of law and the researchers' results tend to compound that skepticism, (e.g.  Bench-Capon.)  wholeheartedly embraces the theoretical  Alternatively, the researchers  perspective developed from philosophy and  cognitive science, that artificial neural networks are likely to prove more successful at tasks which have proved insurmountable for traditional Al. However they have yet to develop results that confirm their suspicions. This thesis fallsfirmlyin the camp of the latter. The task of this thesis is develop a neural network that is successful, to develop results that confirm my suspicion that neural networks are as good as other forms of computer programming in certain specified tasks within the domain of law. However, an important caveat should be entered at this point. Of the 5 networks relating to legal reasoning (as opposed to indirectly related legal tasks such as document assembly and information retrieval,) only 2 used real world data. Of these two, neither is as complex in terms of number of input neurons or connections, as those networks developed by myself and described in Chapter 4.  101  With regard to present applications of neural networks in law, it would appear that neural network technology is finding immediate and successful usage in information retrieval. Regardless of whether they can out-perform traditional statistical computing techniques at the task of predicting quantum, they have some immediate uses and we are likely to see more of this type of technology.  102  CHAPTER 4 T H E WHIPLASH NEURAL NETWORK 4.1  INTRODUCTION  The domain in issue is the assessment of awards for whiplash injuries in British Columbia. After briefly identifying the role of the insurance company and the nature of a whiplash injury and how an assessment is made by the insurance company, this chapter details the attempts made to build a neural network using the legal domain of personal injuries and in particular utilizing the data contained in the "Whiplash database."  4.2  T H E R O L E O F ICBC.  To non-residents of British Columbia (B.C.), motor vehicle insurance arrangements within the Province are slightly unusual. All motor vehicle insurance is provided and administered through the Insurance Corporation of British Columbia (ICBC). Set up in 1973 by the Insurance Corporation of British Columbia Act, (R.S.B.C., Chapter 201,) ICBC is a Crown agency, a non-profit organization and monopoly provider of basic liability and accident benefit coverage to all owners of vehicles licensed in the Province. The rationale behind this arrangement is that having a single licensing and insuring agent bring with it numerous advantages. First, ICBC can almost guarantee the population of B.C. 
that every resident in the Province has motor insurance.  (It is estimated that there are between  250,000 and 1 million uninsured vehicles being driven on the roads of Ontario .) 157  Comparative Aspects of the Insurance Corporation ofBritish Columbia and Private Sector Insurance Companies in Canada, ICBC, Vancouver, B.C., 1990, p.6. 157  103  Secondly, all vehicle owners contribute to the Autoplan fund out of which claims are paid. The reasoning behind this goal is that no single group of road-users should feel treated harshly. According to ICBC, premiums appear to compare favourably with the private sector (although admittedly the comparative figures published are usually only provided by ICBC.) Premium increases appear to be minimal despite Vancouver drivers reputedly having a poor driving record . This situation of one insurance company for the entire 158  province leads to unusual circumstances whereby ICBC acts for both sides in a dispute. To many, it appears that there is a conflict of interests.  The motor vehicle insurance industry in B.C. is immense. ICBC employs nearly 4000 staff, has 900 brokers and in 1993 wrote over $2 billion worth of vehicle insurance. From that fund, in the fiscal year 1993, atotal of 804,115 claims were made and $1,021,000,000 was paid out in injury related claims . 159  In 1989, there were 587 fatalities in road  accidents and 47,471 people injured .  According to the Ministry of the Solicitor  160  General, Motor Vehicle Branch of B.C., that averages an accident every 3.63 minutes and an injury in an accident every 11 minutes. Needless to say, the task of assessing the cost of those accidents falls on the shoulders of an army of ICBC adjusters.  Should the  matter turn litigious, the final arbiter of the matter will be a judge.  An article in the Report on Business magazine, (June 1989) provided statistics compared 27 accidents per 1,000 people to 36 in Montreal and 70 in Vancouver "which has the worst drivers in the country." Insurance Corporation of British Columbia, ICBC Annual Report, Vancouver, B.C., 1993, p.5. Ministry of the Solicitor General, Motor Vehicle Branch, Motor Vehicle Statistics- 1989, 1989, Victoria, B.C. 1 5 9 160  104  4.3  W H A T IS W H I P L A S H ?  Whiplash is one of the commonly diagnosed syndromes in patients who have been involved in motor vehicle accidents. The term was first used in a medical paper in 1928 . 161  Since that time it has proved to be an ambiguous condition which although common, has not been thoroughly researched by the medical profession. Part of the reason for the lack of interest in the condition is what is described by Croft as "its rather unsavoury reputation, owing to its common association with litigation, discouraging most researchers from delving into this "Pandora's box. "" The term covers a wide variety of complaints 162  such as neck pains, blurred vision, head-aches, dizziness, nausea, back pain and numbness. It may also result in surface hemorrhages of the brain and cerebral concussion. Whiplash may be combined with other injuries such as fractured or broken bones or skull, and crushed vertebrae.  When a car is struck from behind, the torso of the occupant is accelerated whilst the head is snapped backwards. It may also be complicated by other movement of the head i.e. from side to side or the head may be at an angle when the injury occurs. As the force of the collision dissipates the head then snaps forward and is forced onto the chest. 
If the car strikes any further objects such as a car in front, then the injuries may be exacerbated. However, whiplash is also an extremely easy injury to fake and patients who complain of any of the above symptoms after a car accident are often perceived by the medical  Crowe,H.E., Injuries to the Cervical Spine, paper presented at the meeting of the Western Orthopedic Association, San Francisco, 1928. Foreman,S.M., & Croft, A.C., Whiplash Injuries - The Cervical Acceleration/Deceleration Syndrome, 1988, Wilkins & Wilkins, Baltimore. 162  105  profession as either malingerers or hypochondriacs. This argument is, according to certain Plaintiffs' Counsel , one commonly made by Defense attorneys. 163  4.4  HOW  A R E D A M A G E S ASSESSED?  An adjuster is the representative of the insurer (i.e. ICBC) who seeks to determine the extent of the ICBC's liability for loss when a claim is submitted. There are three main types of investigation - coverage i.e. determining under what head the loss falls; wage loss investigation; and medical investigation - the so-called corefromwhich all damages flow. These three types of investigation provide the adjuster with a quantum figure to offer to the Plaintiff. If the figure offered by the adjuster is thought by the Plaintiff to be too low, the process of obtaining an award will become adversarial as the Plaintiff is required to seek independent legal advice. It will be the job of Plaintiffs Counsel to present the clients case in the most comprehensive manner, consistent with the truth, by emphasizing the nature and duration of the symptoms, the progress and present status of the patient. Defense Counsel who will represent ICBC will focus on those factors which adversely affect the recovery process and elucidate on the degree to which the symptoms have been resolved.  4.5  T H E INITIAL ASSESSMENT  The job of making an initial assessment is given to an ICBC-employed adjuster. Employees of ICBC can spend their entire careers assessing injury. Therefore, many of them are very skilled in the art. However, a junior adjuster begins to develop experience  Keith and Laura Magee of McGee and Company, Vancouver, B.C., a firm of lawyers that specializes in Plaintiff personal injury. 1 6 3  106  of the value of a whiplash injury by using what Plaintiffs' Counsel commonly describe as "the meat chart", a table of values developed internally by ICBC showing what figure they believe an adjuster should settle with in any given case. From this, the adjuster goes on to develop experience of likely quantum figures. How then does an adjuster arrive at a figure? Does she/he use a rule-based mechanism, does she/he simply know a $30,000 case when she/he see one or is there some other complex reasoning system at work? (At this point we should have in mind the arguments developed by the Dreyfus brothers, mentioned in Chapter 2.)  Evidence gained from interviewing Plaintiffs' Counsel  164  suggests that lawyers use a combination of the above methods, although the balance between them depends on the case. It was suggested to me that one must also have a feeling for the social, economic and political environment in which legal actors operate: These factors are more thoroughly explored in the conclusion. But how does a judge reason? We have very little evidence because judges are generally very unwilling to permit third parties to scrutinize their practical reasoning processes, partly for fear that it might permit Counsel or other legal actors to "second-guess" them.  
4.6  T H E STATISTICAL QUANTUM  ADVISOR  This is the system against which the whiplash neural network is to be judged. Members of the FLAIR project have been quick to recognize the importance of an automated quantum estimator which would work in conjunction with an information retrieval mechanism to retrieve relevant cases. Although the software developed as part of this thesis does not  Keith Magee, see note 163.  107  include an information retrieval mechanism it is clear from the review of neural computing applications in law, in Chapter 3 that the potential also exists for a neurally-inspired information retrieval mechanism. Using a statistically-based model and implemented in the C programming language under DOS, database parameters are used to assess the weight given to each field in the database and the relationships that may exist between them. To ensure fairness, the Whiplash neural network uses identical original data.  The Statistical Quantum Advisor is methodologically problematic. The neural computing advocates would point out that case law has developed organically. How then, can a rigid mathematical formula that is no more than the summation of one person's belief as to how quantum is assessed, possibly capture the seeming complexity of the domain? On the face of it, the neural network ought to out-perform the Statistical Quantum Advisor for two reasons; First, the neural network develops its conclusions "organically" by trial and error; Secondly, the data is untouched - it does not involve any human interaction or interpretation, before entry into the computer.  This is not to say that a computer's  reasoning is inherently superior to that of a human being. Indeed, the experience of A l over the last forty years suggests quite the reverse. However, neural networks develop their own interpretation of data. Thus there is one less level in which interpretational mistakes can be made. The neural skeptics would counter such arguments by pointing out that law is a hermeneutic science - its very essence is interpretation. To argue that the involvement of human reasoning in some way infects the pure body of reason that is the law is to misconceive the nature of law. Rather than adopting one side or the other of  108  such a debate, we should focus on the empirical data, such as that described in this Chapter.  4.7  METHODOLOGY  Having briefly discussed the context, in this part of the Chapter, I outline how the data in the Whiplash database was manipulated, data problems and the process of building a neural network.  There are six stages to the design process of building an artificial neural network. These are as follows;  •  Decide what it is that the Network is required to do. The project that incorporates the practical aspect of this thesis is the development of an artificial neural network that attempts to predict quantum in a given case involving whiplash injuries.  •  Decide what information is to be used. The prediction that is made by the ANN is based on the relationship(s) that exist between various categories of data (the input) and the quantumfigurein previous reported cases of whiplash (the output).  •  Get the data - The Whiplash Database  •  Build a network.  •  Train it  •  Test it.  4.7.1. T E S T 2 . P R G The first task was to decide which cases could be used to train the ANN.  This was  discussed with John McClean, a programmer with FLAIR and a qualified lawyer in B.C.  
109  who has significant experience in dealing with personal injury cases involving whiplash injuries. With the assistance of Keith MacCrimmon, who is also a programmer at FLAIR, was able to build a short program using a BASIC-like programming language in dBASE™ that was able to strip out all of the unwanted cases and provide myself with a relatively clean data set. A copy of the code is attached, with comments at Appendix A (entitled Test2.Prg.) Having removed all cases that mentioned any of the above factors I was left with a data set of approximately 660 cases, from an original 1661 consisting of approximately 60fieldswhich would be used as inputs. While my experiments continued more cases were added to the database, which in turn increased the set of useful cases to 705.  Within the existing whiplash database there are 115 fields  165  Many of thesefieldswere  irrelevant for my purposes. (For instance, whether FLAIR has a hard copy of the case in its records is of no consequence to the purpose of predicting quantum.)  After  consultation it was agreed that a total of 54fieldsshould be deleted. These included, inter alia; Court registry location; File number; Judge's title; B.C. decision citation; Pain and suffering award - (a character string, sometimes mentioned in the reports); Pain and suffering award - a numeric range; Case comments 1-3 (this were fields in which descriptions of symptoms could be included e.g. "The Plaintiff has felt slight neck pains on and off..."); Whether legal issues of proof or remoteness were mentioned in the case; Whether the case included cross references to other cases in the database; Whether there  A database field is defined in the dBASE manual as "An individual data item in a record." See Ashton-Tate, Learning and Using dBASE III Plus, 1987, p.G-8.  110  was a hard copy of the case in FLAIR's records, etc. Some of these fields such as file number were clearly irrelevant to the overall award. Others, such as age and sex of the Plaintiff were considered by myself to be material. However, on advice from John McClean, they were also deleted.  Other databasefieldswere used to act asfiltersi.e. if the case included any of the factors mentioned below , the case was skipped because it was felt that the information would 166  initially "confuse" the network. Other factors used as to "filter out" potentially difficult cases were; If the case involved a Jury trial - a rare event which is believed by lawyers to "buck the trend" that is believed to exist in Whiplash damages awards. At one time, it was a commonly-held belief amongst lawyers that juries tended to be far more sympathetic to the ailments of the Plaintiff than case-hardened judges. (It now appear that this trend has reversed. Indeed according to Plaintiffs Counsel , ICBC now look for opportunities to 167  use jury trials because they believe that, in these times of harsh car insurance premiums, juries are proving to be less sympathetic than the judges. In any event, jury trials tend to upset the accepted trend of awards whatever it may be.) A full list of those facts which were used as inputs is attached at Appendix B.  For instance, if the database record mentioned that the case made mention of the fact that different types of damages were included in the non-pecuniary award (such as economic loss); if there were multiple accidents (i.e. 
if there were more than two parties involved in the accident that would complicate the award because some damages would be awarded against one party and some against another); if any mention was made of the notion of the "thin skull' concept; if there were any pre-existing injuries. If any of these factors were mentioned the case was not used for training purposes. Keith Magee, see note 163 167  111  4.7.2 D A T A P R O B L E M S The problems with the database were many. For instance, the whiplash database has evolved and continues to evolve over time. Many people have been involved in inputting the data and we have no guarantee that they did not interpret cases in different ways from one another. Despite the fact that thefieldscontain largely numerical data, the human data processors are required to exercise a degree of discretion. Also newfieldsand codes have emerged with use.  The first problem encountered was that very often certain field had no entries. For instance, sometimes the date of the injury would not be mentioned in the judgment. However, the processors has only the written case judgment to work with. Therefore we do not possess a full account of all the evidence presented at trial. We have only a truncated version. The question was - what should one do in order to maintain the accuracy and scientific legitimacy of this experiment? Was it safe to presume that if a particular matter (such as a visit to a physiotherapist) was not mentioned in the database record that it had not been raised at trial? There are a number of possible scenarios. There may have been one or more visits to a physiotherapist, but the lawyer for the Plaintiff omitted to mention the matter for one of many reasons, the most likely scenario being that the lawyer decided that the judge was not one who takes much account of visits to physiotherapists.  Alternatively the judge, despite hearing evidence of visits to  physiotherapists and such like, might have decided that this was indeed irrelevant to the overall award. Afinalalternative is that the judge thought it was important but omitted to  112  mention it from the judgment. We have no way of knowing without going back to the original trial transcript and accompanying documentation for each case record. Thus a pragmatic decision was taken that if a particular factor was not mentioned then it should be assumed that that factor was not present. When an entry in a field was considered to be important, the hard copy of the case was consulted for clues to the missing data. For instance the date of the injury was often omitted. From a perusal of those cases, it was concluded that it would be safe to imagine a delay of no more than 1 year from injury to trial. In order to avoid possible data errors made through intelligent guesswork, as a rule, those fields were not used.  Researchers on neural networks have made numerous claims about the ability of neural networks to work with incomplete data. This was a golden opportunity to test such assertions. Although these issues would be highly problematic if one wished to build a system for commercial exploitation by ICBC or B.C. lawyers,  for the purposes of  comparison, these issues are irrelevant, because the Statistical Quantum Advisor also worked with the same incomplete data.  4.7.3 D E C E P T I V E D A T A In order to fine tune the network further, I felt that it was necessary at least initially to take account of inflation and therefore the year in which the judgment was handed down. 
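Before turning to the date fields themselves, the preprocessing decisions described in the two preceding sections can be summarised in a few lines of code. The sketch below is a Python reconstruction rather than the dBASE routine actually used (the real Test2.prg code appears at Appendix A), and the field names are hypothetical; it illustrates the case filters and the pragmatic "not mentioned means not present" rule:

    # Hypothetical field names; the actual Test2.prg logic is reproduced at Appendix A.
    FILTER_FIELDS = ["jury_trial", "multiple_accidents", "thin_skull", "pre_existing_injury"]

    def usable(record):
        """Skip any case whose record mentions one of the complicating factors."""
        return not any(record.get(field) for field in FILTER_FIELDS)

    def clean(record, input_fields):
        """A factor that is not mentioned in the record is treated as not present (zero)."""
        return {field: record.get(field) or 0 for field in input_fields}

    raw_cases = [
        {"physio_visits": 3, "neck_pain": 1, "quantum": 12000},
        {"neck_pain": 1, "jury_trial": 1, "quantum": 30000},   # filtered out: jury trial
        {"headaches": 1, "quantum": 7500},                     # missing fields become 0
    ]

    input_fields = ["physio_visits", "neck_pain", "headaches"]
    training_set = [dict(clean(case, input_fields), quantum=case["quantum"])
                    for case in raw_cases if usable(case)]
    print(len(training_set))   # 2 cases survive the filters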
All dates in the database were in a form "--/--/--" which is clearly unusable. Days and months were less important to the experiment. However year figures could provide a  113  rough estimator of the inflation.  However, dates can be deceptive data for neural  networks. There are two ways of interpreting them in databases. We might wish to use dates in a way so that 1986 is no more important than 1985 or less important than 1987. That is to say that each is a unique year. They merely need to be distinguished from one another. Therefore to place all dates in the same column of data would be a mistake. However, in the discipline of law, dates are important in deciding whether a given case is still precedent - e.g. depending on the nature of the dispute, in a given case one party might argue that because a case is from, say 1926, it is too old to be relevant. Also, given two factually similar cases, each with different outcomes but one decided in 1985 and the other in 1994, the fact that one is more recent than the other is an important consideration.  If we take the view that each year is different rather than more or less important than another, there are two solutions. The first is to rearrange the data so that each year or groups of years have their own columns and given either a positive (1) or negative (0) value. The second is to convert the number into a symbol. By executing a Brainmaker command one can make unique neurons of each different number in the same column of data. If however, we take the view, and we do here, that a more recent case is a more valuable precedent than an older case, then we can place all the years in the same column so that the network interprets 1986 as a number more important than 1985 by an increment of 1. This view was adopted and utilized in the Test2.prg program. However, early experiments with the neural network software did not use this data.  114  4.8  THE SOFTWARE  The Brainmaker ANN software is one of the most popular neural network applications for the PC available . Used throughout this project, it comes with a built-in database entitled 168  Netmaker that, amongst other things, allows the user to identify which columns of data are to be used as inputs and outputs. This facilitates the quick creation offileswhich are used by the neural network software.  4.8.1 H I S T O G R A M S Brainmaker includes a number of tools that allow the user to test how well the network is training as the network trains. These tools are known as the Network Progress Display (NPD) and the Column Histogram. The NPD consists of two graphs.  Thefirstgraph is described as "a histogram of the errors over the whole (single training) run." According to the software manual "the error for each neuron is the absolute value of the difference between the pattern (expected value) and the output (actual) value. The error display is a histogram where the height of each bar corresponds to the number of error values within the range of the bar ." That is to say that the horizontal axis is the 169  error level and the vertical axis is the number of outputs that were within an acceptable range. Thus the ideal NPD diagram shows columns bunched together on the left side of  According to the manufacturers, California Scientific Software, by September 1994, over 18,000 copies of the software have been sold. California Scientific Software, Brainmaker User Guide and Reference Manual, California Scientific Software, Nevada City, CA., 1993, p. 10-34. 
1 6 8  1 6 9  115  the Histogram as shown by Figure 4.1(a) overleaf where all the error values are within a tolerance of less than 0.1.  Graph 4.1(b) shows a typical graph when the network  commences training with error values up to 0.5 (i.e. 50%) off target.  As training  progresses, the bars gradually move leftwards across the graph.  Figure 4.1 (a) & (b) - Network Progress Display Graph 1  30  0.1 X  The  0.2  HH  = F i g u r e 4.1(a) i d e a l  network  I I I—I  = F i g u r e 4.1. (b) typical network at c o m m e n c e ment of learning  X  = error level, Y = n u m b e r of outputs within error  range.  second NPD graph below is a display of the progress of the RMS (Root Mean  Squared) error. According to the software manual, "the RMS error is much like an average error In a healthy learning network, this value should decrease from one run to the next. A good training progression is represented by an overall downwards trend from left to right.  A network that is not learning as easily would show a more gradually  declining curve or a curve that showed occasional increases in value.  116  Figure 4.2 Network Progress Display Graph 2  Y = overall error level for a r u n , t h e R M S error  0.34  X  = r u n  n u m b e r  Y  0.060 0  300  100  4.8.2 T R A I N I N G , T E S T I N G A N D  400  RUNNING  Once the user has decided which data fields to use as inputs and outputs, Brainmaker automatically divides the database in question into a set of training records (by default, set at 90% of the total database, written to ****.fct) and a set of testing records (the remaining 10%, written to ****.tst.) Every tenth case is placed in the Testing file automatically. Brainmaker takes every 6th, 16th, 26th case, etc.  Training is the process by which the input/output pairs are delivered through the network repeatedly. The learning algorithm (the back-propogation rule mentioned in Chapter 1) corrects the network until the tolerance parameters for all facts in the training set are satisfied. Testing is the process where those 10% of the cases set aside are put through the network.  This allows the user to track results without correcting the network.  117  It is  important to note that for a fact to be considered "good" it must be afigurewithin the Tolerance band (i.e. 10% if the Tolerance is set to 0.1) of the maximumfigurewithin the output set. Thus, using the Whiplash database which has a maximum quantumfigureof $100,000 (in the case of Adachi  v. Westhaver) the required output must be within  $10,000 (10% of $100,000) of the actual output in order to be considered a "good" fact. Tolerance is defined as a percentage of the range of output.  Testing should be distinguished from running. Running is the process whereby the user presents new, unseen data to the network and requests an output. Of the many features provided by Brainmaker, only training and running are required although, for a developer not to test the trained network before presenting it with new data would be extreme folly. The testing tolerance is set by default at 0.4 (i.e. 40%) Thus clearly unacceptable results such as an output of $28,000 against a required pattern output of $14,000 would be considered a "good" fact by the network because it was within $40,000 (40% of $100,000) of the required output.  4.8.3 S E V E R I T Y R A T I N G - A N  EXPLANATION  Myfirstidea was to have two outputs - Quantum and the Severity Rating. 
This field is a numerical factor used in the database to identify how the judge perceives the case before him/her in terms of severity i.e. if the judge says in his/her summary - "this is a mild case of whiplash," this would merit a "1" in the appropriate databasefield, whereas if the injuries are described as severe, this would merit a "6." Clearly this field cannot be used as an input as it is afindingof the judge. However, I felt it would be both interesting and  118  helpful to see if the network could distinguish severe from mild whiplash injuries based on the 60 or so fields of data used as inputs.  4.8.4 T H E F I R S T N E U R A L N E T W O R K The first neural network failed to yield any results of value. Randomness is the kindest adjective one could use. After 40 hours of training during which time the network failed to finish training on the training set, the network had still to distinguish the most severe from the mildest cases of whiplash. The results appeared to be very arbitrary indeed. However, the network appeared to have made more progress with the Quantum factor. Thus I felt it would be more prudent to not bother with the Severity Rating as a conclusion until I had crossed the first and more important hurdle of predicting quantum.  A cursory glance at the 30-40 new cases that were not used for either training or testing of the neural network gave some clues as to why attempting to predict the Severity Rating was problematic if not almost impossible. For instance, in the case of Zizic v. Rinta (1) (BCSC), Judge Boyd describes the whiplash injury sustained as moderate to severe and yet awards damages of $10,000 whereas only four months earlier, Judge Hunter awarded $25,000 for a "moderate" case of whiplash in Stewart v. Mitchell. Equally paradoxical is the earlier case of Phut v. Dhaliwal (1987) (unreported) who, for a whiplash injury which merited a "6" on the Severity Rating scale, received a measly $850. However, perusal of the hard copy of the case reveals that at the time of entry into the database, "6" referred to an "unclassified" rating and in fact the range within that field numbered from 1 to 5. Small wonder therefore that the neural network appeared to show a degree of what the  119  manufacturers describe as simulated "brain damage. But what of neural networks' famed ability to cope with conflicting data? It should be said that the network did settle into a pattern, but in order to accommodate the data, the network had to be generalize so much as to be worthless. This type of contradictory information also suggested at least one reason why the rule-based approach to Whiplash quantum assessment was doomed to failure from the outset.  4.8.5 T H E N E X T E I G H T N E U R A L  NETWORKS  Having decided not to use the Severity Rating as a Pattern Output , Network #2 trained 170  comparatively well. The tolerance factor was set fairly high (0.2) so that the network was more likely to form some kind of conclusion regarding the relationships between inputs and outputs. This policy succeeded. By the 222nd run of the 595 training records, some 572 were "good" and a mere 23 were "bad." However testing revealed large errors between expected and actual outputs.  I wanted to stop the training early, (i.e. within the first few hundred epochs/data runs) because writers in this area, (and in particular, the manufacturers of the Brainmaker software) warn against creating a network, which whilst satisfying the tolerance criterion, fails to generalize well; i.e. 
it performs well in training but tests poorly. What I wanted to develop was a network that could generalize extremely well even though it might not train correctly on every given training fact.  See Appendix G - Glossary of Brainmaker Terms.  120  How does one know whether the neural network is "learning" properly? The software possesses a further graphing capacity which allows the user to check on the mental health of the network. A healthy network with a capacity to learn is typified by a histogram resembling a "New York skyline" as shown in Figure 4.3, where most weights are clustered around zero.  Figure 4.3 A "healthy" learning neural network (Y = Number of Neuron Weights; X = Weight Values)  Y  O  X  +  A network with a histogram as shown in Chart 4.4 below (which might be likened to a U shaped valley) reflects a "brain-dead" network; that is to say, a network that has memorized rather than learnt from the facts, with connection weights piled at both extremes.  121  Figure 4.4 A "Brain-Dead" neural network (Y = Number of Neuron Weights; X = Neuron Weights.)  0  Networks #3 and #4 were left to run for many hours as a test of the performance degradation curve. Early examination of the Histograms showed a healthy learning graph but later results e.g.  runs 200-500 suggested a "brain-dead" network that had  "memorized" rather than learnt the required results. From this I learnt that although the amount of time taken for the network to learn was not important, but the number of runs was; Using this data, on over 200 runs it seemed that the network could learn nothing further from the data. I found this result quite surprising given that it was so complex. However given the results it seemed likely that the network that provided the best generalization also would provide the best predictions. The Network Progress Display (NPD) curves suggested that the 1st 20-40 runs of the training data removed the majority  122  of the errors.  The next 80 removed more and the next 100-150 were fine tuning.  Therefore, on this data, the network developed in the 100th-200th run seemed most likely to provide the network that would satisfies our requirements in terms of the most accurate predictions. This seemed very surprising indeed as neural networks often can take days to learn the required patterns in order to produce satisfactory outputs. I wanted to test the network by training it on thousands of runs of the data. Brainmaker would take days to perform such a task. Therefore I resorted to an alternative neural network written in C which ran on the SUN computer. The results relating to this experiment are provided further on in this chapter.  The results of Network #5 were invalidated because of my own human error. The column entitled Severity rating was included as a input. This is an inappropriate use of such data because this field reflects a judicial conclusion and, if anything should be used as an output.  4.9  H O W M A N Y HIDDEN NEURONS?  Knowing how many hidden neurons to use is part of the art of developing a good neural network.  It seems that there is no definitive answer as the number of neurons is  application-dependent. As a general rule, if one uses more facts and the fewer hidden neurons, the network will "generalize" more easily. However, this rule does not always hold true. Very often it is possible to "prune" out superfluous hidden neurons ( or adding neurons) to improve performance without losing previous training. 
Literature on neural  123  networks provided me with some guidance. For instance one idea mentioned by the developers of Brainmaker recommends the following formula; No. of hidden neurons = (inputs + outputs) 12  In this scenario, with 60+ inputs and 1 output this means in the region of 30+ hidden neurons. An alternative rule of thumb is that the number of hidden neurons should be equal to the error tolerance times the number of training facts. However this would be equal to 60 hidden neurons - almost as many inputs. Other literature  171  suggested that the  upper bound for the number of hidden neurons can be determined by the number of training examples according to the rule - five examples for each weight. We have nearly 600 examples therefore we easily satisfy this criterion. Early experiments were made with the number of hidden neurons around the 30 mark.  Network #6 used 27 hidden neurons in the middle layer. After 196 runs of the data, it successfully trained on a Tolerance of 0.1. Network #7 used 37 and trained successfully after 340 runs of the data. Network #8 used 17 neurons and trained even faster with a mere 102 runs of the data.  All other factors remained the same.  The makers of  Brainmaker warn the user that reducing the numbers of hidden neurons can impair the networks performance. Thus whilst network #8 trained very quickly it did not test at all well. Each network was tested by taking random sample results from the 66-case testing file which Brainmaker generates automatically. The results are set out in Figure 4.5 below and the most accurate output for each fact is marked with a star (*);  171  For example Klimasauskas, C , Applying Neural Networks. PC A l . May-June 1991, pp.20- 24.  124  Figure 4.5  Results of experiments with varying number of hidden neurons  CASE NUMBER REQUIRED OUTPUT 9 19 29 39 49 59 66  $8,000 $18,000 $15,000 $12,000 $7,500 $2,500 $30,000  NETWORK #6 (27) $10,165* $11,532* $28,329 $11,679* $8,334* $5,599* $11,434  NETWORK #7 (37) $10,311 $34,945 $16,073* $17,147 $12,606 $9,359 $16,195  NETWORK #8 (17) $11,019 $24,594 $25,570 $24,374 $14,926 $8,749 $36,410*  Given that all of the above three networks showed significant errors in testing, I decided to run a separate trial of networks using similar data, with various numbers of hidden neurons but concentrating purely on the training statistics that the network made available to me. I used networks with 60, 50, 40, 30 20 and 10 neurons respectively in the hidden layer and trained them over 100 runs of the data. Tables showing the number of "good" and "bad" runs together with the RMS error are provided below. A network that trains perfectly would attain 598 "good" facts and 0 "bad" facts in the shortest possible time. It should also provide an RMS error as close to zero as possible which would be represented graphically as a steep declining curve from left to right as in Figure 4.2. However, we still need to bear in mind that the network that trains the best does not necessarily also test the best. As Lawrence points out "a network that can't get all the training examples correct may still test well and be more useful than the one that trains well " 172  Lawrence, J., ibid., p.205.  
125  Figure 4.6  Results of further experiments varying the number of hidden neurons  60 Hidden Neurons, Tolerance of 0.11 RUN  10 20 30 40 50 60 70 80 90 100  GOOD FACTS 498 541 556 564 571 567 557 574 564 558  BAD FACTS  RMS ERROR  BAD FACTS  RMS ERROR  BAD FACTS  RMS ERROR  100 57 42 34 27 31 41 24 34 40  0.0774 0.0668 0.0627 0.0594 0.0564 0.0568 0.0624 0.0564 0.0626 0.0627  50 Hidden Neurons, Tolerance of 0.11 RUN  10 20 30 40 50 60 70 80 90 100  GOOD FACTS 513 547 549 561 554 563 560 574 568 558  85 51 49 37 44 35 38 24 30 40  0.0744 0.0667 0.0656 0.0614 0.0642 0.0627 0.0626 0.0585 0.0606 0.0609  40 Hidden Neurons, Tolerance of 0.11 RUN  10 20 30 40 50 60 70 80 90 100  GOOD FACTS 533 546 561 561 558 567 563 561 557 594  65 52 37 37 40 31 35 37 41 4  126  0.0718 0.0641 0.0622 0.0627 0.0639 0.0615 0.0621 0.0618 0.0604 0.0519  Figure 4.6 (continued) 30 Hidden Neurons, Tolerance of 0.11 RUN  10 20 30 40 50 60 70 80 90 100  GOOD FACTS 524 547 555 556 557 551 556 555 547 560  BAD FACTS  RMS ERROR  BAD FACTS  RMS ERROR  BAD FACTS  RMS ERROR  74 51 43 42 41 47 42 43 51 38  0.0725 0.0626 0.0650 0.0646 0.0633 0.0649 0.0647 0.0635 0.0648 0.0652  20 Hidden Neurons, Tolerance of 0.11 RUN  10 20 30 40 50 60 70 80 90 100  GOOD FACTS 524 542 562 554 556 552 559 557 548 564  74 56 36 44 42 46 39 41 50 34  0.0769 0.0679 0.0521 0.0516 0.0648 0.0628 0.0626 0.0625 0.0631 0.0600  10 Hidden Neurons, Tolerance of 0.11 RUN  10 20 30 40 50 60 70 80 90 100  GOOD FACTS 515 524 528 538 545 542 540 548 551 562  83 74 70 60 53 56 58 50 47 36  127  0.0820 0.0759 0.0718 0.0693 0.0692 0.0702 0.0689 0.0678 0.0682 0.0669  From the above tables we can see that networks with extreme numbers of hidden neurons do not train as well as those with neurons with 20-40 neurons. The network with 60 hidden neurons trained erratically as did the network with 10 hidden neurons. One of the most significant training results provided by this experiment was the severe dip in the number of "bad" facts in the 40-neuron network on the 100th run of the data set. The RMS error also dropped dramatically at the same time from 0.0604 to 0.0519. I took this change as an extremely good omen and decided that in future experimental networks, the number of hidden neurons would be set to 40. (It should be pointed out that this type of experiment is a fine-tuning device, best done when a moderately useful network has been developed. Thus I return to this experiment later in the Chapter.)  How many hidden layers?  The rule would appear to be "more is not better. " 173  According to the manufacturers of the software used in this set of experiments, adding another layer of neurons offers no guarantee that the network will find a solution where a network with a single layer has failed. Also the network takes much longer to train as the errors have tofilteracross a further layer of neurons. Therefore I did not perform any experiments where I changed the number of layers of neurons.  4.10 E X T R E M E D A T A R E M O V A L The distribution of data can drastically affect the networks ability to learn. Performance of the network is likely to improve if the network is forced to look more at the common values. Whilst I was aware that the neural network needs a large training set of data with  1 7 3  Lawrence, J, ibid., p. 194.  128  which to learn, it is advised by the makers of the software that isolated facts at the extreme minimum or maximum of the range of data are to be avoided and therefore certain unusual cases such as the case of Adachi v. 
Westhaver (where an exceptional damages quantum of $100,000 was awarded) were identified for removal from the database once all other factors had been tested. Other cases, such as those where quantum had been identified at $0 were also identified. I was aware that the Statistical Quantum Advisor program did not use certain problematic cases and so far as possible, I wanted to ensure that the same cases were removed. We return to this issue later in this chapter.  4.11  TOLERANCES  Tolerance specifies how accurate the output must be to be considered correct or "good." It is defined as a percentage of range of the output. It may be likened to a camera being focused very slowly. A high tolerance gives a blurred image of the pattern that we are trying to create. One may liken the process of tightening the tolerance to focusing a camera on an object and adjusting the lens so that the object is clearer through the view finder. If the tolerance level is set at 0.1 (as it is by default) and the network provides an output that is within 10% plus or minus of the maximum output then the fact is considered good. Not all of the above networks converged during training; in other words, they were having trouble learning/focusing. However, when the tolerance factors were loosened the networks seemed to train more easily. Even when the tolerance was loosened only slightly to 0.12 from 0.1 i.e. 12% instead of a 10% margin of error, the network trained far more easily requiring fewer runs of the data. By gradually tightening the tolerance factor the  129  network trained quickly on the small remaining set of "bad" facts. This was a minor but nevertheless useful discovery if speedy training is an important criterion for the user. If the user is having difficulties running a trained network, one of the ways it can be improved is by starting training anew with a very loose tolerance (for instance, 0.4) and gradually tightening it over successive data runs. I tried running one network using such a style of training. I found that the mental health of the network could be preserved for much longer using this process of gradual reduction of the tolerance level. However, there was no mechanism for the automatic adjustment of the tolerance level and therefore, during training, the network required constant supervision.  It would have been an  excellent addition to the functionality of the software had this been possible. Using this system I began training with a high initial tolerance of 0.4 (40%) and over successive runs lowered it gradually to more acceptable levels. However, after 2000 data runs, and a lowering of the tolerance to 0.1 (10%) the networks mental health began to degrade. Also it did not seem to test any better than networks trained more quickly.  4.12 H O W M A N Y O U T P U T N E U R O N S ? This may seem to be a very odd question. After all, is it not the case that this neural network's purpose is to provide onefiguresignifying quantum in any given case? Why then should one need more than one output neuron? It would appear that one of the main reasons why earlier networks provided such erroneous results was precisely because I had used only one output neuron on a data set that contained extremes of data i.e. the vast majority of awards are between $5,000 and $40,000 although some "maverick" awards in  130  the region of $60,000 to over $100,000 also exist. 
There are no hard and fast rules described in any of the relevant literature on how many output neurons should be used, although the general sentiment as regards input and output neurons is "more is better." Therefore I had to rely on intuition and experimentation. There are a variety of ways in which the quantum figure could be represented; First I could set up neurons representing bands of possible quantum e.g. $1- $10,000, $10,001 - 20,000, etc.; Secondly, I could arrange the data into a binary form with one neuron representing $1,000, one representing $2,000, one for $4,000, one for $8,000 and so on, building up to a final neuron for $64,000. Thus, for example, the quantumfigureof $29,000 would be represented by a binaryfigureof 0011101. This output would then have to be reconverted into a decimal form to be readable by the end user. The problem with this approach would be that, at the higher end of the scale a difference of $5,000 may be dismissed as de minimis. However, at the lower end, an error of $5,000 is far more significant. Such is the nature of the data that one would need the network to train strongly on facts at the lower end of the quantum scale and less strongly on facts at the higher end. For instance, if the network provided an output of $45,000 as opposed to a required pattern output of $40,000 many lawyers would view such a result as sufficiently close as to make the network created of significant interest and value. However, an error of $5,000 between a required output of $1,000 and pattern output of $6,000 is a far more serious error. In order to perform this type of training function, one would have to write the neural network code from scratch rather than using an off-the-shelf package such as Brainmaker. Such a task was beyond my capabilities as a programmer and beyond the parameters of this thesis. (However this  131  would make a very interesting research project for a programmer interested in the internal workings of a neural network.)  Another possible alternative would be to take the logarithm of the quantum figure so that the effect of extremes quantum figures would be lessened. However having spoken with the FLAIR programmers on this point I decided against this notion, the reason being as follows; If I used log (base 10), the sum of $100,000 would map to a value of 5 and the sum of $10,000 would map to a value of 4. That would mean that most of the awards would be bunched together between 4 and 5. It seemed that using this solution, I would be going from one extreme to another. The purpose of the data manipulation exercise is to "clip" the extreme values rather than using a metaphorical sledgehammer on all awards, extreme or otherwise.  At their suggestion I decided to attempt a less powerful data  manipulation exercise by taking the square root of all quantum values instead.  4.12.1  BANDS  The output values in the original database do not fall neatly within bands of $10,000. A brief analysis of the quantumfiguresin the Whiplash database shows that awards in excess of $50,000 are quite rare. The majority of the awards are between $5,000 and $40,000. The bands in the adapted program, Test3.prg, (a segment of which can be seen in appended form at Appendix C) reflect this fact. 
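The two candidate output representations described above can be made concrete with a short sketch. The version below is written in Python rather than the dBASE of Test3.prg (whose actual banding code appears at Appendix C), and it assumes that each award fires only the narrowest "less than" band into which it falls; the ten thresholds are those set out in the next paragraph.

    # Thresholds for the ten "less than" output neurons, in ascending order.
    BANDS = [5000, 10000, 15000, 20000, 30000, 40000, 60000, 80000, 100000, 150000]

    def band_outputs(quantum):
        """One output neuron per band: fire only the narrowest band the award falls under."""
        outputs = [0] * len(BANDS)
        for i, upper in enumerate(BANDS):
            if quantum < upper:
                outputs[i] = 1
                break
        return outputs

    def binary_outputs(quantum, neurons=7):
        """The binary alternative: neurons worth $64,000, $32,000, ..., $2,000, $1,000."""
        thousands = round(quantum / 1000)
        return [(thousands >> bit) & 1 for bit in range(neurons - 1, -1, -1)]

    print(band_outputs(16000))    # fires only the "less than $20,000" neuron
    print(binary_outputs(29000))  # [0, 0, 1, 1, 1, 0, 1] -- the 0011101 of the example above

The sketch also makes the boundary problem discussed below visible: awards of $16,000 and $19,999 produce identical band outputs, while $19,999 and $20,001 do not, even though the latter pair are far closer in value.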
Initially, a total of 10 output neurons were used representing the following values; less than $150,000, less than $100,000, less than $80,000, less than $60,000, less than $40,000, less than $30,000, less than $20,000,  132  less than $15,000, less than $10,000 and less than $5,000. The network was trained with 40 hidden neurons and a tolerance of 0.12.  The first observation made was that the  network trained much more slowly than previous networks. The RMS error curve began at a higher point (0.3410) and took longer to descend - an indication of slower training. The Column Histograms which reflect the mental health of the network degraded surprisingly quickly. By the 60th run of the data, they were showing characteristics of a "braindead" network, suggesting that nothing further could be learnt. However, for the first time, the network seemed to test extremely well. (Changing the number of hidden neurons made little difference to the training cycle.) From this experiment I drew two possible conclusions; either the use of 10 output neurons was too much for the network to handle and therefore I needed to find a compromise solution, of between 1 and 10 neurons that would allow the network to continue to train well whilst maintaining the newly discovered accuracy during testing or alternatively that the data needed further manipulation.  The former conclusion was not convincing. Nowhere in any neural  network literature is the possibility of too many output neurons mentioned. Indeed, most authors suggest using more input and output neurons rather than less; however they do suggest "pruning" the number of hidden neurons. It was suggested to me that I using large figures such as $40,000 would require the network to devote much of its computing power to solve calculations. Networks seems to train better when the numbers used are largely between 0 and 1.  133  To satisfy my curiosity, I decided to try a new experiment with 6 output neurons, maintaining the cluster of neuron values at the lower end of the dollar-value scale and using fewer neuron values at the top end. The values represented by these 6 neurons were as follows; less than $120,001, less than $60,001, less than $40,001, less than $25,001, less than $15,001 and less than $7,501. The Test3.prg program had to be changed and became Test4.prg.  (A copy of the amended part of that program can be found at  Appendix D.) At commencement of training the network revealed an even larger RMS error and the network again trained very slowly. However a higher degree of accuracy was achieved at the testing stage.  There is a methodological problem with this type of neuron architecture. The categories I picked (e.g. less than $25,000, less than $15,000) were chosen by intuition to reflect what I felt were helpful categories. However, in a given case, the required Pattern output might be $16,000, clearly closer to $15,000 than to $25,000. However, with the data set-up left unamended, it would be classified in the "less than $25,000" category. This problem was reflected when testing this network. Very often the results would show an output of approximately 0.95 against the neighbour of the required output neuron suggesting to me that the network was learning extremely accurately but that it was being thwarted by the data manipulation process. The problem was one of fuzzy values; the dBASE™ program used formalistic mathematical functions. 
In dBASE™ it is difficult to represent "around $16,000" as a number or a symbol; the programming language has no facility for the automatic fuzzification of data. Given the above results, it seemed pointless to pursue this  134  banding method any further. I was being defeated by a "less than" sign. I wanted a network that would say to the user "the quantum is $x,000 give or take $2-3,000 either way." To add ironic insult to injury, when the learning "fired" the neighbouring neuron to that which was required, the network statistics facility told me that, in such situations the given fact was a "bad" fact. However, when the network showed a "good" fact, it was very "good" indeed. This gave me some encouragement that my purpose was achievable. It was a question offindingthe right mixture of inputs and outputs and the appropriate form of data manipulation. In retrospect, this has proved to be no small achievement.  4.12.2  T H E SQUARE ROOT  I was advised to try taking the square root of the output (instead of the logarithm) in order to lessen the effects of extreme examples. A simple command inserted in the Test3.prg program  174  allowed me to produce a new set offigures,between 0 and approximately  320. I then trained a network with 61 input, 40 hidden and 1 output neuron. Using an initial tolerance of 0.12 which was lowered gradually to 0.10 as the network continued to train, the network trained successfully on the training set in 160 data runs. The accompanying "New York skyline" Histogram suggested that the network remained "healthy" and was able to learn more at completion of training.  The network also  appeared to test well; a full table of results is set out below. I squared each output figure that the network produced and was able to quantify the error between pattern output and actual output. The table shows that of the 70 facts reserved for testing, the largest error  174  "IF T->QU >=0, REPLACE B->QU WITH SQRT (T->QU) ENDIF 135  between output and pattern output was $70,228 and in one case, the network trained to within less than $1 of the required pattern. This was one of the more successful network developed, as judged by the testing criterion. (Values in Columns 4 and 5 are rounded up or down to the nearest dollar.)  Figure 4.7 CASE NUMBER 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33  Comparative Table of Outputs Using Square Root of Quantum. 
VOF ACTUAL AWARD 100.04 130.03 63.05 97.00 112.04 81.04 100.04 112.04 71.07 97.00 123.10 87.04 100.04 92.02 141.01 157.65 39.06 126.06 148.02 55.02 87.04 112.04 212.05 141.01 122.01 84.00 100.04 173.03 92.02 134.00 122.04 97.01 130.03  NETWORK OUTPUTV 122.33 147.60 99.96 151.06 97.94 97.94 94.47 116.68 116.68 110.43 126.06 84.42 147.26 112.80 102.66 157.99 66.17 132.82 224.98 106.46 109.67 194.57 213.57 134.17 107.98 138.39 93.12 179.45 106.63 177.00 82.65 117.87 100.55  136  ACTUAL AWARD ($)  OUTPUT SQUARED  10,008 16,907 3,975 9,409 12,553 6,568 10,008 12,553 5,051 9,410 15,154 7,576 10,008 8,468 19,884 24,854 1,526 15,891 21,910 3,027 7,576 12,553 44,965 19,884 14,886 7,056 10,008 29,939 8,468 17,956 14,894 9,411 16,908  14,721 21,785 9,992 22,819 9,592 9,591 8,925 13,614 13,614 12,195 15,891 7,127 21,686 12,724 10,539 24,960 4,378 17,641 50,616 11,334 12,028 37,896 45,612 18,002 11,660 19,152 8,671 32,202 11,364 31,329 6,831 13,893 10,110  (S)  DIFFERENCE ($) 4,713 4,879 6,017 13,410 2,961 3,023 1,083 1,061 8,563 2,785 737 449 11,678 4,256 8,655 106 2,852 1,750 28,706 8,307 4,452 36,343 647 1,882. 3,226 12,096 1,337 2,263 2,896 13,373 8,063 4,482 6,798  34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70  100.04 112.04 141.01 122.01 100.04 110.01 110.01 200.06 173.03 200.06 122.01 66.05 157.99 187.05 134.00 141.01 224.05 173.03 173.03 157.99 132.06 173.03 141.04 83.98 128.09 157.99 157.99 157.99 187.05 173.03 126.06 187.05 141.01 87.04 46.99 100.04 246.03  102.75 182.07 108.66 84.76 106.12 64.23 102.15 223.76 114.23 155.54 99.71 99.46 162.13 164.75 74.45 140.42 271.02 173.28 202.68 96.98 131.30 107.64 158.67 122.85 132.56 122.85 321.23 126.99 183.25 116.60 139.07 212.14 114.57 108.07 95.65 229.03 233.42  10,008 12,553 19,892 14,886 10,008 12,102 12,102 40,024 29,939 40,024 14,886 4,363 24,961 34,988 17,956 19,884 50,198 29,939 29,939 24,961 17,440 29,939 19,892 7,052 16,407 24,961 24,961 24,961 34,988 29,939 15,891 34,988 19,884 7,576 2,208 10,008 60,531  10,558 33,149 11,807 7,184 11,261 4,125 10,434 50,068 13,048 24,193 9,942 9,892 26,286 27,143 5,543 19,718 73,452 30,026 41,079 9,405 17,240 11,586 25,176 15,092 17,572 15,092 103,189 16,126 33,581 13,596 19,340 45,003 13,126 11,679 9,148 52,455 54,485  550 20,596 8,085 7,702 1,253 7,977 1,668 10,044 16,891 15,831 4,944 5,529 1,325 7,845 12,413, 166 23,254 87 11,150 15,555 200 18,353 5,284 8,040 1,165 9,869 78,228 8,835 1,407 16,343 3,449 8,015 6,758 4,103 6,940 42,447 6,049  The ideal testing network, in my opinion, would be one that provided results within $23,000/3,500 of the pattern output. Slightly more than this is still acceptable although an error in prediction of greater than $4-5,000 either way is likely to be enough to persuade a Plaintiff one way or the other on whether to pursue the matter all the way to court. Of the above 70 test cases, 26 cases could satisfy the testing criterion of less than $3,500 either  137  way. This translates to 37% of the cases. However, I spoke with two Law students who had worked on inputting cases into the Whiplash database. They agreed that they had developed an intuition of what the case might be worth which was accurate within a $7-10,000 range. Using this scale, 52 of the 70 cases or 74% of the test cases could be considered satisfactory. As can be seen from the above table some of the results were remarkably accurate, some coming within $500 of their required target (e.g. 
.Facts 12, 16, 51 and 54) whilst others were wildly inaccurate, the worst situation being Fact 60 which was nearly $80,000 off its target output. Clearly there was still a long way to go.  There was also a debt to be paid for using the square root function. The tolerance level of 0.1 was no longer acceptable; if I wanted an maximum error of no more than 10% then the tolerance had to be set at an absurdly low level, because all the outputs would have to be multiplied by themselves to provide an actual dollar figure.  (For some unknown reason the pattern output, when used in training tends to approximate the actual quantum values very slightly. For example, a quantumfigureof $10,000 might become $10,008. In the larger scheme of things, I felt that such alterations were de minimis)  The next task was to see if I could combine the seeming advantages of square rooting the pattern output values with the use of more than one output neuron. Myfirstidea was to develop some form of "fuzzy" banding mechanism which allows for a degradation of data,  1 7 5  Lois Patterson and Serena Chandi.  138  which could overcome the rigidities of banding arranged by reference to the "less than" symbol. The ideal situation would be developing a database that provided values in the following type of graphical form;  Figure 4.8 - Graphical Representation of Fuzzy Banding of Quantum (Y= Pattern Output Neuron Value; X = Square Root Value)  1  I wanted to be able to represent discrete numbers as a continuous variable using a number of neurons that would each partially fire. I decided to divide the square-root of the quantum figures into 5 bands of 108 units with an overlap of 54 (i.e. half) in each band. The above diagram (which shows the use of 5 output neurons each represented by a triangle) is a graphical representation of this design. The number 54 would be represented by neurons 1 and 2 having values of 1.0 and 0 respectively, whereas the number 81 would be represented by values of 0.5 in both the first and second neurons. Similarly, the value  139  of 100 (which represents the square root of $10,000) is given a value of 0.15 in Band 1 and 0.85 in Band 2. Having consulted further with the FLAIR programmers, I became convinced that I could represent a single square-root of quantum figure using more than one neuron. My initial thought was that this could be achieved by a simple coding procedure. In fact the coding proved to be a little more complex. (An appended form of the code with accompanying comments can be found at Appendix E.) The hope was that the neural network, when trained would produce two figures in adjacent bands, one representing the downside slope of a band, the second representing the upside slope of its neighbour. If the trained network could also produce twofiguresin adjacent neurons for every testing fact then this was a good indication that the network had trained well. As can be seen from the table of outputs in Figure 4.9 below, this was not always achieved.  The network developed using this procedure used 61 input neurons, 40 hidden neurons and 5 output neurons. I trained the network over approximately 240 data runs. It did not train fully to completion. Indeed, the statistics provided by the software during training suggested that it trained really quite badly - the RMS error never fell below 0.08 as opposed to minimum values of approximately 0.04 in previous networks and the descent of the RMS error curve was slow and erratic. 
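For clarity, the triangular banding described above can be expressed as a short function. The following Python sketch is a reconstruction of the scheme for illustration only (the coding actually used, with comments, is appended at Appendix E); the worked values in the comments are those given above.

    import math

    # The five output neurons are centred at 54, 108, 162, 216 and 270 on the
    # square-root scale, each band being 108 units wide, so that neighbouring
    # bands overlap by 54 (half a band).
    CENTRES = [54, 108, 162, 216, 270]
    HALF_WIDTH = 54

    def fuzzy_encode(quantum_dollars):
        root = math.sqrt(quantum_dollars)
        return [round(max(0.0, 1.0 - abs(root - c) / HALF_WIDTH), 2) for c in CENTRES]

    print(fuzzy_encode(2916))    # sqrt = 54  -> [1.0, 0.0, 0.0, 0.0, 0.0]
    print(fuzzy_encode(6561))    # sqrt = 81  -> [0.5, 0.5, 0.0, 0.0, 0.0]
    print(fuzzy_encode(10000))   # sqrt = 100 -> [0.15, 0.85, 0.0, 0.0, 0.0]

Each dollar figure is thus spread across at most two adjacent neurons, which is precisely the partial firing that the design sought.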
However, I was able to map the neuronal values back to a rough quantumfigureby using a conversion graph similar to Figure 4.8. This proved to be an incredibly slow process but the alternative of writing a calculation program in the C programming language was equally unappealing, particularly as this was  140  an experiment which might not be used again. The testing results for the first 35 of the 70 testing facts used in this network are set out in Figure 4.9 below; Figure 4.9 - Table of Outputs from Multi-Output Neural Network CASE NUMBER  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35  PATTERN  OUTPUT  (TARGET VALUE)  (1 OUTPUT NEURAL NETWORK) 122.33 147.60 99.97 151.06 97.94 97.94 94.47 116.68 116.68 110.43 126.06 84.42 147.26 112.80 102.66 157.99 66.17 132.82 224.98 106.46 109.67 194.57 213.57 134.17 107.98 138.39 93.12 179.45 106.63 177.00 82.65 117.87 100.55 102.75 182.07  100 130.03 63.05 97.00 112.04 81.04 100.04 112.04 71.07 97.00 123.10 87.04 100.04 92.02 141.01 157.65 39.06 126.06 148.02 55.02 87.04 112.04 212.05 141.01 122.01 84.00 100.04 173.03 92.02 134.00 122.04 97.01 130.03 100.04 112.04  FIRST BAN SECOND VALUE BAND VALUE  152 161 73 102 111 81 91 101 91 99 138 87.5 123 111 86 85  159 No Value 84 85 108 81 70(124) No Value 79 99 162(195) 92 142.5 109 55 60  (5 OUTPUT NEURAL NETWORK) 155.5 161 78.5 93.5 109.5 81 80.5(97) 101 85 99 150(178.5) 90 133 110 70.5 72.5  75 160 81 94.5 105 115 102.5 134 83 85 93 79 109 75 52 122 81 101  81 No Value 84 82.5 111 129 102 118 84 96 156 96 157 73 185 138 76 116  78 160 82.5 88.5 108 122 102.2 126 83.5 90.5 124.5 87.5 133 74 118.5 130 78.5 108.5  ?  141  AVERAGE OUTPUT  ?  ?  Using a table similar to that in Figure 4.8,1 was able to find the average $ value for each output. If there were two values, I could take an average. If there were three values for any given fact, I was advised by the FLAIR programmers to think of the first two figures as a set and the second and third figures as a separate set and take averages of each. Generally, there were two large values with a very small third (and possibly fourth) value so calculations of this nature were not problematic. If the two values were very close then it was an indication that the network had trained well on that fact and that it would be safe to have a high degree of confidence in the network. If the values showed a high degree of variance, I could have no such confidence. Certain testing facts caused problems. A good example of this is Fact 11 for which the network gave outputs of 0.452 on Band 1 which converts to 138 (a dollar value of $19,044), 0.999 on Band 2 which converts to 162 ($26,244) and 0.655 on Band 3 which converts to 195($38,025); each of these outputs has a fairly significant values on a scale of 0 to 1 and therefore could not be ignored as de minimis. Thus I had to include 2 values, by taking the average of the square root values for the first and second outputs, then repeating this for the second and third values. (The less accurate of the two numbers is in brackets.)  Sometimes the network showed  characteristics of indecision by providing very low output values in all output neuron bands. A good example of this phenomenon is Fact 17 for which the network provided outputs of 0.0094 in Band 1, 0.0768 in Band 2 and 0.0445 in Band 3. These values were beyond quantification using the above type of graph. 
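The manual conversion just described, reading each significant activation off a graph such as Figure 4.8 and averaging adjacent bands, could equally have been expressed as a short program. The Python sketch below assumes the same five triangular bands and an arbitrary, hypothetical cut-off of 0.1 below which an activation is treated as de minimis; it is one reasonable reading of the procedure rather than the calculation actually performed.

    CENTRES = [54, 108, 162, 216, 270]
    HALF_WIDTH = 54
    CUTOFF = 0.1          # hypothetical de minimis threshold, for illustration only

    def fuzzy_decode(activations):
        """Convert band activations back into a rough dollar figure."""
        active = [(i, v) for i, v in enumerate(activations) if v >= CUTOFF]
        if not active:
            return None                        # the "indecisive" case, e.g. Fact 17
        if len(active) == 1:
            return CENTRES[active[0][0]] ** 2  # single band: take its centre (an assumption)
        # Two adjacent bands: the downward slope of the lower band and the upward
        # slope of its neighbour each imply a square-root value; average the two
        # and square the result.  (Where three bands fire, as with Fact 11, the
        # pairs would be treated separately, as described above.)
        (i, vi), (j, vj) = active[0], active[1]
        lower = CENTRES[i] + (1.0 - vi) * HALF_WIDTH
        upper = CENTRES[j] - (1.0 - vj) * HALF_WIDTH
        root = (lower + upper) / 2.0
        return round(root * root)

    print(fuzzy_decode([0.15, 0.85, 0.0, 0.0, 0.0]))        # 9980, i.e. roughly $10,000
    print(fuzzy_decode([0.0094, 0.0768, 0.0445, 0.0, 0.0])) # None, as for Fact 17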
One possible reason for the failure of the network in this particular instance was that this network did not take account of the dates of any of the cases. As stated above, Brainmaker takes every 10th case, starting with the 6th, the 16th, the 26th, etc. Fact 17 equates to the 176th case of 705 training cases. Therefore it was possible that this was quite an old case, and that this factor, combined with an unusually low quantum figure of less than $2,000, made it impossible for the network to form any sort of conclusion. Although there was little I could do to compensate for a low quantum figure, apart from not using the case, I decided to include the date factor in future networks. It was pointed out to me subsequently that another reason why the 5 output network might have failed to deal adequately with extreme quantum figures was a subtle design flaw. Values of less than 54 had only 1 output neuron to fire, whereas all values above this figure had 2. Equally, at the opposite end of the spectrum, values of over 270 also had only one output neuron to fire. There was nothing against which to balance the first figure. What I needed was two further bands, one from 0 to 54 represented as a downward slope and one from 270 to 324 represented as an upward slope. This might have overcome the problems of extreme values.

(Given the scale of the graphs, I was not able to use figures with values of less than two decimal places; a value of 0.08, for example, is too small to quantify on any graph. The alternative would have been to write a short computer program.)

The results of this experiment were that in 19 (approximately 56%) of the 34 usable testing cases, the network with 5 output neurons proved to be more accurate than previous networks using only 1 output neuron, and in 15 (approximately 44%) of the 34 cases, the 5 output-neuron network proved to be less accurate. The above results also show that in certain circumstances the 5 output neuron network outperformed the 1 output neuron network by a significant margin.[176] Other facts showed the opposite.[177] Still other facts showed little difference under either network.[178] In short, I could find no compelling reasons to use the 5 output-neuron network. The ideal situation would be to take the best of each network and combine them into one network. But building a network with 6 outputs (i.e. 5 output neurons for Bands 1 to 5 and a single output neuron for an unaltered square root value) offered no guarantee that the best features of each network would be used. Nevertheless, I felt that it would be an interesting and potentially worthwhile experiment.

This next network that I built consisted of 61 inputs, 40 hidden neurons and 6 outputs. It trained on over 280 runs of the training data. The RMS error curve was slow in its descent and very erratic. I tested the network on the 283rd run of the data. My analysis revealed the 6 neuron network to be less accurate than previous networks, using any criteria for analysis. It was clear that, far from combining the best parts of previous networks, this 6 output neuron network was more confused than ever. There may be a way around this problem by developing two separate neural networks, one with 5 outputs and one with only one output, and using the latter network for verification. I did not explore this option further as I felt that I was being drawn into diversionary experiments away from the real purpose of the thesis.

176 For example, Facts 26, 30 and 33.
177 For example, Facts 11, 16 and 23.
178 For example, Facts 1 and 2.
Therefore, I decided to abandon this combined  banding and square-root data structure approach and return to the use of a single output neuron and attempt to improve that network, by including the use of dates, changing the  For example, Facts 11,16 and 23 For example, Facts 1 and 2.  144  number of hidden neurons, changing the learning rates and training for a longer period of time.  Experiments with banding proved to be costly in terms of time taken to manipulate data but showed a few improvements on the single output neuron networks developed previously.  Although the 5 output-neuron network reduced the number of wayward  results, problems with interpreting output data had to be weighed against this. Also in this domain, for the time taken to build the networks, they offer little advantage. However, given a slightly different data structure, it may be that they would prove considerably better suited.  4.13 4.13.1  CHANGING OTHER  FACTORS  INCLUSION O F D A T E S AS INPUT N E U R O N S  I returned to the use of a single-output (square root of quantum) neural network.  I  decided to include the date of the case as part of the input data. There were two potential methods of dealing with dates; either taking the view that each year was more important than previous years by an increment of 1 and thereby implementing the year of a given case through the use of 1 input neuron or banding the years together. As we have seen from the above experiments with banding of output values, banding (fuzzy or otherwise) is an arbitrary method of dealing with awkward data. Therefore, I decided to start initially with 1 input neuron to represent case dates, with a 1994 case being represented by the number 11, a 1993 case by the number 10 and so on. I trained this network on 62 input  145  neurons, 40 hidden neurons and 1 output neuron, for over 250 data runs. The 104th data run had the lowest RMS error therefore I retrieved the 100th network and trained it over 4 more data runs.  I trained the second of these networks with 64 input neurons (utilizing 3 input neurons to reflect case date bands of 1993/1994, 1991/1992 and 1990 or prior, instead of 1), 40 hidden neurons and 1 output neuron which was the square root of quantum. I decided to train this network for a longer period to see how it trained. With over 6 hours of training and nearly 1,400 data runs, the network showed a very unusual RMS curve that was erratic in its initial descent and which then curved upwards in later runs. (The Histograms informed me that the networks "mental health" was irrecoverable by the 1000th run.) The network statistics revealed that the 221st data run provided the lowest RMS error and the least number of "bad" facts. (After the 300th run the results suggested that the network had started to memorize results rather than trying to learn.) I retrieved the 220th network and trained it on one further data run. 
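The two date representations compared in this section can be summarised as follows. The sketch is in Python for illustration (the actual data preparation was carried out in the dBASE Test*.prg programs), and the 1/0 form of the banded version is my assumption, since the experiment specifies only that three input neurons reflected the three bands.

    def date_as_single_neuron(year):
        """One input neuron: 1994 -> 11, 1993 -> 10, and so on down the years."""
        return year - 1983

    def date_as_bands(year):
        """Three input neurons for the bands 1993/1994, 1991/1992 and 1990 or prior."""
        if year >= 1993:
            return [1, 0, 0]
        if year >= 1991:
            return [0, 1, 0]
        return [0, 0, 1]

    print(date_as_single_neuron(1994))   # 11
    print(date_as_bands(1992))           # [0, 1, 0]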
The comparative results of testing of all three networks, (the "date = 3 inputs" network, the "date = 1 input" network and the "no date input" network) are set out below;  146  Figure 4.10  Comparative Table of Outputs Using Different Date Configurations (#1) = most accurate network; (#2) second most accurate; (#3) third most accurate (or least accurate) network;  CASE NUMBER  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39  PATTERN  100.04 130.03 63.05 97.00 112.04 81.04 100.04 112.04 71.07 97.00 123.10 87.04 100.04 92.02 141.01 157.65 39.06 126.06 148.02 55.02 87.04 112.04 212.05 141.01 122.01 84.00 100.04 173.03 92.02 134.00 122.04 97.01 130.03 100.04 112.04 141.01 122.01 100.04 110.01  (1 OUTPUT & NO DATE INPUT) NEURAL NETWORK  (1 OUTPUT & 1 DATE INPUT) NEURAL NETWORK  (1 OUTPUT & 3 DATE INPUTS) NEURAL NETWORK  122.33(#1) 147.60(#1) 99.96(#4) 151.06(#4) 97.94(#3) 97.94(#3) 94.47(#1) 116.68(#1) 116.68(#3) 110.43(#3) 126.06(#1) 84.42(#2) 147.26(#3) 112.80(#2) 102.66(#3) 157.99(#1) 66.17(#2) 132.82(#1) 224.98(#3) 106.46(#2) 109.67(#3) 194.57(#2) 213.57(#1) 134.17(#1) 107.98(#3) 138.39(#3) 93.12(#2) 179.45(#2) 106.63(#3) 177.00(#3) 82.65(#3) 117.87(#2) 100.55(#3) 102.75(#1) 182.07(#3) 108.66(#2) 84.76(#3) 106.12(#1) 64.23(#3)  136.96(#3) 97.85(#3) 96.92(#2=) 93.46(#1) 115.25(#1) 87.97(#1) 177.33(#3) 131.30(#3) 87.63(#2) 104.18(#1) 216.19(#3) 79.44(#3) 109.33(#1) 117.27(#3) 123.1(#2) 203.25(#3) 124.37(#2) 116.35(#2) 313.42(#2) 102.66(#1) 95.40(#2) 200.06(#3) 178.26(#2) 111.28(#2) 126.74(#1) 118.8(#2) 101.82(#1) 171.68(#1) 106.29(#2) 175.81(#2) 94.30(#1) 124.03(#3) 122.60(#2) 105.87(#2) 154.19(#2) 117.78(#1) 132.99(#2) 136.70(#3) 122.77(#1)  137.13(#4) 99.03(#4) 96.92(#2=) 89.15(#2) 102.35(#2) 69.21(#2) 116.35(#2) 97.94(#2) 67.95(#1) 106.63(#2) 157.74(#2) 85.77(#1) 88.56(#2) 111.87(#1) 147.26(#1) 152.16(#2) 46.49(#1) 72.00(#3) 170.66(#1) 111.28(#2) 88.21(#1) 114.40(#1) 148.36(#3) 111.45(#3) 136.11(#2) 100.97(#1) 75.13(#3) 108.32(#3) 99.79(#1) 100.89(#1) 84.42(#2) 102.24(#2) 129.69(#1) 85.26(#3) 86.11(#1) 98.27(#3) 122.43(#1) 87.63(#2) 87.55(#2)  147  40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70  4.13.2  110.01 200.06 173.03 200.06 122.01 66.05 157.99 187.05 134.00 141.01 224.05 173.03 173.03 157.99 132.06 173.03 141.04 83.98 128.09 157.99 157.99 157.99 187.05 173.03 126.06 187.05 141.01 87.04 46.99 100.04 246.03  102.15(#1) 223.76(#1) 114.23(#3) 155.54(#2) 99.71(#2) 99.46(#2) 162.13(#3) 164.75(#2) 74.45(#3) 140.42(#1) 271.02(#3) 173.28(#1) 202.68(#2) 96.98(#3) 131.30(#1) 107.64(#3) 158.67(#2) 122.85(#3) 132.56(#2) 122.85(#2) 321.23(#3) 126.99(#2) 183.25(#1) 116.60(#2) 139.07(#2) 212.14(#3) 114.57(#1) 108.07(#2) 95.65(#1) 229.03(#3) 233.42(#1)  157.40(#3) 248.63(#2) 142.36(#2) 126.82(#3) 106.12(#1) 113.98(#3) 150.47(#1) 178.26(#1) 150.73(#1) 159.51(#3) 216.23(#1) 189.25(#2) 208.42(#3) 146.5(#1) 167.87(#3) 126.14(#2) 130.37(#1) 118.80(#2) 129.35(#1) 144.73(#1) 178.01(#1) 130.87(#1) 193.81(#2) 141.27(#1) 164.50(#3) 188.15(#1) 89.826(#3) 131.30(#3) 161.45(#2) 176.49(#1) 214.42(#2)  114.32(#2) 135.94(#3) 142.79(#1) 199.97(#1) 84.47(#3) 84.59(#1) 160.86(#2) 102.24(#3) 172.10(#2) 123.78(#2) 263.50(#2) 122.68(#3) 179.87(#1) 142.79(#2) 119.98(#2) 143.21(#1) 162.13(#3) 93.04(#1) 117.28(#3) 44.127(#3) 198.22(#2) 111.53(#3) 142.87(#3) 113.73(#3) 127.83(#1) 194.48(#2) 102.83(#2) 84.00(#1) 178.18(#3) 179.62(#2) 160.52(#3)  A C O M P A R A T I V E ANALYSIS O F D A T E INPUTS  The second network 
(using only one neuron for date inputs) was the most accurate network most of the time (27 times out of a possible 70) and least accurate the least number of times (18 out of a possible 70.) The "banded date" network was the second most accurate network; of the 70 testing facts, 39 fact outputs (56%) were more accurate  than the network with no dates and 31 facts (44%) were less accurate. Of these, 20 testing fact outputs were significantly better  179  and 14 tested significantly worse . 180  The results seemed to justify the use of one input neuron to represent the dates of the case. However, one can also find in the data evidentiary support for using more than 1 neuron e.g. the problematic Fact 17 with its quantum square root figure of 39 ($1,521) was matched by an output from the "banded date" network of 46.49 ($2,161) as opposed to an output of 66.17 ($4,378) from the "no date" network, an output of 124.37 from the one input network (and we will recall, no calculation at all from the 5 output neural network.)  Bearing in mind the accepted rule amongst neural network developers  regarding the number of input neurons that of "the more inputs the better," I decided to try one last variation on this theme and build another network with 1 input neuron for each different year. Thus there would be 11 input neurons to represent the dates of all the training cases. This network was trained on over 170 data runs. This network seemed to suffer from "mental health" degeneration earlier than the other networks for reasons which were not immediately apparent.  The network with the lowest RMS error and least  number of "bad" facts was the network developed on the 130th data run. This was tested. The results suggested that the performance surpassed that of the "no-date network" but did not compare favourably with the two networks which also took account of date inputs. Thus with this particular aspect of the data set, I was not able to confirm the belief  With an accuracy improvement of over 20 points, equating to an increased accuracy of over $400. Facts 4,9,13,15,17,19,22,26,35,37,42,43,48,52,53,55,60 and 69 all tested significantly better Using the same 20 point criterion, Facts 2,11,18,23,24,28,41,47,49,51,59,62,68 and 70 all tested significantly worse. 1 8 0  149  that with regard to input neurons 'more is better." However, I was able to corifirm that the use of one input neuron to represent dates worked best.  4.14.3  LEARNING RATES  We recall from Chapter 3 that the learning rate  181  determines how large or small an  adjustment the network will make during training. Philipps  182  used the useful metaphor of  a game of golf. Changing the learning rate may be likened to changing one's golf club you may reach the target more quickly with a driving wedge but may miss the hole several times if one persists with its use. At different learning rates, Philipps discovered different outputs. I performed various experiments using different learning rates. The learning rate must be greater than zero; (Brainmaker sets the learning rate at 1.0 by default.) 
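In back-propagation terms the learning rate is simply a multiplier on every weight adjustment. The following is a generic, textbook gradient-descent update written in Python for illustration; it is not Brainmaker's internal code, and the weights and gradients are invented numbers.

    # Each weight is nudged against the error gradient; the learning rate
    # scales the size of the nudge (the length of the golf club, as it were).
    def update_weights(weights, gradients, learning_rate):
        return [w - learning_rate * g for w, g in zip(weights, gradients)]

    weights   = [0.25, -0.5, 0.125]      # invented values
    gradients = [0.5, 0.25, -0.25]       # dError/dWeight, also invented

    print(update_weights(weights, gradients, 1.0))   # [-0.25, -0.75, 0.375]  (large steps)
    print(update_weights(weights, gradients, 0.5))   # [0.0, -0.625, 0.25]    (smaller steps)

Too large a rate and the network repeatedly overshoots; too small and training becomes interminably slow, which is exactly the trade-off explored in the experiments below.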
Using the same data set as before and the same number of hidden neurons throughout, I experimented by changing the learning rate to 0.8, 0.4 and also up to 2.0 The results were as follows;  Figure 4.11  Comparative Table of Outputs Using Varying Learning Rates (#1) = most accurate network; (#2) second most accurate; (#3) third most accurate (or least accurate) network;  CASE NUMBER  PATTERN  1 2 3 4 5  100.04 130.03 63.05 97.00 112.04  LEARNING RATE = 1 176.49(#2) 166.94(#3) 103.34(#2) 107.64(#3) 156.47(#3)  See Appendix G, The Glossary of Brainmaker terms. See note 141 to Chapter 3.  150  LEARNING RATE = 0.8 107.90(#1) 151.82(#2) 105.79(#3) 100.30(#2) 126.40(#1)  LEARNING RATE = 0.6 187.47(#3) 125.30(#1) 85.94(#1) 92.53(#1) 155.54(#2)  Figure 4.11 (continued) 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53  81.04 100.04 112.04 71.07 97.00 123.10 87.04 100.04 92.02 141.01 157.65 39.06 126.06 148.02 55.02 87.04 112.04 212.05 141.01 122.01 84.00 100.04 173.03 92.02 134.00 122.04 97.01 130.03 100.04 112.04 141.01 122.01 100.04 110.01 110.01 200.06 173.03 200.06 122.01 66.05 157.99 187.05 134.00 141.01 224.05 173.03 173.03 157.99  85.52(#1) 134.34(#2) 123.61(#2) 94.81(#2) 111.28(#2) 200.40(#2) 95.401(#2) 111.11(#2) 141.43(#3) 160.86(#2) 176.74(#1) 87.72(#3) 135.69(#2) 203.86(#2) 128.26(#3) 106.72(#3) 173.03(#3) 195.92(#1) 103.42(#3) 104.01(#2) 128.43(#3) 115.50(#3) 131.55(#1) 130.28(#3) 217.97(#3) 103.17(#1) 134.17(#3) 144.56(#1) 104.27(#1) 141.01(#1) 132.14(#1) 144.81(#3) 112.04(#3) 129.61(#3) 175.22(#3) 123.69(#2) 106.88(#1) 149.37(#3) 69.64(#3) 119.22(#3) 140.34(#2) 140.50(#2) 172.10(#2) 116.77(#2) 286.73(#3) 166.78(#1) 188.99(#1) 147.09(#1)  100.30(#2) 152.92(#3) 169.65(#3) 100.40(#3) 127.16(#3) 260.88(#3) 100.30(#3) 94.89(#1) 83.49(#2) 146.08(#1) 132.31(#2) 59.42(#1) 132.82(#1) 204.96(#3) 123.89(#2) 102.83(#2) 136.20(#2) 146.90(#3) 133.41(#1) 133.66(#1) 116.01(#2) 97.51(#1) 113.90(#2) 112.80(#2) 164.00(#2) 87.98(#2) 85.18(#1) 95.82(#2) 91.35(#2) 153.34(#1) 131.13(#2) 99.96(#2) 131.38(#2) 110.26(#1) 151.15(#2) 252.43(#1) 91.18(#2) 173.62(#1) 87.97(#1) 96.16(#2) 156.47(#1) 144.90(#1) 189.50(#3) 217.29(#3) 213.74(#1) 132.23(#2) 197.19(#2) 209.35(#3)  71.50(#3) 65.84(#1) 110.69(#1) 82.05(#1) 110.94(#1) 140.42(#1) 84.92(#1) 74.87(#3) . 106.80(#1) 112.04(#3) 128.43(#3) 59.67(#2) 101.73(#3) 177.67(#1) 100.97(#1) 80.28(#1) 112.67(#1) 179.45(#2) 119.64(#2) 167.79(#3) 115.59(#1) 90.67(#2) 70.29(#3) 96.67(#1) 139.66(#1) 76.56(#3) 82.73(#2) 170.24(#3) 81.80(#3) 144.31(#2) 116.01(#3) 110.77(#1) 102.49(#1) 113.90(#2) 126.48(#1) 117.02(#3) 64.991(#3) 154.36(#2) 76.90(#2) 84.17(#1) 172.44(#3) 96.92(#3) 144.98(#1) 140.76(#1) 179.11(#2) 159.43(#3) 200.14(#3) 189.84(#2)  Figure 4.11 (continued) 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70  132.06 173.03 141.04 83.98 128.09 157.99 157.99 157.99 187.05 173.03 126.06 187.05 141.01 87.04 46.99 100.04 246.03  156.05(#3) 149.54(#2) 148.87(#1) 103.00(#2) 143.38(1=) 74.28(#3) 158.41(#1) 131.89(#3) 194.15(#2) 138.65(#1) 162.30(#2) 181.64(#1) 95.91(#2) 101.65(#1) 123.1(#3) 165.59(#2) 158.67(#2)  139.58(#1) 108.74(#3) 163.73(#2) 112.80(#3) 143.38(1=) 125.30(#1) 192.99(#2) 137.80(#2) 198.54(#3) 132.48(#2) 145.15(#1) 185.61(#2) 104.01(#1) 141.94(#3) 105.45(#2) 134.25(#1) 114.66(#3)  143.04(#2) 155.20(#1) 115.41(#3) 91.18(#1) 155.54(#3) 123.44(#2) 119.64(#3) 138.39(#1) 182.15(#1) 79.77(#3) 165.42(#3) 187.05(#1) 91.01(#3) 102.07(#2) 81.63(#1) 169.82(#3) 170.49(#1)  (N.B. 
The above results should be looked at for comparative purposes only. The reader should not be concerned that certain results appear far from the required pattern output.)  The network with the learning rate of 2.0 trained very erratically indeed. It exhibited some very unusual behaviour. Indeed after a mere 60 or so runs of the training set, the Column Histogram looked very similar to that representing a "brain dead network" i.e. a network that had memorized rather than learnt. The results obtained from this network have not been included because they were nonsensical. Clearly, the network was making far too large adjustments with every run and this meant that it was failing to settle into a satisfactory learning pattern. Equally, the network with a learning rate of 0.4 took so long to show progress during training that I began to wonder if it would ever train to completion. I took the best network (i.e. the one with the lowest RMS error and smallest number of "bad" facts) that each of the 3 training sessions produced and ran the testing  152  data through it. The results are set out above. The network with a learning rate of 0.8 should have trained more slowly than the network with a learning rate of 1.0. Interestingly, the 0.8 learning rate network trained to its best network more quickly rather than less, with under 100 runs of the data as opposed to 120 to reach its lowest RMS error figure. I was (and remain) unable to explain this phenomenon, although from this one might conclude that the learning rate of 1.0 was too high. Not surprisingly, the network with the 0.6 learning rate took much longer to reach its lowest RMS-error network. The best network was produced on the 150th run of the data.  As regards outputs, we can see from an examination of the above table that, like Phillips, I obtained different outputs using different learning rates. Of the three rates, the network with the learning rate of 0.6 provided me with the most accurate outputs, with 30 out of the 70 testing facts testing the best of the three networks. However, it also came third in 26 of the 70 possible testing cases. The network with the learning rate of 0.8 came third the least number of times (17) and still had a high number offirstplaces. Despite being unable to establish unequivocally that the lower learning rate is better, it is clear from the above table that with this data, the learning rate of 1.0 is not appropriate and that outputs can be improved by a lower learning rate. It is open for debate as to the exact level of the most appropriate learning rate in this context. I decided to compromise and set the learning rate at 0.7 in my future work.  153  4.15 H O W  MANY HIDDEN NEURONS-PART 2  I stated earlier in this Chapter that I would return to this experiment.  My earlier  experiment with this variable showed that the optimum number of hidden neurons is likely to lie somewhere between 20 and 40 and that networks can test reasonably well without having to train on all the given facts. However, I wanted to provide myself with outputs for networks with extreme numbers of hidden neurons, as sometimes (depending on the type of data,) some neural networks can work adequately on as little as one-tenth of the total of inputs and outputs , which in this situation (with 64 inputs and 1 output) would 1  equate to 6-7 hidden neurons.  I tested a number of networks taking the best results for other variables from other experiments that I had run. 
For example, the learning rate was set at 0.7; there was one date neuron; one output neuron represented the square root function of quantum, and so on. I trained 7 networks with the number of hidden neurons set at 10, 15, 20, 25, 30, 35 and 40 respectively. It was hoped that some kind of trend in the testing output might reveal itself. I already knew that with a larger number of neurons the network would train well, possibly to completion (i.e. satisfaction of the tolerance level), but that it might not necessarily test well. However, it may be that in such circumstances the tolerance level is set too high. The results are set out in Figure 4.12 below. The most accurate network for any given fact is indicated by a star (*).

1 The FLAIR project has been working on a neural network for information retrieval purposes. Their prototype network works adequately and consists of 50 inputs, 99 outputs and 15 hidden neurons.

Figure 4.12 Comparative Table of Outputs Using Varying Numbers of Hidden Neurons
(Columns: Case Number; Pattern; outputs of the networks with 10, 15, 20, 25, 30, 35 and 40 hidden neurons, with the most accurate output for each of the 70 testing facts marked)

On some facts all networks trained adequately, and therefore the accuracy of any given network is less crucial. For example, on Fact 3 the most accurate network (hidden neurons = 20) produced a translated $ error of $2,083, whilst the least accurate network (hidden neurons = 35) produced a translated $ error of $5,418. This is a fairly negligible difference compared with, say, Fact 22, where the most accurate network (hidden neurons = 30) produced a translated $ error of $2,107 and the least accurate network (hidden neurons = 35) produced a translated $ error of $27,471.
Therefore, where all outputs for a given fact are bunched closely together, there is no emboldening as clearly all of the networks had no problems dealing with the data relating to that given fact.  If two  networks produced outputs with almost identical errors and these two networks also happened to be the most accurate for that fact, they are both emboldened. Using these criteria, the network with 30 hidden neurons appeared to be the most accurate, (15 outputs being the most accurate out of possible 70.) However, in my opinion, in this type of comparative analysis, being the most accurate of seven networks for any given fact is not as important as providing a consistently accurate output for all 70 facts. Even the best performing network was showing wayward results that seemed incomprehensible. Trying to discern which network is the most consistent was a particularly difficult process, short of translating all the output figures to a $ value.  One must be very careful when thinking of these figures in terms of true dollar figures as they can be deceptive. For instance, the difference between 90 squared ($8,100) and 100 squared ($10,000) is not 10 squared ($100) but 43.59 squared ($1,900), whereas the  158  difference between 210 squared ($44,100) and 220 squared ($48,400) is 65.57 squared ($4,300.) Therefore, one cannot pick a figure at random (e.g. 30 points) and use it as a test of accuracy because as we can see, in true $ values, an error of 10 at the lower end of the output spectrum is much less than an error of 10 at the higher end. I was hoping that I would find one network that was both consistent and accurate. The network with 30 hidden neurons came closest to satisfying this demand, but even this network had occasional hiccups . A further perusal of the results suggested to me that I should use 1  more than 1 network to be tested in comparison with the Statistical Quantum Advisor program. In the comparative tests described below I used 2 networks ( one with 30 hidden neurons and one with 25 neurons.) I also developed a network that used an identical number of training facts as the SQA and used it for comparative analysis.  4.16 A H I G H E R N U M B E R O F D A T A R U N S ? In order to perform this experiment, I had to transfer my data onto a SUN Sparc Workstation that had a much higher computing capacity than that of Brainmaker™ running under Windows™ on a 386/25Mhz PC. My neural network software would take 6-7 hours to perform the same operation that could be accomplished on the SUN computer in 20 minutes. However, the program used on the SUN computer was written in C, was fairly user-unfriendly and required a high degree of experience in the art of programming neural networks. Unfortunately, my programming skills were minimal. However, using this faster but more basic technology, I was able to run my training set over approximately  1  See for instance, the outputs for this network with facts 19, 23, 28, 35 and 42.  159  48,000 data runs. There were also significant problems with this program. The most significant of these was the problem of over-training , the process whereby the network 2  trained very well, in fact too well on the training facts and fails to generalize well. It was an inadequacy of this software that prevented mefromfinding the optimum network. The only solution available to me was to stop the training process at random and examine the results and hoping to stumble across a useful network. 
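Stopping at random is, in effect, a crude manual version of what is often called early stopping: keeping the network saved at the data run with the lowest error on facts it has never seen. The Python sketch below shows only the selection logic; the error curve is fabricated purely for illustration, and neither Brainmaker nor the SUN program offered such a routine.

    def pick_best_run(test_errors, patience=50):
        """Return the run with the lowest testing error, stopping once the
        error has failed to improve for `patience` consecutive runs."""
        best_run, best_error = 0, float("inf")
        for run, error in enumerate(test_errors, start=1):
            if error < best_error:
                best_run, best_error = run, error
            elif run - best_run > patience:   # no improvement for a while:
                break                         # the network has begun to memorize
        return best_run, best_error

    # A made-up curve: testing error falls, bottoms out at run 200, then rises.
    curve = ([round(0.30 - 0.001 * r, 3) for r in range(1, 201)]
             + [round(0.10 + 0.002 * r, 3) for r in range(1, 201)])
    print(pick_best_run(curve))               # (200, 0.1)

This is essentially what was done by hand in section 4.13.1, where the training statistics identified the run with the lowest RMS error and that network was retrieved and kept.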
The key to obtaining a useful neural network is being able to stop the network before it starts to memorize the facts. Neural networks possess what we might describe in humans as laziness. If a solution to a problem can be found in the shortest possible time, it will be sought by the neural network. If that solution should happen to involve memorization, then that solution will be sought nevertheless. The result of this experiment was that the network trained very well but was useless when given cases that it had not seen.

4.17 COMPARISON WITH THE STATISTICAL QUANTUM ADVISOR

This is probably the most important experiment documented in this thesis. I decided to put my best two neural networks up against the Statistical Quantum Advisor (SQA). The SQA was written in the C programming language in 1992 by John McClean, who is a programmer at FLAIR. Like myself, he took the Whiplash database and subjected it to a degree of pre-processing by removing difficult cases such as ones that involved subsequent injuries, multiple injuries, other injuries, etc. This reduced the set of cases from an initial 1102 to 477, less than my training set of 635 cases. One of McClean's biggest complaints regarding his program was one familiar to me, namely that there were not enough training cases. Training with this smaller amount of data would have one of two effects for neural networks: either the training set might be so small as to preclude learning altogether, or it might make for a cleaner data set that would lead to a more accurate network.

2 See Appendix [ ], Glossary of Neural Network terms.

In order to perform this comparative analysis, I had to take his set of training data and manipulate it into a form that could be read by my neural network program. This involved taking the original database, deleting all the records that the SQA was unable to deal with and then running my Test*.prg program over that data file. I decided initially that I should test his program against mine in its present form, using 635 cases, and then train it again using the same 477 cases as the SQA. Once this had been accomplished I then had to create what the makers of Brainmaker describe as a running fact file - a file consisting only of inputs, upon which the neural network could predict an output for each fact. My two most accurate networks, which used 635 training facts, did not perform as well as the SQA. Figure 4.13 below shows the outputs for 43 of my 70 testing facts compared with the SQA outputs. A graphical representation of the same results can be found in Appendix H. All output figures have been rounded up or down to the nearest dollar. The most accurate network/program output for each fact is marked with a star. As we have seen above, there is a more important test than this. We are concerned to find the most consistent programming structure. The two neural networks should not be thought of as competing against one another. We have seen from experiments above that they have different attributes.
161  Figure 4.13 - Comparative Table of Neural Network and SQA Outputs CASE NUMBER  ACTUAL AWARD  SQA OUTPUT ($)  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43  10,000 16,900 4,000 9,500 12,500 6,500 10,000 12,500 5,000 9,500 16,000 7,500 10,000 8,500 20,000 25,000 1,500 16,000 22,000 3,000 7,500 12,500 45,000 20,000 15,000 7,000 10,000 30,000 8,500 18,000 15,000 9,500 17,000 20,000 12,500 20,000 15,000 10,000 12,000 12,000 40,000 30,000 40,000  19,727 27,564 5,755* 6,518 7,838 5,531 11,474* 9,529 5,492* 8,106 13,186* 6,751 8,897 10,246 12,807* NOT USED 10,315 23,558 25,495* 11,188 8,194 9,947 29,252 16,826* 11,487 NOT USED 16,132 13,529* 10,066 13,078* 11,945* 9,905* 9,575 NOT USED 18,910 NOT USED 16,806* 10,548* 9,559* NOT USED 32,982 24,368* 25,398  162  NETWORK NETWORK OUTPUT ($) (30 OUTPUT ($) (25 HIDDEN HIDDEN NEURONS) NEURONS) 17,596* 19,315* 7,531 8,561* 12,064* 8,023 15,237 5,991 8,344 9,942* 20,263 5,480 14,455 10,144 12,401  -  3,027 24,775 7,634 6,746 23,775 6,081* 14,886 10,660* 6,567 11,679 29,880 7,843* 11,244* 7,694* 9,101  -  4,674 14,354 44,289 8,829* 6,928* 14,660* 14,886 13,971 10,263  4,356* 15,530* 57,600 13,418 6,283 19,600 44,041* 10,574 17,935*  13,340 9,568 8,908* 27,869 6,957 7,027 16,662*  10,213* 11,067 7,798 32,292 6,788 6,594 21,561  29,415  17,040*  -  -  -  -  -  -  10,818 14,925 7,385  8,483 10,574 7,798  38,087* 10,995 29,039*  35,652 18,069 17,216  -  -  Readers will note that in previous tables, 70 cases have been used for testing.  This  experiment allowed for an analysis of no more than 43 facts because the SQA program used 477 cases taken from the Whiplash database when it consisted of only 1,100 cases as opposed to its present total of over 1,700 cases. Thus, I was able to compare only the first 43 of my 70 testing cases in this analysis. The above table shows that, of the 38 cases where a comparison was possible, the SQA program was the most accurate in 15 cases. The 1st neural network was the most accurate in 12 cases and the 2nd network was the most accurate in 11 cases. Consistency was a more important attribute than accuracy in any given case. Using this criterion, both networks performed almost as well as the SQA. In 18 of the 38 cases, both networks were accurate within a range of $3-4,000. The SQA was consistent in 21 of the 38 cases, (using the same judgment criterion.) The SQA seemed to perform better with high-end awards, although it was not without an occasional hiccup as in case number 43. The neural networks seemed to cope better with the very low awards (for instance, the problematic case number 17 with an award of $1,500) suggesting that the connectionist architecture can cope better than linear programming with contradictory data. Now and again, however, the networks would provide upsets. For example, in case number 19, the networks provided outputs of around $44,000 and $58,000 against an actual award of $22,000.  One would imagine that both neural  networks and the SQA would be inaccurate on the same cases. inexplicably, this was not so.  163  Interestingly, and  4.18  CONCLUSIONS  In the course of writing this chapter I have developed between 40 and 50 neural networks. I have not been able to develop a neural network that is likely to be of immediate commercial interest to ICBC or to lawyers who deal with whiplash injuries on a daily basis. This was not to be expected but would have been pleasantly surprising if it had been achieved.  
However, I have been able to build networks that come close to  performing as well as a program built by a programmer experienced both in the domain of whiplash injuries and who is familiar with a range of computer programming techniques. I have familiarity with neither of these areas and yet by giving data to a neural network, have witnessed the evolution of a program that comes close to (but cannot as yet surpass) the performance of the SQA. I believe that, in the circumstances, one should be quite satisfied with such a conclusion.  One of the most important lessons to be learnt from experimenting with neural networks is the importance of finding an adequate search method. I started working with this data set changing items such as the representation of the dates of the cases, changing how the quantum figure might be represented, altering the learning rate, changing the number of hidden neurons, etc. I tried to work through all the possible variables as methodically as possible. However, time constraints and the limitations of the software have prevented me from making an exhaustive search of all possible variables.  It may be the optimum  network variables for this data set are a learning rate of 0.6, an error Tolerance of .0.8, a training over 12,000 runs, 30 hidden neurons, 61 inputs and 1 output neuron. However  164  finding this optimum combination might take one person a lifetime. Alternative measures for dealing with this problem are considered further in the Conclusion to this thesis.  165  CHAPTER 5 CONCLUSION 5.1  W H E R E NOW?  The performance of the Whiplash neural network appeared to be less successful, (albeit by a very marginal performance) than the Statistical Quantum Advisor. However, one should posit the question - in any given award, what error should be attributed to that particular judge? Arguably, judges do not hand down identical awards in identical cases. I have made the presumption throughout this thesis that if a number of different judges heard exactly the same case they would not necessarily hand down the same award. Equally, in different courtroom circumstances, the same judge might hand down a different award. Therefore the only measure I have been able to use is the particular award in a particular case. Alternative measures of accuracy might offer more favourable results. One such method might be to take cases with similar profiles, eliminate extreme awards and measure the success of the neural network in terms of its conformity with the overall band of cases. However, with the current data, the only measure available to me was a single case instance. Bands of awards may be a better indication than a single award criterion as a judgment of my success and undoubtedly my results would iook far more impressive using this variable.  166  I have explored only one of the various possible neural network configurations, namely the back-propogation function using the Sigmoid Transfer function in a slow-learning software application. I believe that the most important conclusion to be drawn from this set of experiments is that, given the large number of variables that can be changed in a neural network, a superior search strategy must be found in order to discover the optimum combination of factors. With variables such as the learning rate, the number of hidden neurons, a variety of different data presentations, various transfer functions and the like, there are an immense number of configurations to be explored. 
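The scale of that search space is easy to make concrete. The Python sketch below merely enumerates a modest grid over four of the parameters varied in Chapter 4; the particular values are illustrative, and transfer functions, data representations and training lengths would multiply the total further.

    from itertools import product

    learning_rates = [0.4, 0.6, 0.7, 0.8, 1.0]
    hidden_neurons = list(range(10, 45, 5))        # 10, 15, ... 40
    tolerances     = [0.10, 0.12, 0.40]
    date_codings   = ["none", "1 neuron", "3 bands", "11 neurons"]

    grid = list(product(learning_rates, hidden_neurons, tolerances, date_codings))
    print(len(grid))     # 5 x 7 x 3 x 4 = 420 candidate networks

Even this restricted grid contains 420 candidate networks, each of which took hours to train and test on the equipment used here.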
I have built less than one hundred neural networks using what I believed to be the most obvious architectures. Yet there may be thousands, if not millions of different configurations of networks. Within this population there may (or may not) be one, ten, one hundred or one thousand networks that perform as well as (or better than) the statistical model. If we imagine a three dimensional graph of peaks and valleys which represent good and bad networks respectively, my models might be located in the foothills over on the North-East side of the graph, whereas the range of solutions actually reside somewhere in the mountain range in the South West. However, using my current search strategy I have no way of finding this possible range of solutions (if they exist), other than by stumbling upon them by luck more than judgment. As yet, there appears to be no procedure for identifying "good" networks other than through identifying those that adequately perform the task in question.  167  5.2  IMPROVING THE SEARCH  STRATEGY  The only method available using the software that I used throughout this thesis is trial and error, by either wild guessing based on intuition (the so-called "Monte Carlo" method) or the more scientific method of hill-climbing - the process of testing potential improvements achieved by changes to each parameter. The problem with this latter search method is compounded by the fact that neural networks do not neccessarily react in a linear manner. That is to say that by changing the learning rate, the optimum number of hidden neurons may also have changed. Although more likely to succeed than wild guessing, this method is very time consuming. I have largely relied on hill-climbing using what I have read in software manuals, certain practice-orientated magazines  183  FLAIR programmers  184  and sought advice from the  to set initial parameters and then vary a single parameter in both  directions. This has not proved to be very successful. We must therefore explore some of the possible alternatives.  5.3  FUTURE  RESEARCH  5.3.1 G E N E T I C A L G O R I T H M S One alternative may be the use of genetic algorithms in conjunction with the neural network. These are another aspect of the so-called "evolutionary computation" research project that is affecting the Artificial Intelligence discipline. Like neural networks they are  The magazines that I found to be most help have been PC-AI and A l expert. In particular Bruce Atherton.  168  finding immediate uses in the real world . 185  As the name suggests, the notion behind  genetic algorithms is that it is possible to simulate certain characteristics of animal evolution in computer programs to find the "best fit" solution to a particular problem. Just as living species pass on certain crucial genetic information regarding characteristics through genetic strings called chromosomes during reproduction, mathematical algorithms can be passed down between generations and are then "mutated" or recombined to perform a given operation more effectively. Those that perform the best are selected and the remainder are disposed of. Genetic algorithms must have the following attributes; mutation, reproduction and selection. With respect to this problem, a population of neural networks is generated then the most successful are selected.  These are then  recombined/mutated and tested. Of this set, again the most successful are selected. These are, in turn, recombined and tested. 
This process continues until the search parameters have been satisfied, and it can take many generations before the optimum algorithms are found. Although not specifically designed for the process of function optimization, this is proving to be their primary application in engineering, manufacturing and elsewhere. In the context of this neural network research project I might wish to know the optimum number of hidden neurons and the best learning rate. A search strategy to find this solution might be implemented using genetic algorithms. Hinton mentions the possibility of combining hill-climbing with genetic learning. He writes: "It is possible to combine genetic learning with gradient descent (or hill-climbing) to get a hybrid learning procedure called "iterated genetic hill climbing" or "IGH" that works better than either learning alone. IGH is a form of multiple restart hill climbing in which the starting points, instead of being chosen at random, are chosen by "mating" previously discovered local optima."186

This would seem to be the most obvious next step in the development of Connectionism in fuzzy domains such as legal reasoning. Otherwise we can resign ourselves to endless experimentation. Of course, even with this new search strategy, there is no guarantee that the networks would out-perform or even improve on previous neural networks or on the statistical program. It may be that I have stumbled unwittingly upon the optimum network, although I find such a scenario highly unlikely.

5.3.2 HYBRIDS - COMBINING EXPERT SYSTEMS WITH NEURAL NETWORKS

One of the problems that I have encountered in attempting to build neural networks in this domain has been that the network has no guidance as to which inputs might be more important than others. (This was, in part, a deficiency of the software. Some neural network software packages allow the user to manually increase the connection weightings on particular input neurons. The version of BrainMaker that I used did not.) It may be that providing the network with a degree of guidance would assist it further. A knowledge-based system might also be able to explain the reasoning contained within the network. We saw in Chapter 1 some of the limitations of expert systems: the facts that they do not have adaptive reasoning functions, that their performance does not improve with experience, and so on. We have seen in Chapters 3 and 4 the limitations of neural networks, namely their very limited explanatory functions, their requirement for a particular presentation of data, and the like. Liebowitz writes, "Nowadays many people in the AI world believe that it's critical to integrate intelligent systems with such techniques as simulation and optimization, inter-active multimedia, neural networks and genetic algorithms."187 There is also some theoretical support for the notion: "Most probably, we think, the human brain is, in the main, composed of relatively small distributed systems, arranged by embryology into a complex society that is controlled in part (but only in part) by serial, symbolic systems that are added later.
But the subsymbolic systems that do most of the work from underneath must, by their very character, block all other parts of the brain from knowing much about how they work. And this, itself, could help explain how people do so many things yet have such incomplete ideas of how those things are actually done."188

There are various types of hybrid architectures that may be utilized, such as loose coupling, where the expert system and neural network stand alone but communicate via data files, and tight coupling, where they communicate via data passing.189 Rule-based systems tend to work best with large areas of the problem domain but cannot always deal effectively with the boundary areas, where the subtleties of the domain prevent articulation in the form of rules. One possible application of a hybrid architecture in this domain might be the use of a rule-based system to deal with the major issues, so that the rule base defines the case as being a clear high-end award (say $40,000-60,000), and the neural network is then used to provide a precise award within that band, based on its training with such cases.

186 Hinton, G.E., Connectionist Learning Procedures, Artificial Intelligence, Vol.40, 1989, pp.189-234 at p.224.
187 Leibowitz, J., Roll Your Own Hybrids, BYTE, July 1993, p.113.
188 Minsky, M., and Papert, S., epilogue to Perceptrons, Revised ed., Cambridge, MA, MIT Press, 1988.
189 See Leibowitz, ibid., p.114.

5.4 PRACTICAL PROBLEMS OF NEURAL NETWORKS

5.4.1 FINDING A SOURCE OF DATA

Law is a domain rich in data. However, with the exception of social science data pertaining to the law (such as crime statistics), most of this data is contained in legal cases in a linguistic rather than numeric form. Computer programs have had great difficulty in dealing with issues of natural language, and although inroads are being made to reduce linguistic knowledge to a more malleable form, we still have a way to go in this regard. For neural networks to be of significant further value in law, we need to discover a mechanism by which data can be easily transformed from linguistically indeterminate symbols to network-readable symbols without losing any of their subtlety or character. As yet the only method is to put all the factors of the case into a numerically precise database. In the process many of the subtleties and linguistic indeterminacies of the case are lost. We would be assisted by a systematic "fuzzification" of linguistic factors, using methods such as that by which I attempted to fuzzify the quantum values in the whiplash database in Chapter 4. However, this will involve a significant research undertaking to determine the optimum database structure. There would be a significant number of questions to be answered, such as: should all inputs be fuzzified; if so, to what extent; if not, which factors should be manipulated; and so on.

5.4.2 TECHNICAL - EXTERNALIZING THE INTERNALITIES

Researchers in the field of law and economics often comment on their struggle to quantify and build into their equations those factors that evade numerical quantification. They have in mind such factors as a view of an ocean from land, the enjoyment people get from walking across an open landscape undisturbed by development, and many other such intangibles. They call this process of attempted quantification "internalizing the externalities." Connectionists have the exact opposite problem: what I have called "externalizing the internalities." The knowledge, value, or intellectual property is contained in the weightings on the neurons in numerical form.
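One crude way of beginning to externalize these internalities, assuming the trained connection weights could be exported as a simple matrix (something the software I used did not readily permit), would be to rank the input factors by the total absolute weight each feeds into the hidden layer. The Python fragment below illustrates the idea only: the 61-by-30 weight matrix is filled with random numbers standing in for real trained weights, and the factor names are placeholders, so the resulting ranking says nothing about which factors actually matter in the whiplash data.

    import numpy as np

    rng = np.random.default_rng(0)
    input_names = ["factor_%d" % i for i in range(61)]   # stand-ins for NECK, HEADAC, FTWOLO, ...
    weights_in_to_hidden = rng.normal(size=(61, 30))     # random stand-in for trained weights

    # Total absolute weight leaving each input neuron: a rough measure of how
    # much influence that input is able to exert on the hidden layer.
    salience = np.abs(weights_in_to_hidden).sum(axis=1)
    ranking = sorted(zip(input_names, salience), key=lambda pair: pair[1], reverse=True)
    for name, value in ranking[:10]:
        print("%-12s %6.2f" % (name, value))

Even where such a ranking can be produced, it shows only that a factor is able to exert influence, not how or why it does so.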
We saw in Chapter 2 that the Paris-based trio of Bochereau et al.190 achieved some success in extracting symbolic rules (albeit rules of very minimal use, and they did not describe their methodology for doing so). Even if we have discerned a meaning from the weighting of connections, this does not necessarily mean that we have discovered the relationship between inputs and outputs. It means that we have found a relationship, possibly one of many that may satisfy the requirements of the data. It is clear that judges view cases differently. It may also be that different lawyers reason in different ways. For instance, some people may rely more on their intuition, while others prefer rational, logical analysis; there is no reason to assume that lawyers are any different. Moreover, there is no empirical proof that judges necessarily come to similar conclusions by the same reasoning processes. Thus any notion that neural networks can in some way discover the "deep structure" of an area of law should be dispelled.

190 See note [ ] of Chapter 2.

5.4.3 INCOMPLETE DATA

There are two distinct problems in this area. First, there are in the region of 60 potential inputs. It would be unfair to imagine that adjusters would be prepared to input all 60 factors in order to discover a quantum figure. Equally, it is highly unlikely that the adjuster would have all the information at his or her fingertips. Very often the purpose of predicting a possible quantum figure is to avoid going to the trouble and expense of accumulating all of the facts (the input data) in the first place. A neural network can function in such circumstances, although the output that it provides is likely to be far more generalized.

The second problem relating to incomplete data is that only 5% of all actions actually go to court. Therefore the database contains only a very truncated "snapshot" of all disputes involving whiplash in British Columbia. A database showing all awards, settled or litigated, would have made a far richer data source.

There are other areas of research that would make interesting projects. One such project would be the development of a network from the bottom up, i.e. take a very small number of cases (say 10), all from a given year, and teach the network with these alone; then add another 10 cases and let the network adjust its performance accordingly. This would continue until all 630 training cases are trained upon. I was not able to perform this experiment because aspects of the software prevented me from doing so. Another interesting experiment would be to isolate those cases that involve certain judges and train networks on only those cases, then compare the results and see if they correspond with the opinions of lawyers experienced in appearing before those judges.

5.5 THE SOCIOLOGICAL CONTEXT

5.5.1 WOULD COMPANIES WANT THESE MACHINES?

The reality of corporate culture is that, even with significant advances in the development of neural networks, the level of sophistication is unlikely (at least in the near future) to be able to replace the expertise of an adjuster with 10 or so years' experience at ICBC. The best opportunities for the exploitation of this type of software may well lie in its use by more junior adjusters. They might welcome such innovations with open arms, not least because the software could provide the user with an empirically sound "second-guess." However, their managers may be less welcoming.
Supervisors may not want their trainees to have a "crutch" to lean on to justify their opinions; they would want their trainees to develop an expertise of their own. (This problem is not exclusive to neural networks - it is likely to be reflected in the adoption (or otherwise) of any other form of software that purports to offer substantive advice, such as expert systems.) Moreover, if the neural network is trained on a training set that includes all cases, including highly unusual awards, the likelihood is that, if adjusters blindly follow the network output as if its answers were "carved in stone," those quirks of precedent will be reinforced and replicated. The network will effectively loop those problem cases into its internal makeup. There would also be a major perception problem to be overcome in any commercial application. Many computer users still do not realize that a computer is no more than a very high-speed idiot with a well-developed memory.

5.5.2 WOULD SOCIETY WANT THESE MACHINES?

There are a number of political, ideological and ethical implications of developing machines that appear to predict outcomes in legal cases, not least of which is the Orwellian implication of machine-made justice. These have been explored elsewhere191 and are mentioned only for the sake of completeness. Mainstream jurisprudential commentators have been quick to pick up on this theme. Lloyd's Introduction to Jurisprudence offers these comments: "It is easy to throw up one's hands in horror, but is the "inscrutable" jury such a rational institution? You can at least program a computer with biases acceptable to the community, or at least to a majority of it, but juries, and judges for that matter, retain their own caprices."192 After 20 years of research into computers and law and 10 years of using expert systems, we remain a very great distance from these Orwellian scenarios, and as the information revolution continues, enhancing access to information further, the likelihood of these events coming to pass seems, if anything, to be receding. Neural networks, for all their advances, seem unlikely to alter this forecast dramatically.

191 See Spengler, J.J., Machine Made Justice - Some Implications, in Baade, H.W. (ed.), Jurimetrics, New York, Basic Books, 1963, and Beutel, F.K., Experimental Jurisprudence, Lincoln, University of Nebraska Press, 1957.
192 Lloyd of Hampstead, and Freeman, M.D.A., Lloyd's Introduction to Jurisprudence, London, Stevens & Sons, 1985, p.703.

5.5.3 THE NATURE OF THE NEGOTIATION PROCESS

Adjusters act on behalf of ICBC (possibly with the assistance of Counsel) against the Plaintiff and his or her lawyer. The reality of the negotiation process would appear to be that the adjuster predicts what the claim is worth objectively and often offers less, because they know that the lawyer in all likelihood is going to ask for much more. This is commonly described as "low-balling" the offer. If neural networks such as the one developed in this thesis were to come into use by ICBC, and the lawyers acting on behalf of Plaintiffs were to discover that such a tool existed, they would undoubtedly wish to know the conclusion reached by such a system. Revelation of that figure might seriously undermine the traditional negotiation process based on the adversarial system. The response to this argument is therefore to offer the system to judges.
However, they too are likely to possess experience superior to the pattern recognition powers of the network, at least in its current form. The network might, however, act as an intelligent assistant: a divergent award produced by such a neural network might cause a judge to reconsider his or her position. There are further problems. The neural network (in its present form) cannot take account of sociological issues such as the particular dynamics of the social relationship between a given lawyer and adjuster, the role of the doctor in the trial process, and so on. Some doctors are not supportive and may not perform well on the witness stand; others are considered to be very credible. Equally, adjusters may have some bias either for or against the Plaintiff depending on how he or she performs through the negotiation process. Who the judge is may also have a dramatic effect on the outcome. Some judges are known to be sympathetic to the complaints of the Plaintiff; others are known to be highly skeptical. Lawyers often tell their clients that going to court is a lottery.193 Thus, the data that I have used in this experiment, i.e. the injuries to the Plaintiff, represent only half of the equation. The sociological interaction between key participants in the process cannot be analysed numerically (at least not yet).

193 One Vancouver lawyer stated in interview: "If we find out the name of the judge the previous day and we know he is one who will not be interested in what we have to say, we immediately phone the adjuster and accept their offer." How can any computer program, neural network or otherwise, capture the dynamics of such human interaction?

5.6 AI AND THE MEDIA

For the sociologist, AI has been an interesting scientific paradigm. The advent of a new technology is heralded in the popular press as much as in the scientific journals. There then follows an enormous amount of media interest in the potential applications of this new technology. The problem with the popular media is that they tend to speak of technological advances as if they have already occurred, rather than reporting on what a given scientist believes to be feasible with the technology. After the media interest has peaked, nothing further is heard for some time while the real research continues. During this period, articles appear in the popular press asking questions along the following lines: whatever happened to that new technology, X? Finally, research begins to filter through which suggests that X is not the problem-solving panacea that everyone thought it was. This thesis is part of that research momentum as the crest of the media wave begins to break. It endorses the use of the technology but is realistic about the problems it presents. What is so remarkable is that one need know very little about the domain in issue and nothing about computers to build a program that has some predictive capacity, albeit currently insufficient for utilization. Neural networks are not intelligent. As the unnamed author of a recent article in Wired magazine on neural networks used in financial forecasting so rightly noted, "..the net will just as merrily try to find relationships between the cycles of the moon and inflation as between wage rates and inflation."194 However, researchers should be optimistic about the potential of these machines. Like Leith,195 I started out on this research project relatively optimistically, aiming to build a neural network that could out-perform former programming methods.
However, during this building I too grew skeptical about the ability to handle legal knowledge in this type of computer system. Leith describes his skepticism thus: "real legal skill is a much more amorphous quantity than the builders of legal expert systems believe."196 This thesis has shown that one cannot simply present a database of numeric factors, representing the complex set of linguistic, social and political relationships that make up a domain of law, to a neural network and hope that it will be able to work everything out. Bench-Capon has shown that it is possible with hypothetical domains, but the real world remains nebulous.197 My skepticism towards the ability of neural networks to deal with legal knowledge remains, in part, for all the reasons I have outlined above (despite some encouraging results). However, this skepticism is tempered by the realization that the challenge of finding the optimum network(s) also presents marvellous opportunities to explore the complexity of relationships between sociological factors in law; we have in our possession a far more complex tool than expert systems. This is supplemented by an appreciation that I have only scratched the surface of neural network theory and application. Hypothetically, at least, the political, sociological and economic factors that affect legal reasoning and decision-making may be built into the network. Such research offers exciting potential for both lawyer and social scientist. I have been able to produce results of almost equal value to those produced by the Statistical Quantum Advisor with a relatively simple piece of software, no programming experience and no familiarity with the domain of whiplash. Furthermore, if the developments that I have outlined above are utilized - the use of genetic algorithms to optimize searching, the use of banding as a measure of success and the fuzzification of input data - then I believe that this technology could be used in a wide number of instances, facilitating earlier settlements, substantial cost savings through alternative dispute resolution, and out-of-court settlements through enhanced access to complex legal information. Plaintiffs are generally willing to trade off a certain amount of accuracy for substantial savings in information costs. This process could be transformed by the application of neural networks. For all its pitfalls, I believe that I have identified an important new area for research in the discipline of Law and Information Technology, and am both confident and excited about its future.

194 Wired Magazine, September 1994, p.39.
195 See note 83 in Chapter 2.
196 See note 83 in Chapter 2.
197 See note 132 in Chapter 3.

BIBLIOGRAPHY

BOOKS & THESES

Anderson, J.A., and Rosenfeld, E., Neurocomputing: Foundations of Research, Cambridge, MA., MIT Press, 1988.

Aparicio, M., & Levine, D., Neural Networks for Knowledge Representation and Inference, Hillsdale, NJ., Lawrence Erlbaum Associates, 1994.

Arbib, M.A., and Robertson, J.A., (eds.), Natural and Artificial Parallel Computation, Cambridge, MA., MIT Press, 1990.

Ashley, K., Modelling Legal Argument: Reasoning with Cases and Hypotheticals, unpublished Ph.D thesis, University of Massachusetts, 1988.

Baade, H.W., (ed.), Jurimetrics, New York, Basic Books, 1963.

Beutel, F.K., Experimental Jurisprudence, Lincoln, University of Nebraska Press, 1957.

Braithwaite, M., Prolegomena to a Post-Modernist Theory of Law, unpublished LL.M. thesis, University of British Columbia, October 1993.
Cardoza, B., The Nature of the Judicial Process, New Haven, Yale University Press, 1921. Continuing Legal Education Society of British Columbia, ICBC -Motor Vehicle Accident Claims, Vancouver, B.C., 1988. Continuing Legal Education Society of British Columbia, ICBC Defence Materials, Vancouver B.C., 1993. Coval, S., & Smith, J.C., Law and its Presuppositions: Actions, Agents and Rules, London, Routledge and Kegan Paul, 1986. Deedman,G.C, Building Rule-Based Expert Systems in Case-Based Law, unpublished LL.M. thesis, University of British Columbia, April, 1987. Department of Trade & Industry, Neural Computing- Learning Solutions, London, 1993. Dreyfus, H.L., What Computers Can 'tDo, New York, Harper & Row, 1972. Dreyfus, H.L., What Computers Still Can 'tDo, Cambridge, MA., MIT Press, 1992.  181  Dreyfus, H.L., & Dreyfus, S.E., Mind Over Machine, New York, Free Press, 1986. Dworkin, R., Taking Rights Seriously, Cambridge, MA., Harvard University Press, 1977. Foreman, S.M., & Croft, A.C., Whiplash Injuries - The Cervical Accelaration /Deceleration Syndrome, Baltimore, Williams & Wilkins, 1988. Frank, J., Courts on Trial, Princeton, Princeton University Press, 1949. Frank, J., Law and the Modern Mind, Garden City, NY., Doubleday Press, 1963. Gardner, Av.d.L., An Artificial Intelligence Approach to Legal Reasoning, Cambridge, MA., MIT Press, 1987. Goodrich, P., Reading the Law: A Critical Introduction to Legal Method and Techniques, London, Basil Blackwell, 1986. Graubard, S., (ed.) The Artificial Intelligence Debate: False starts, Real foundations, New York, Basic Books, 1988. Harmon, P., and King, D., Expert Systems, Artificial Intelligence in Business, New York, John Wiley and Sons, 1985. Hart, H.L.A., The Concept ofLaw, Oxford, Clarendon Press, 1961. Hebb, D.O., The Organization of Behavior, New York, Wiley, 1949. Holmes, O.W., Collected Legal Paper, New York, Harcourt, Brace & Co., 1920. Holton,G., Thematic Origins of Scientific Thought; Keplar to Einstein, Cambridge, Harvard University Press, 1973. Insurance Corporation of British Columbia, Comparative Aspects of the Insurance Corporation ofBritish Columbia and Private Sector Insurance Companies in Canada, Vancouver, B.C., 1990. Insurance Corporation of British Columbia, Annual Report, Vancouver, B.C., 1993. James, W., Psychology, New York, Holt, 1890. Kevelson, R., (ed.) Peirce and the Law: Issues in Pragmatism, Legal Realism and Semiotics, New York, Peter Lang, 1991.  182  Kosko, B., Fuzzy Thinking, New York, Hyperion, 1993. Kursweil, R., The Age of Intelligent Machines, Cambridge, MA., MIT Press,  1990.  Lawrence. J., Introduction to Neural Networks, California Scientific Software Press, Nevada City, C A , 1993. Leith, P., The Computerised Lawyer - a guide to the use of computers in the legal profession, London, Springer-Verlag, 1991. Levi, E.H., An Introduction to Legal Reasoning, Chicago, University of Chicago Press, 1949. Leyh, G, (ed.) Legal Hermeneutics: Theory, History and Practice, Berkeley, University of California Press, 1992. Llewellyn, K., Jurisprudence; realism in theory and practice, Chicago, University of Chicago Press, 1962. Lloyd, of Hampstead, and Freeman, M.D.A., Lloyd's Introduction to Jurisprudence, London, Stevens & Sons, 1985. McNeill, D., & Freiberger, P., Fuzzy Logic (forthcoming.) Section quoted in Success Magazine, September 1994. Minsky, M., Semantic Information Processing, Cambridge, MA., MIT Press, 1968. Minsky, M., and Papert, S., Perceptrons, Revised ed., Cambridge, MA, MIT Press, 1988. 
Mital, V., and Johnson, L., Advanced Information Systems for Lawyers, London, Chapman Hall, 1992. Ministry of the Solicitor General, Motor Vehicle Branch, Motor Vehicle Statistics, Victoria, B.C., 1989. Naylor, C , Build Your Own Expert System, , London, Halstead Press, 1983. Neumann, J. von, The Computer and the Brain, New Haven, Yale University Press, 1958. Pagels, H., The Dreams of Reason, New York, Simon and Schuster, 1988.  183  Partridge, D. & Wilks, Y., The Foundations ofArtificial Intelligence - A Sourcebook, Cambridge, Cambridge University Press, 1990. Penrose, R., The Emperor's New Mind, New York, Oxford University Press, 1989. Posner, R.A., The Problems of Jurisprudence, Cambridge, MA., Harvard University Press, 1990. Raz, J., The Authority ofLaw, essays on law and morality, Oxford, Clarendon Press, 1979. Reed, C , Computer Law, London, Blackstone Press, 1990. Ringle, M., Philosophical Perspectives on Artificial Intelligence, The Harvester Press, Brighton, Sussex, 1979. Rumelhart, D.E, and McLelland, J.L, (eds) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Cambridge, MA., MIT Press, 1986. Sayegh, N.S., Philosophy and Philosophers, Ottawa, Academy Books, 1988. Schilpp, P. A , The Philosophy of Karl Popper, La Salle, EL., Open Court, 1974. Susskind, R.E., Expert Systems in Law - A Jurisprudential Inquiry, Oxford, Clarendon Press, 1987. Trial Lawyers Association of British Columbia, Conducting a Whiplash Case: Winning Trial Strategies, Vancouver, B.C., Canada, February 23, 1990. Vandenburghe, G.P.V., (ed.) Advanced Topics in Law and Information Technology, Hillsdale, NJ., Lawrence Erlbaum, 1990. Wahlgren,P., The Automation ofLegal Reasoning - A study on Artificial Intelligence and Law, Deventer, Kluwer Law and Taxation Publishers, 1992. Waterman, D.A., A Guide to Expert Systems, Menlo Park, C A , Addison-Wesley Publishing Co., 1986. Williams, G., Learning the Law, 5th ed., London, Stevens & Sons, 1978. Winston, P.H., Artificial Intelligence, Reading, Mass. USA, Addison-Wesley Publishing Co., 1992.  184  DICTIONARY The New Lexicon Webster's Dictionary of the English Language, New York, Lexicon Publications, Inc., 1988.  185  ARTICLES Anderson, J.A., A simple neural network generating an interactive memory, Mathematical Biosciences. 14: pp. 197-220. Ashley, K., Reasoning by Analogy: A Survey of Selected Al Research with Implications For Legal Expert Systems, in Walter, C , Computing Power and Legal Reasoning, St.Paul, M N , West Publishing Co., 1985. pp. 105-130. Belew, R.K, A Connectionist Approach to Conceptual Information Retrieval , Proceedings of the First International Conference on Artificial Intelligence and Law. Boston, MA., 1987, pp.116-126. Bench-Capon, T.E., Neural Networks and Open Texture, Proceedings of the Fourth International Conference on Artificial Intelligence and Law. Amsterdam, A C M Press, 1993, pp.292-297. Bench-Capon, T.E., & Sergot, M., Towards a Rule-Based Representation of Open Texture inlaw, Journal of Law and Information Technology. 1992. Bing, J, The law of the books and the files: possibilities and problems of legal information retrieval, in G.P.V. Vandenburghe (ed.) Advanced Topics in Law and Information Technology, Hillsdale,NJ, Lawrence Erlbaum, 1990, pp. 151-182. Blair, D.C., & Maron, M.E., An Evaluation ofRetrieval Effectiveness for a Full-Text Document Retrieval System, Communications of the ACM. New York, 28,3, 1985, pp.289-299. 
Bochereau, L., Bourcier, D., and Bourgine, P., Extracting Legal Knowledge By Means of A Multi-Layered Neural Network Application to MunicipalJurisprudence,, Proceedings of the Fourth International Conference on Artificial Intelligence and Law, Amsterdam, ACM Press, 1993, pp.288 - 296. Brown,D., "The Third International Conference of Artificial Intelligence and Law; A Report and Comments", Journal of Law and Information Science. Vol.2, No.2, 1991, pp.223- 239. Buchanan, B.G., and Headrick,T.E., "Some Speculation About Artificial Intelligence and Legal Reasoning Stanford Law Review. 23, November 1970, pp.40-61.  186  Chomsky, N., A Review of Verbal Behavior by B.F.Skinner in Language. 35, 1959, pp.2658. Churchland,P.M, Representation and high-speed computation in neural networks, in Partridge, D, & Wilks, Y., The Foundations ofArtificial Intelligence - A Sourcebook, Cambridge, Cambridge University Press, 1990. Classe, A., Little Grey Cells, Accountancy. May 1993, v . l l l (n.1197), p.67. Crowe, H.E., Injuries to the Cervical Spine, paper presented at the meeting of the Western Orthopaedic Association, San Francisco, 1928. Dreyfus, S.E., and Dreyfus, H.L., Towards a reconcilation ofphenomenology andAI, in Partridge, D., and Wilks, Y., The foundations of artificial intelligence - A sourcebook, Cambridge, Cambridge University Press, 1990, pp.396-410. Fernhout,F., Using a Parallel Distributed Processing Model As Part OfA Legal Expert System. 1 Pre-Proceedings of the Third International Conference onLogica. Informatica. Diretto , Florence, Martino,A.A. (ed.), 1989, pp.255-268. Fodor, J.A., Why there STILL has to be a language of thought, in Partridge, D, & Wilks, Y., The Foundations of Artificial Intelligence - A Sourcebook, Cambridge, Cambridge University Press, 1990, pp.289-305. Fodor, J., and Pylyshyn, Z.W., Connectionism and cognitive architecture: A critical analysis, Cognition. 28, 1988, pp.3-71 Fuller, L., Positivism and Fidelity - A Reply to Professor Hart, 71 Harvard Law Review 1958, pp.630-672. Gideon, T.D., and Mital,V., Information Retrieval in Law using a Neural Network integrated with Hypertext, Proc. of International Joint Conference on Neural Networks (IJCNN), Singapore, 1991, pp.1819-1824. Hart, H.L.A., Positivism and The Separation ofLaw andMorals, 71 Harvard Law Review. 1958, pp.593-629. Hertz, D., Thinking Computers, The Miami Herald, February 5, 1989, p. Hinton, G.E., Connectionist Learning Procedures, Artificial Intelligence. Vol.40., 1989, pp. 189-234.  187  Holland, J., Complex Adaptive Systems, Daedalus. Vol.121, No.1, Winter 1992, pp. 17-30. Israel, D.S., The role of logic in knowledge representation, Computer. Vol. 16, No. 10, 1983. Kantrowitz, M., Horstkolte,E., and Joslyn, C , Answers to Frequently Asked Questions about Fuzzy Logic and Fuzzy Expert Systems, comp.ai.fuzzy, July ,1994, ftp.cs.cmu.edu, usr/ai/pubs/faqs/fuzzy/fuzzy.faq. Kinoshita, J., Neural Nets at Work, Scientific American. Vol.259, No.5, pp. 134-135. Klimasauskas, C , Applying Neural Networks. PC Al. May-June 1991, pp.20- 24 Kohonen, T., Corrolation Matrix Memories, IEEE Transactions on Computers. C-21: pp.353-359 Kowalski, A., Case-Based Reasoning and the Deep Structure Approach to Knowledge Representation, Proceedings of the Third International Conference of Artificial Intelligence and Law. A C M Press, Oxford, 1991, pp. Kowlaski, R., and Sergot, M., The Use of Logical Models in Legal Problem Solving, Ratio Juris. Vol.3, No.2, July 1992, pp.201-218. 
Kuhn, T.S., The Structure of, in Scientific Revolutions, Chicago, University of Chicago Press, 1970, 2nd edition International Encyclopedia of Unified Sciences, Vol 2, Number 2. Lashbrooke, E.C., Legal Reasoning and Artificial Intelligence, Loyola Law Review. Vol.34, No.2, 1988, pp.287-310. Leibowitz, J., Roll Your Own Hybrids, BYTE. July 1993, pp. 113-115. Loevinger, L., Jurimetrics - The Next Step Forward, Minnesota Law Review. Vol.33, 1948, pp.455- 493. McClelland, J.L., Can Connectionist Models Discover the Structure of Natural Language? in Minds ,Brains and Computers, Norwood, NJ., Ablex Publishing Co., 1992, Chapter 7, pp. 168-189. McCulloch, W.S., and Pitts, W., A Logical Calculus of the Ideas Immanent in Nervous Activity, 5 Bulletin of Mathematical Biophysics. 1943, pp.127 -147.  188  MacCrimmon, M.T., Facts, Stories and the Hearsay Rule, Pre-Proceedings of the III International Conference on Logica. Informatica. Diretto. Florence. Martino, A.A., (ed.) 1989, pp.461-475. MacLean,P., The triune brain,emotion and scientific bias, InF.O.Schmitt (ed.) The Neurosciences: Second study program, New York, Rockefeller University Press, 1970, pp.336-349. Meldman, J.A., A Structural Modelfor Computer-Aided Legal Analysis, Rutgers Computer and Technology Law Journal. 6, 1977, pp.27'-71. Mital V., and Gedeon, T.D., A Neural Network Integrated with Hypertextfor Legal Document Assembly, Proceedings of the Twenty-Fifth Hawaii International Conference on Systems Sciences. Hawaii, IEEE Computer Society Press, 1992, pp.533-539. Opdorp, van,G.J., and Walker,R.F., A Neural Network Approach to Open Texture, in Kasperson,H.W.K., & Oskamp, A., Amongst Friends in Computers and Law, Deventer, Boston, Kluwer Law and Taxation Publishers, 1990, pp.279-309. Philipps, L , Distribution ofDamages in Car Accidents Through The Use of Neural Networks, 13, Cardoza Law Review. 1991, pp.987-1000. Philipps, L., Are Legal Decisions Based on the Application ofRules or Prototype Recognition? Legal Science on the Way to Neural Networks, 2 Pre-Proceedings of the Third International Conference on Logica. Informatica. Diretto , Florence, Martino,A. A. (ed.), 1989. Raghhupathi, W., Levine,D.S., Bapi, R.S., and Schkade,L.L., Towards Connectionist Representation ofLegal Knowledge, in Aparicio,M., and Levine,D.S, Neural Networks for Knowledge Representation and Inference, Hillsdale, NJ., Lawrence Erlbaum, 1994, pp.267 - 282. Raghupathi, W., Schkade, L.L., Bapi, R.S., and Levine, D.S., Exploring Connectionist approaches to Legal Decision Making, Behavioral Science. 36, 1991, pp.133-139. Rau, L.F., Knowledge organization and access in a conceptual information system, 23 Information Processing and Management, pp.269-283. Rose, D.E, and Belew, R.K, Legal Information Retrieval: a Hybrid Approach, Proceedings of the First International Conference on Artificial Intelligence and Law. Boston, A C M Press, 1989, pp. 138-146.  189  Rissland, E., and Ashley, K., HYPO: A Precedent-Based Legal Reasoner, in Vandengurghe, G.P.V., Advanced Topics ofLaw and Information Technology, Hillsdale, NJ, Lawrence Erlbaum, 1989, pp.327- 356. Rosenblatt,E, The Perceptron: a probabilistic modelfor information storage and organization in the brain, Psychological Review 65.1958. pp.386-408. Schneider, W., Connectionism.Ts it a paradigm shiftfor psychology? Behavior Research Methods. Instruments and Computers. 19, 1988, pp.73-83. Schwartz,E.I & Treece, J.B, Smart Programs Go To Work, in Business Week. March 2, 1992. 
Sergot M.J., Sadri, F., Kowalski, R.A., Kriwaczek,F., Hammond, P., and Cory, H.T, The British Nationality Act as a logic program, Communications of the ACM. 29, pp.370383. Simon, H.A., & Newell, A., Heuristic problem Solving: The Next Advance in Operations Research, Operations Research. Vol 6, 1958. Smith, J.C, & Gelbart,Daphne, Beyond Boolean Search: FLEXICON, A Legal Text-based Intelligent System, Proceedings of the Third International Conference on Artificial Intelligence and Law. Oxford, England, ACM Press, 1991, pp.225-233. Smith, J.C, Law, Language andPhilosophy, Best-Printers, Vancouver, 1968. Smith, J.C, Review of The Automation ofLegal Reasoning, Stockholm, 5 Juridisk Tidskift. Nr.l, 1993-1994, pp.247- 256. Smolensky, P., Connectionism and the Foundations ofAl, in Partridge, D, & Wilks, Y., The Foundations ofArtificial Intelligence - A Sourcebook, Cambridge, Cambridge University Press, 1990, pp.306-326. Somers.J. et al., Catching Intruders in a Neural Net, Security Management. Feb 1994, v.38 (n2), pp.68-74. Spengler, J.J., Machine Made Justice - Some Implications in Baade, H.W., Jurimetrics, 1960, pp. Stone, J., Fallacies of the Logical Form in English Law, in Sayre, P., Interpretations of Modern Legal Philosophy, 1947, pp.696-735.  190  Susskind, R.E, Expert Systems in Law: Theory and Practice, Cambridge Lectures, 1987. Susskind, R.E., Expert Systems in Law: A Jurisprudential Approach to Artificial Intelligence and Legal Reasoning, Modern Law Review. 49, 1987, pp. 168-183 Swales, G.S., and Yoon,Y., Applying Artificial Neural Networks to Investment Analysis, Financial Analysts Journal. Sept -Oct, 1992, v.48 (n5), pp.78-81. Thagard, P., Explanatory coherence, in Behavioral and Brain Sciences. 1989, 12, pp.435502. Warner, D.R., A Neural Network-Based Law Machine: Initial Steps, 18 Rutgers Law and Technology Journal. 1992. pp.51Warner, D.R, Towards a Simple Law Machine, Jurimetrics Journal, 29, 1989, p451-487. Wilks, Y., Some comments on Smolensky andFodor, in Partridge, D, & Wilks, Y., The Foundations ofArtificial Intelligence - A Sourcebook, Cambridge, Cambridge University Press, 1990, pp.327-336. Zadeh, L., Fuzzy Sets, Infomation and Control. 8, 1965, pp.338-353. Zadeh, L., Commonsense Knowledge Representation Based on Fuzzy Logic, Computer. Vol.16, No. 10, 1983. unnamed author, unnamed article in Wired. September, 1994, p.39.  191  APPENDIX A TEST2.PRG (* symbol denotes commentary) SET ECHO OFF SET TALK ON SELECT 1 USE TARGET2 ALIAS T SELECT 2 USE BRNMAK3 ALIAS B SELECT 1 RET=.F. RET1=.F. RET2=.F. DO WHILE NOT. EOF() ? RECNOQ RET=T->DMGS_INCL .OR. T->MULT_ACC = Y .OR. T->OTHER INJ = 1 .OR. T>OTHERINJ = 2 .OR. T->OTHER_INJ = 3 .OR. T->SUBSQ_INJ = 1 .OR. T>SUBSQ_INJ = 2 .OR. T->SUBSQ INJ = 3 .OR. T->SUBSQ_INJ = 4 .OR. T>SUBSQ_INJ = 5 .OR. T->THINSKULL = 1 RET1= T->THN_SKULL = Y C .OR. T->THN_SKULL = Y G .OR. T->PRE_EXIST = 1 .OR. T->PRE_EXIST = 2 .OR. T->PRE_EXIST = 3 .OR. T->PRE_EXIST = 4 .OR. T->PRE_EXIST = 5 .OR. T->PRE_EXIST = 6 RET2= T->PRE_EXIST = 7 O K T->PRE_EXIST = 8 .OR. T->PRE_EXIST = 9 .OR. T->PRE_EXIST = 10 OR. T->PRE_EXIST = 11 .OR. T->PRE_EXIST = 12 .OR. T>PRE EXIST = 13 .OR. T->PRE EXIST = 14 .OR. T->PRE EXIST = 15  IF .NOT. RET .AND. .NOT. RET1 .AND. .NOT. 
RET2 SELECT 2 APPEND BLANK REPLACE B->NAME WITH T->NAME REPLACE B->JUDGE WITH T->JUDGE REPLACE B->DATE WITH YEAR (T->DATE)-1984 REPLACE B->INJDATE WITH YEAR (T->INJURYDATE)-1984 IF T->FTWORKLOSS >0 REPLACE B->FTWOLO WITH T->FTWORKLOSS 192  ELSE IF T->FTWORKLOSS = -2 REPLACE B->FTWOLO WITH 99 ELSE LF T->FTWORKLOSS = -1 REPLACE B->FTWOLO WITH (B->DATE-B->INJDATE)* 12 ELSE IF T->FTWORKLOSS = -3 REPLACE B->FTWOLO WITH 0 ELSE IF T->FTWORKLOSS = -4 REPLACE B->FTWOLO WITH 0 END IF ENDIF ENDIF ENDIF ENDIF LF T->PTWORKLOSS >= 0 REPLACE B->PTWOLO WITH T->PTWORKLOSS ELSE TF T->PTWORKLOSS = -2 REPLACE B->PTWOLO WITH 99 ELSE TF T->PTWORKLOSS = -1 REPLACE B->PTWOLO WITH (B->DATE-B->INJDATE)*12 ELSE TF T->PTWORKLOSS = -3 REPLACE B->PTWOLO WITH 0 ELSE IF T->PTWORKLOSS = -4 REPLACE B->PTWOLO WITH 0 ENDIF ENDIF ENDIF ENDIF ENDIF IF T->DISABILiTY >0 REPLACE B->DISAB WITH T->DISABILITY ELSE IF T->DISABILITY = 0.00 REPLACE B->DIS AB WITH 0 ELSE IF T->DISABILITY = -1  193  REPLACE B->DISAB WITH (B->DATE-B->INJDATE)/4 ELSE IF T->DIS ABILITY = -2 REPLACE B->DIS AB WITH 99 ELSE IF T->DISABILITY = -3 REPLACE B->DISAB WITH 0 ELSE IF T->DIS ABILITY = -4 REPLACE B->DISAB WITH 0 ENDIF ENDIF ENDIF ENDIF ENDIF ENDIF IFT->RN= "" REPLACE B->RECOV WITH 0 ELSE IF T->RN = -1 REPLACE B->RECOV WITH (B->DATE-B->INJDATE)* 12 ELSE EFT->RN--2 REPLACE B->RECOV WITH 99 ELSE IF T->RN = -3 REPLACE B->RECOV WITH 0 ELSE IFT->RN = -4 REPLACE B->RECOV WITH 0 ELSE IFT->RN>0 REPLACE B->RECOV WITH T->RN ENDIF ENDIF ENDIF ENDIF ENDIF ENDIF REPLACE B->SEVER WITH T->SN REPLACE B->QUANT WITH T->QU IF T->ATHLETIC >0  194  REPLACE B->ATHLETIC WITH 1 ELSE REPLACE B->ATHLETIC WITH 0 END IF IF T->RECREATION >0 REPLACE B->RECREAT WITH 1 ELSE REPLACE B->RECREAT WITH 0 ENDIF JJFT->HOBBY>0 REPLACE B->HOBBY WITH 1 ELSE REPLACE B->HOBBY WITH 0 ENDIF IF T->SOCIAL >0 REPLACE B->SOCIAL WITH 1 ELSE REPLACE B->SOCIAL WITH 0 ENDIF IF T->SEXUAL >0 REPLACE B->SEXUAL WITH 1 ELSE REPLACE B->SEXUAL WITH 0 ENDIF IF T->LIFESTYLE >0 REPLACE B->LIFESTYLE WITH 1 ELSE REPLACE B->LIFESTYLE WITH 0 ENDIF IF T->HOUSEWORK >0 REPLACE B->HOUSE WITH 1 ELSE REPLACE B->HOUSE WITH 0 ENDIF IF T->PSYCH >0 REPLACE B->PSYCH WITH 1 ELSE REPLACE B->PSYCH WITH 0  195  ENDIF IF T->DEPRESS >0 REPLACE B->DEPRESS WITH 1 ELSE REPLACE B->DEPRESS WITH 0 ENDIF IF T->PERSONLTY >0 REPLACE B->PERSON WITH 1 ELSE REPLACE B->PERSON WITH 0 ENDIF IF T->JOB_CHANGE >0 REPLACE B->JB CHANG WITH 1 ELSE REPLACE B->JBCHANG WITH 0 ENDIF REPLACE B->NECKP WITH T->NECKPAIN REPLACE B->CPS WITH T->CHRONPAIN REPLACE B->LBP WITH T->LOW_BACK REPLACE B->HEADAC WITH T->HEADACHES REPLACE B->MYOFAS WITH T->MYOFASCIAL REPLACE B->TMJ WITH T->TMJ REPLACE B->SPASMS WITH T->SPASMS REPLACE B->SHOULD WITH T->SHOULDER REPLACE B->FIBROS WITH T->FIBROSITIS REPLACE B->NUMBT WITH T->NUMB TINGLE REPLACE B->ANKYL WITH T->ANKYLOSING REPLACE B->FATIGUE WITH T->FATIGUE REPLACE B->SLEEP WITH T->SLEEPLOSS REPLACE B->THOE WITH T->THORACIC REPLACE B->DIZZY WITH T->DIZZINESS REPLACE B->MIDBACK WITH T->MTDBACK REPLACE B->DRUGD WITH T->DRUG_DEP IF T->CREDIBLE REPLACE B->CREDIB WITH 1 ELSE REPLACE B->CREDIB WITH 0 ENDIF  196  REPLACE B->SURGERY WITH T->SURGERY REPLACE B->HOSP WITH T->HOSPITAL LF T->RX REPLACE B->RX WITH 1 ELSE REPLACE B->RX WITH 0 ENDIF LF (T->PHYSIO="") REPLACE B->PHYSIO WITH 0 ELSE REPLACE B->PHYSIO WITH 1 ENDIF  IF (T->MASSAGE="") REPLACE B->MASSAGE WITH 0 ELSE REPLACE B->MASSAGE WITH 1 ENDIF IF T->CHIRO REPLACE B->CHIRO WITH 1 ELSE REPLACE B->CfflRO WITH 0 ENDIF IF T->PSYCHOL REPLACE B->PSYCHO WITH 1 ELSE REPLACE B->PSYCHO 
WITH 0 ENDIF IF T->PAINCLINIC REPLACE B->PAJNCL WITH 1 ELSE REPLACE B->PAINCL WITH 0 ENDIF IF T->ACUPUNCTUR REPLACE B->ACUPUN WITH 1 ELSE REPLACE B->ACUPUN WITH 0 ENDIF  197  IFT->COLLAR>0 REPLACE B->COLLAR WITH 1 ELSE REPLACE B->COLLAR WITH 0 ENDIF IF AT("A ,T->FIN)>0 REPLACE B->INFAMB WITH 1 ELSE REPLACE B->INFAMB WITH 0 IF AT ("Br',T->FIN)>0 REPLACE B->PENSION WITH 1 ELSE REPLACE B->PENSION WITH 0 IF AT ("B2",T->FIN)>0 REPLACE B->FINOTH WITH 1 ELSE REPLACE B->FINOTH WITH 0 IF AT ("C",T->FIN)>0 REPLACE B->LOSTJOB WITH 1 ELSE REPLACE B->LOSTJOB WITH 0 IF AT ("C2",T->FIN)>0 REPLACE B->RETIRE WITH 1 ELSE REPLACE B->RETIRE WITH 0 IF AT ("C3",T->FIN)>0 REPLACE B->STUDENT WITH 1 ELSE REPLACE B->STUDENT WITH 0 IF AT ("Q",T->FIN)>0 REPLACE B->WAGES WITH 1 ELSE REPLACE B->WAGES WITH 0 ENDIF ENDIF ENDIF ENDIF ENDIF ENDIF ENDIF ,,  REPLACE B->GRMONTH WITH 0  198  REPLACE B->GRONG WITH 0 REPLACE B->GRPERM WITH 0 REPLACE B->CD1 REPLACE B->CD2 REPLACE B->CD3 REPLACE B->CD4 REPLACE B->CD5 REPLACE B->CD6  WITH 0 WITH 0 WITH 0 WITH 0 WITH 0 WITH 0  IFT->RN>=0 REPLACE B->GRMONTH WITH T->RN ELSE IFT->RN = -1.00 REPLACE B->GRONG WITH 1 ELSE IF T->RN = -2.00 REPLACE B->GRPERM WITH 1 ENDIF ENDIF ENDIF REPLACE B->JBTEMP WITH 0 REPLACE B->JBPERM WITH 0 IF T->JOB_PERF = 1 REPLACE B->JBTEMP WITH 1 ELSE IF T->JOB_PERF = 2 REPLACE B->JBPERM WITH 1 ENDIF ENDIF IF T->CAR_DAMAGE = 1 REPLACE B->CD1 WITH 1 ELSE IF T->CAR_DAMAGE = 2 REPLACE B->CD2 WITH 1 ELSE IF T->CAR_DAMAGE = 3 REPLACE B->CD3 WITH 1 ELSE IF T->CAR_DAMAGE = 4  199  REPLACE B->CD4 WITH 1 ELSE IF T->CAR_DAMAGE = 5 REPLACE B->CD5 WITH 1 ELSE IF T->CAR_DAMAGE = 6 REPLACE B->CD6 WITH 1. ENDIF ENDIF ENDIF ENDIF ENDIF ENDIF ENDIF SELECT 1 SKIP ENDDO  200  APPENDIX B LIST OF FACTS USED AS INPUTS NAME  Name of the case, usually only includes name of the plaintiff. Used to annote the case, rather than as an input. This fact was not used as an input but was retained in the database so that I could keep track of an individual case i.e. an annotation  JUDGE  Code used to protect the identity of the judge. Also used for annotation.  DATE  The date of the trial.  INJDATE  The date of the injury.  FTWOLO  Number of months lost from full-time work.  PTWOLO  Number of months lost from part-time work.  DISAB  Time of any total disability  RECOV  The Recovery Period (if expressed in numerical form.)  GRMONTH  General recovery period. This is largely the same as RECOV above. The original database uses two fields.  GRONG  If the recovery is expressed in the case as being "ongoing." A logical value (i.e. yes or no.)  GRPERM  If the general recovery is described as "permanent." Logical value.  ATHLETIC  Athletic impairment? Logical value.  RECREAT  Recreational impairment? Logical value.  HOBBY  Significant/Favorite hobby impaired? Logical value.  SOCIAL  Social life impaired? Logical value.  SEXUAL  Sexual impairment?Logical value.  LIFEST  Lifestyle impairment? Logical value.  201  HOUSE  Housework impairment?Logical value.  PSYCHO  Psychological impairment? Logical value.  DEP  Depression? Logical value.  PER  Personality affected, changed? Logical value.  IB TEMP  Job performance temporarily impaired? Logical value.  JBPERM  Job performance permananently impaired? Logical value.  IB CHAN  Job change necessitated? Logical value.  NECK  Persistent neckpain or stiffness? Logical value.  CPS  Chronic Pain Syndrome diagnosed? Logical value.  LBP  Lower Back Pain? Logical value.  HEADAC  Persistent headaches? Logical value.  MYOFAS  Myofascial pain syndrome diagnosed? 
Logical value.  TMJ  Jaw problems diagnosed? Logical value.  SPASMS  Muscle spasms diagnosed? Logical value.  MTDBACK  Midback pain diagnosed? Logical value.  DRUGD  Drug dependency? Logical value.  CREDD3  Credibility problem? Logical value.  SURGERY  Surgery required? Logical value.  HOSP  Hopitalization required? Logical value.  RX  Prescription medicine required? Logical value.  PHYSIO  Physiotherapist treatment required? Logical value.  MASSAGE  Massage treatment required? Logical value.  CHTRO  Chiropractor appointments? Logical value.  202  PSYCHO  Psychologist/Psychiatrist appointments required? Logical value.  PATNCLrNIC Pain Clinic attendance required? Logical value. ACUPUNCTUR  Accupuncture appointements/treatment required.  COLLAR  Collar required? Logical value.  INFAMB  Future income loss - Infant/ambitions? Logical value.  PENSION  Future income loss - Pension? Logical value.  FINOTH  Future income loss - other loss? Logical value.  LOSTJOB  Future income loss - loss of job? Logical value.  RETIRE  Future income loss - early retirement? Logical value.  STUDENT  Future income loss - Lost opportunity to be a student? Logical value.  WAGES  Lost wages? Logical value.  CD1-6  Car damage (1 = $0; 2 = <$500; 3 = $500-1,000; 4 = $1,000-2,500; 5 $2,500-$5,000; 6 = >$5,000.)  203  APPENDIX C (Segment of T E S T 3 . P R G changing 1 output to 10. A l l other code remains unchanged.)  IF T->QU <5000 REPLACE B->QU100 WITH 0 REPLACE B->QU80 WITH 0 REPLACE B->QU60 WITH 0 REPLACE B->QU40 WITH 0 REPLACE B->QU30 WITH 0 REPLACE B->QU20 WITH 0 REPLACE B->QU15 WITH 0 REPLACE B->QU10 WITH 0 REPLACE B->QU5 WITH 1 ELSE IFT->QU<10000 REPLACE B->QU100 WITH 0 REPLACE B->QU80 WITH 0 REPLACE B->QU60 WITH 0 REPLACE B->QU40 WITH 0 REPLACE B->QU30 WITH 0 REPLACE B->QU20 WITH 0 REPLACE B->QU15 WITH 0 REPLACE B->QU10 WITH 1 REPLACE B->QU5 WITH 0 ELSE IFT->QU<15000 REPLACE B->QU100 WITH 0 REPLACE B->QU80 WITH 0 REPLACE B->QU60 WITH 0 REPLACE B->QU40 WITH 0 REPLACE B->QU30 WITH 0 REPLACE B->QU20 WITH 0 REPLACE B->QU15 WITH 1 REPLACE B->QU10 WITH 0 REPLACE B->QU5 WITH 0 ELSE IF T->QU <20000 REPLACE B->QU100 WITH 0 REPLACE B->QU80 WITH 0 REPLACE B->QU60 WITH 0 REPLACE B->QU40 WITH 0  204  REPLACE B->QU30 WITH 0 REPLACE B->QU20 WITH 1 REPLACE B->QU15 WITH 0 REPLACE B->QU10 WITH 0 REPLACE B->QU5 WITH 0 ELSE IF T->QU <30,000 REPLACE B->QU100 WITH 0 REPLACE B->QU80 WITH 0 REPLACE B->QU60 WITH 0 REPLACE B->QU40 WITH 0 REPLACE B->QU30 WITH 1 REPLACE B->QU20 WITH 0 REPLACE B->QU15 WITH 0 REPLACE B->QU10 WITH 0 REPLACE B->QU5 WITH 0 ELSE IF T->QU <40,000 REPLACE B->QU100 WITH 0 REPLACE B->QU80 WITH 0 REPLACE B->QU60 WITH 0 REPLACE B->QU40 WITH 1 REPLACE B->QU30 WITH 0 REPLACE B->QU20 WITH 0 REPLACE B->QU15 WITH 0 REPLACE B->QU10 WITH 0 REPLACE B->QU5 WITH 0 ELSE LF T->QU <60,000 REPLACE QU100 WITH 0 REPLACE QU80 WITH 0 REPLACE QU60 WITH 1 REPLACE QU40 WITH 0 REPLACE QU30 WITH 0 REPLACE QU20 WITH 0 REPLACE QUI 5 WITH 0 REPLACE QU10 WITH 0 REPLACE QU5 WITH 0 ELSE IFT->QU<80,000 REPLACE QUI00 WITH 0 REPLACE QU80 WITH 1 REPLACE QU60 WITH 0 REPLACE QU40 WITH 0 REPLACE QU30 WITH 0  205  REPLACE QU20 WITH 0 REPLACE QUI 5 WITH 0 REPLACE QU10 WITH 0 REPLACE QU5 WITH 0 ELSE IFT->QU<100,000 REPLACE B->QU100 WITH 1 REPLACE B->QU80 WITH 0 REPLACE B->QU60 WITH 0 REPLACE B->QU40 WITH 0 REPLACE B->QU30 WITH 0 REPLACE B->QU20 WITH 0 REPLACE B->QU15 WITH 0 REPLACE B->QU10 WITH 0 REPLACE B->QU5 WITH 0 ELSE IF T->QU<150,000 REPLACE B->QU150 WITH 1 REPLACE B->QU100 WITH 0 REPLACE B->QU80 WITH 0 REPLACE B->QU60 WITH 0 REPLACE B->QU40 
WITH 0 REPLACE B->QU30 WITH 0 REPLACE B->QU20 WITH 0 REPLACE B->QU15 WITH 0 REPLACE B->QU10 WITH 0 REPLACE B->QU5 WITH 0 ENDIF ENDIF ENDIF ENDIF ENDIF ENDIF ENDIF ENDIF ENDIF ENDIF  206  APPENDIX D (Segment of Test4.Prg changing 1 output to 6 outputs. A l l other outputs remained the same.) IFT->QU<120,001 REPLACE B->QU120 WITH 1 REPLACE B->QU60 WITH 0 REPLACE B->QU40 WITH 0 REPLACE B->QU25 WITH 0 REPLACE B->QU15 WITH 0 REPLACE B->QU7 WITH 0 ELSE IFT->QU<60,001 REPLACE B->QU120 WITH 0 REPLACE B->QU60 WITH 1 REPLACE B->QU40 WITH 0 REPLACE B->QU25 WITH 0 REPLACE B->QU15 WITH 0 REPLACE B->QU7 WITH 0 ELSE IFT->QU<40,001 REPLACE B->QU120 WITH 0 REPLACE B->QU60 WITH 0 REPLACE B->QU40 WITH 1 REPLACE B->QU25 WITH 0 REPLACE B->QU15 WITH 0 REPLACE B->QU7 WITH 0 ELSE IFT->QU<25,001 REPLACE B->QU120 WITH 0 REPLACE B->QU60 WITH 0 REPLACE B->QU40 WITH 0 REPLACE B->QU25 WITH 1 REPLACE B->QU15 WITH 0 REPLACE B->QU7 WITH 0 ELSE IF T->QU<15,001 REPLACE B->QU120 WITH 0 REPLACE B->QU60 WITH 0 REPLACE B->QU40 WITH 0 REPLACE B->QU25 WITH 0 REPLACE B->QU15 WITH 1  207  ELSE  ENDIF  ENDIF  ENDIF  ENDIF  ENDIF  REPLACE B->QU7 WITH 0 IFT->QU<7,501 REPLACE REPLACE REPLACE REPLACE REPLACE REPLACE ENDIF  208  B->QU120 WITH 0 B->QU60 WITH 0 B->QU40 WITH 0 B->QU25 WITH 0 B->QU15 WITH 0 B->QU7 WITH 1  APPENDIX E d B A S E Coding for the "fuzzification of Quantum. (All other code remains unchanged.) BAND = 0 DO WHILE BAND < 5 NODEVALUE = 0 BAND = BAND + 1 DO CASE CASE BAND = MIN = 0 MTD = 54 M A X = 108 CASE BAND = MTN = 54 MTD= 108 MAX = 164 CASE BAND = MIN = 108 MED = 164 MAX = 216 CASE BAND = MIN= 164 MED = 216 MAX = 270 CASE BAND = MIN = 216 MED = 270 MAX = 324 ENDCASE  1  && all 5 band widths are && an identical 108 units wide  2  3  4  5  IF B->QU > MIN .AND. B->QU <= MED NODEVALUE = (B->QU - MIN)/(MID - MIN) ELSE IF B->QU > MED AND. B->QU <= M A X NODEVALUE = 1-(B->QU - MED)/(MAX - MED) ENDEF ENDIF DO CASE CASE BAND = 1 REPLACE B->BAND1 WITH NODEVALUE  209  CASE BAND = 2 REPLACE B->B AND2 WITH CASE BAND = 3 REPLACE B->BAND3 WITH CASE BAND = 4 REPLACE B->BAND4 WITH CASE BAND = 5 REPLACE B->B AND5 WITH ENDCASE ENDDO && WHILE band < 5  NODEVALUE NODEVALUE NODEVALUE NODEVALUE  210  APPENDIX F GLOSSARY OF NEURAL NETWORK TERMS Artificial Neural Network A model made to simulate a biological neural systems; it may model brain processes or brain capabilities. Back-Propogation A supervised learning method in which the error signal is fed back through the network altering weights in order to prevent the same error from recurring. Back-Propogation Rule Also called the Generalized Delta Rule, training consists of running patterns forward through the network layers, then propogating the errors backwards and updating the weights in order to reduce the errors. Biological Neural Networks The combination of neurons that go to make up the brain of any living species. Connectionists Those researchers who accept the new electronic approach to computation using highly interconnected neuron-like components that take certain aspects of neurology and apply these to create "intelligent" computer programs, in the sense that they have the ability to improve their performance. Connection The path between two neurons across which information passes. These are the equivalent of synapses in biological neural networks. The weights on the connections determines the strength of the signals. Epoch One complete presentation of the training set to the network during training. 
Error The difference between the network response and the required network response.

Fault Tolerance The degree to which the network can deal with noisy data.

Feedback Network A network in which the inputs to any given neuron can be taken from the outputs either of its own layer or of the following layer(s).

Hidden Layer The layer of neurons that is hidden from the user - neither its inputs nor its outputs are fed to the outside world. There may be one or more hidden layers of neurons in a given network.

Layer A group of neurons that have a specific function, i.e. input, output, or hidden.

Learning Rule The algorithm used for modifying the connection strengths/weights in response to training patterns while training is being carried out.

Multi-Layer Perceptron The most common of neural network architectures, in which perceptrons are connected in layers. The most common of these architectures is the three-layer network.

Neuron The smallest processing element in the network. The neuron receives inputs from the outputs of other, preceding neurons. Using these inputs the neuron produces an output by simple vector-matrix multiplication (see Transfer Function for more on this point). This output is either fed to other neurons or directly to the outside world.

Output Neuron The neuron in the network that gives the output, or the results.

Overtraining Training a neural network involves the presentation of problem facts to the network. Overtraining is the result of training the network to respond very accurately to the training set only, which means that it is less able to generalize.

Perceptron A one-layer network in which inputs are summed to provide an output. See Section 1.17.3 of Chapter 1.

Pre-Processing The process, prior to processing by the neural network, in which the raw data is manipulated into a network-readable form. For an example of a pre-processing program see the Test2.prg program in Appendix A.

Post-Processing Modification of the outputs of the network before applying them to the real world.

Supervised Learning The process by which example inputs are presented to the network together with the expected outputs. If the outputs from the network differ from the outputs specified, the network weights are modified. See Section 1.11.3 of Chapter 1 for a description of both supervised and unsupervised learning.

Test Set Those examples that are set aside and have not been used in the training process. Once training has been completed, the network can be tested using the test set to see how accurate it is.

Topology Another name for the architecture of the network, or the way in which the neurons are arranged.

Training The process during which the neuron weights are adjusted so that the network performs the function for which it was designed.

Transfer Function The function that translates the internal value calculated by the neuron to a value suitable to represent its output. There are a variety of possible functions. See Section 1.10 of Chapter 1.

Unsupervised Learning The training set has no output response associated with the input stimuli. The network makes associations between the input data.

Weight The value associated with a connection between neurons in the network. The value of the weights determines how much of the output of one neuron is fed into the input of the next.

APPENDIX G

GLOSSARY OF BRAINMAKER TERMS

Bad Facts The total number of facts that the training network has got "wrong" in that data run.
Data Run The process by which all facts/cases in the training data set are presented to the neural network program once. The training of a neural network will usually involve many, possibly hundreds or thousands, of data runs. Also known as "epochs."

Good Facts The total number of facts that the training network has got "right" in that data run.

Learning Rate This determines how large or small an adjustment is made by the network during training.

Output This is what the network produces during training by using the back-propagation rule.

Pattern Output This is the set of numbers or symbols that the network should be producing, or something very close to it.

Running This function puts input data through a trained neural network and provides the user with the neural network output, usually on screen. When users want to put the neural network to work, they "run" the network.

Testing This function puts input-output pairs of data that were not used during training through the neural network program and allows the user to see how well (or how badly) the network has trained.

Tolerance The percentage of the range of output. No corrections are made to the network if the output is within the tolerance.

Total The total cumulative number of facts evaluated by the training network.

Training This function puts the input-output pairs of data through the neural network program repeatedly and corrects the network as it learns.

APPENDIX H

Graphical Representation of Figure 4.13

[The scatter plots of award values in dollars that appeared here could not be recovered from the scanned source; only the caption survives.]

