A Functional Programming Language with Context Free Grammars as Data Types

by

Violet Rose Syrotiuk
B.Sc., The University of Alberta, 1983

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES, Department of Computer Science

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
October 1984
© Violet Rose Syrotiuk, 1984

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

Abstract

Specifying data structures is important in programming languages. One approach uses initial algebras, but constructing specifications with them is difficult and hampered by restrictions. The recursive term algebra approach uses a recursive definition, along with the basic operations implied by the definition itself, to induce an algebra of terms. Such a recursive definition may be transformed into an unambiguous context free grammar. These grammars are used to define data structures in a functional programming language named Kaviar. Provisions are made for non-context free data structures as well. This approach avoids the difficulties encountered in using the initial algebra method.
Table of Contents

Abstract ii
List of Tables viii
List of Figures ix
Acknowledgements x
1. Introduction 1
1.0 Framework 1
1.1 An Alternative Approach 1
1.2 Implementation 2
1.3 Evaluation of the New Approach 2
1.4 Overview 2
2. Formal Specifications of Data Abstractions 3
2.0 Introduction 3
2.1 The Role of Specifications 3
2.2 The Specification Unit 4
2.3 The Need for Data Abstraction 5
2.4 Properties of Specifications of Data Abstractions 8
2.5 Criteria for Evaluating Specification Methods 10
2.6 Algebraic Preliminaries 11
2.6.1 Signatures 11
2.6.2 Algebras 12
2.6.3 Homomorphisms and Isomorphisms 13
2.6.4 Word Algebras and Initial Algebras 15
2.6.5 Defining Specifications Using Initial Algebras 17
2.6.6 Further Reading 18
2.7 The Programming Language OBJ 19
2.7.1 The Structure of OBJ Objects 19
2.7.2 Definition of OBJ Operators 20
2.7.3 The Form of OBJ Equations 21
2.7.4 Evaluation in OBJ 22
2.7.5 Other Features of OBJ 23
2.8 Introducing a New Technique 23
3. An Alternative Approach 24
3.0 Introduction 24
3.1 Evaluation of the Algebraic Approach 24
3.1.1 Formality and Constructibility 24
3.1.1.1 The Problem of Infinite Sets of Equations 24
3.1.1.2 Problems when the Church-Rosser Property is Absent 27
3.1.2 Comprehensibility and Minimality 28
3.1.3 Range of Applicability and Extensibility 28
3.1.4 Conclusions of the Evaluation 28
3.2 The Recursive Term Algebra Approach 29
3.2.1 Term Algebras versus Initial Algebras 29
3.3 Data Types as Context Free Grammars 32
3.4 Sub-Data Types 35
3.5 A Language Based on this Approach 36
4. The Functional Programming Language Kaviar 37
4.0 Introduction 37
4.1 Lexical Conventions 37
4.1.1 Identifiers 37
4.1.2 Keywords 37
4.1.3 Constants 38
4.1.4 Strings 38
4.1.5 Separators 38
4.2 The BNF of Kaviar 38
4.2.1 A Kaviar Program 38
4.2.2 Type Definitions 39
4.2.2.1 Context Free Data Types 39
4.2.2.2 Sub-Data Types 41
4.2.3 Function Definitions 43
4.2.3.1 Recursive Functions 43
4.2.3.2 The Parsing Method of Earley 45
4.2.3.2.1 Conventions 46
4.2.3.2.2 Terminology 46
4.2.3.2.3 Informal Explanation of Earley's Algorithm 47
4.2.3.2.4 Earley's Parsing Algorithm 48
4.2.3.2.5 Example of the Operation of Earley's Algorithm 50
4.2.3.2.6 Modifications Required to Earley's Parsing Algorithm 51
4.2.3.2.7 Type Checking Sub-Data Types 53
4.2.3.2.8 Interpreting a Recursive Function 53
4.2.3.2.9 Interpreting Recursive Expressions 54
4.2.3.3 Unification Functions 56
4.2.3.4 Finding a Unifier 57
4.2.3.5 Evaluating Unification Expressions 59
4.2.4 Invocation Expressions 59
4.2.5 Built in Functions 60
4.3 How to Run a Kaviar Program 61
4.4 Implementation Restrictions and Details 61
4.5 Evaluating this Approach 62
5. Evaluation of the Recursive Term Algebra Approach 63
5.0 Introduction 63
5.1 Formality and Constructibility 63
5.2 Comprehensibility and Minimality 68
5.3 Range of Applicability 68
5.4 Extensibility 68
5.5 Possible Research 69
6. Further Research and Conclusions 70
6.0 Introduction 70
6.1 Possible Extensions to Kaviar 70
6.1.1 Parameterized Types 70
6.1.2 N-ary Unification Functions 71
6.2 A Possible Application of this Method 71
6.3 Conclusions 72
Bibliography 73

List of Tables

Table I: Parse Lists for G with a = (x.y)

List of Figures

Figure 1: A concept and all programs which implement the concept correctly 3
Figure 2: A concept, its specification and all programs derived from the specification 3
Figure 3: OLD_STACK before PUSH 6
Figure 4: NEW_STACK after PUSH 6
Figure 5: Sequence of stack configurations 26

Acknowledgements

My thanks to Drs. Akira Kanda and Harvey Abramson for their guidance, patience and cooperation, not to mention financial support. The enthusiasm and encouragement of my parents, Vera, Alan and Myron kept my motivation high. I could not ask for better friends than I have found in Steve, Denise, Brent and Bob. Thank you, Eric, for the time past, and for the time yet to come.

1. Introduction

1.0. Framework

Specifications provide a means of abstracting ideas to reduce a problem to a more manageable and comprehensible size, making solutions more apparent. In programming languages, the function and the procedure are commonly seen abstractions. Recently, an additional, powerful aid to abstraction has been identified: the data abstraction. A specification of a data structure should supply a representation independent characterization of the structure so the user need not be concerned with implementation details. Six criteria which a technique for specifying such a characterization should satisfy are outlined.

The development of the initial algebra technique is described along with a programming language, OBJ, which uses initial algebras for specifying data types.

1.1. An Alternative Approach

In evaluating the initial algebra approach for the specification of data types, it is found that such specifications are often very difficult to construct. Indeed, there are several restrictions imposed on the user. These problems stimulated the search for an alternative method of specification.

The recursive term algebra approach recursively defines not only a set of terms, but also basic operations over them. In other words, recursive definition induces an algebra of terms. The same algebra may be generated by an unambiguous context free grammar. Hence, context free grammars are used to specify data types. The treatment of non-context free data structures is also discussed.

1.2. Implementation

A functional programming language, named Kaviar, has been designed and implemented. It uses context free grammars for the specification of data types. Sub-types, which may be non-context free, are defined by means of predicates which restrict some context free type. Functions may be defined by one of two methods: by recursion, or by unification.

A full description of the language, some examples, and the operation of the interpreter are given.

1.3. Evaluation of the New Approach

The recursive term algebra approach is evaluated with respect to the criteria. Specifying data types using this approach overcomes the difficulties of the initial algebra method.

1.4. Overview

Chapter 2 provides the framework. Chapter 3 first evaluates the initial algebra approach; then, the recursive term algebra approach is formally presented. The programming language Kaviar is described in Chapter 4. Chapter 5 evaluates this new approach. Finally, Chapter 6 proposes possible extensions to the method, an application of the method, and some conclusions.

2. Formal Specifications of Data Abstractions

2.0. Introduction

This chapter introduces the notion of specifications in connection with abstract data types.
Some criteria are established for evaluating the practical potential of specification techniques. One particular technique, the initial algebra approach, is discussed along with a programming language based on it, OBJ.

2.1. The Role of Specifications

Programming is a problem solving activity. Consequently, a correct program implements a concept that exists in someone's mind. The concept can usually be implemented by many programs, a situation depicted in Figure 1, but, realistically, only a small, finite number are of interest.

Figure 1: A concept and all programs which implement the concept correctly.

Since programs can become quite complex, a specification of what a program is intended to do should be given before the program is actually coded. This philosophy is described by Figure 2.

Figure 2: A concept, its specification and all programs derived from the specification.

The purpose of the specification is to aid the understanding of the concept involved, and to increase the likelihood that the program, when implemented, will perform the intended function. So far, no mention has been made of what exactly a specification is and how it can satisfy these goals. The next sections serve to answer these questions.

2.2. The Specification Unit

The nature of a specification is largely dependent on the program unit being specified. A specification of too small a unit does not correspond to any useful concept, just as the program comment

x := x + 1; /* Increase x by 1 */

does not convey much information. The specification unit should correspond naturally to a concept, or abstraction, found useful in thinking about the problem to be solved.

In many of the existing programming languages there appear several implicit abstractions which are built into the language.
The notions of arrays and records are examples of these, but perhaps the most commonly used kind of abstraction is the functional or procedural abstraction, in which a parameterized expression or group of statements is treated as a single operation. When a function or procedure is made use of, the programmer is concerned only with what it does, not with the algorithm that is executed.

Multiprocedure modules are also important in system design. Here, procedures are grouped together because they interact in some way. They may share certain resources as well as information. Considering the entire group of procedures as a module permits all information about the interactions to be hidden from other modules. Other modules obtain information about the interactions only by invoking the procedures in the group. The purpose of hiding not only requires making visible certain essential properties of a module but making inaccessible certain irrelevant details that should not affect other parts of a system.

Although functions, procedures and modules offer a powerful aid to abstraction, they alone are not enough. An additional kind of abstraction has been identified: the data abstraction. It is comprised of a group of related functions or operations that act upon a particular class of objects. There is one constraint: the behaviour of the objects can be observed only by applications of the operations. As with other methods of abstraction, the programmer is not concerned with how the data objects are represented. Instead, the operations on the objects are regarded as atomic even though several machine instructions may be necessary to perform them.

Some examples of data abstractions are segments, processes, files, and abstract devices of various sorts, in addition to the more ordinary stacks, queues, and symbol tables. Multiprocedure modules are also used in the implementation of data abstractions. Each procedure in the module implements one of the operations.
The module as a whole implements the specified properties of the abstract type. The word abstract, as used here, refers to a data type that is supposed to be independent of its representation, in the sense that details of how it is implemented and the representation of the type itself are to be actually hidden or shielded from the user. The user is provided with certain operations and so only needs to know what they are supposed to do, not how they are done.

Thus, a specification provides a means of describing the abstractions a programmer can make use of to solve the problem at hand. Since the need and importance of functional and procedural abstractions are well known while the need for data abstraction is, perhaps, less understood, the next section attempts to explain their necessity.

2.3. The Need for Data Abstraction

As an example of the problems which crop up when the data abstraction is ignored and the operations in the group are given input/output specifications independently of one another, consider the following specification for the operation PUSH for stacks of integers.

PUSH: STACK × INTEGER → STACK

The input/output specification defines the output value of PUSH (the stack object returned by PUSH) in terms of the input values of PUSH (a stack object and an integer). This can be done by defining a structure for stack objects and then describing the effect of PUSH in terms of this structure. In Pascal, a stack of integers might be defined as:

type STACK = record
  TOP: integer;
  DATA: array [1..100] of integer
end;

where the record selector TOP points to the topmost integer in the stack. Then, assuming PUSH is a function, the meaning of

NEW_STACK := PUSH( OLD_STACK, i )

that is, a description of the result of its execution, could be stated as¹

for all j { 1 ≤ j ≤ OLD_STACK.TOP → NEW_STACK.DATA[j] = OLD_STACK.DATA[j] }
& NEW_STACK.TOP = OLD_STACK.TOP + 1
& NEW_STACK.DATA[NEW_STACK.TOP] = i

In words, and by example, if OLD_STACK is as shown in Figure 3, then executing, say,

NEW_STACK = PUSH( OLD_STACK, 4 )

results in Figure 4.

Figure 3: OLD_STACK before PUSH
Figure 4: NEW_STACK after PUSH

That is, all of the entries in OLD_STACK are copied to corresponding index positions in NEW_STACK. The top of the stack in NEW_STACK becomes one greater than that of OLD_STACK and then the new top of stack entry is set to the integer input parameter, 4.

¹ Ignoring for the moment the behaviour of PUSH when the stack is full, that is, when OLD_STACK.TOP = 100.

If the specification for POP is

POP: STACK → STACK × INTEGER

and, again, assuming POP is a function, the meaning of

i = POP( STACK )

could be stated as:

i = STACK.DATA[STACK.TOP] & STACK.TOP = STACK.TOP - 1

There are several things wrong with these specifications. The major flaw is that they do not describe the concept of stacklike behaviour. For example, a theorem stating that POP returns the value most recently PUSHed on the stack can only be inferred from the details of the specifications. This method of deriving theorems is undesirable because the programmer must get involved with the detail (which is really implementation information), rather than stating the concept directly. Another problem is that the independence of the specifications of PUSH and POP is deceptive. A change in the specification of one is almost certain to lead to a change in the specification of the other.
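The record-and-array specification of PUSH given above can be exercised directly. The following sketch renders it in Python rather than Pascal (the class and function names are illustrative, not from the thesis); it implements PUSH by copying and then checks the three clauses of the input/output specification.

```python
# A sketch of the record-style stack of this section (Python rather than
# Pascal; the capacity 100 follows the array bound in the record).
CAPACITY = 100

class Stack:
    def __init__(self):
        self.top = 0                      # TOP: index of the topmost entry
        self.data = [0] * (CAPACITY + 1)  # DATA: 1-based, as in Pascal

def push(old, i):
    """Return a new stack; overflow is ignored, as in the footnote."""
    new = Stack()
    # Copy every entry of OLD to the corresponding position in NEW.
    for j in range(1, old.top + 1):
        new.data[j] = old.data[j]
    new.top = old.top + 1          # NEW.TOP = OLD.TOP + 1
    new.data[new.top] = i          # NEW.DATA[NEW.TOP] = i
    return new

# Check the input/output specification on a small example.
old = Stack()
old = push(push(push(old, 3), 1), 2)      # a stack holding 3, 1, 2
new = push(old, 4)
assert all(new.data[j] == old.data[j] for j in range(1, old.top + 1))
assert new.top == old.top + 1
assert new.data[new.top] == 4
```

Note that even this small sketch exposes the representation (the array and the index) to anyone reading the specification, which is exactly the flaw discussed here.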
For example, in addition to being related through the structure chosen for the stack, the specifications of PUSH and POP are also related in their interpretation of this structure: the decision to have the selector TOP point to the topmost integer rather than, say, to the first available slot.

If a data abstraction such as STACK is specified as a single entity, much of the extraneous detail concerning the interactions between the operations can be eliminated, and the effects of the operations can be described at a higher level. Some specification techniques for data abstractions, such as the initial algebra and recursive term specification techniques to be presented in later sections, use input/output specifications to describe the effects of the operations. The difference is that these specifications are expressed in terms of abstract objects with abstract properties instead of the very specific properties used in the example above.

In some instances it may not even be necessary to describe the individual operations separately. Instead, the effects of the operations can be described in terms of one another. For example, the effect of POP might be defined in terms of PUSH by

POP( PUSH( stack, i ) ) = i

which states that POP returns the value most recently PUSHed.

Now that the need for data abstractions has been established, there are some properties of them that should be pointed out. The next section presents some of these.

2.4. Properties of Specifications of Data Abstractions

Specification techniques for data abstractions can be viewed as defining something very like a mathematical discipline. Just as number theory arises from a specification like Peano's axioms for the natural numbers, the discipline arises from the specification of the data abstraction.
The domain of the discipline, the set on which it is based, is the class of objects belonging to the data abstraction, and the operations of the data abstraction are defined as mappings on this domain. The theory of the discipline consists of the theorems and lemmas derivable from the specifications.

The information contained in a specification of a data abstraction can be divided into two parts. Information about the actual meaning or behaviour of the data abstraction is described in the semantic part. This description is expressed using a vocabulary of terms or symbols defined by the syntactic part.

The first symbols defined by the syntactic part of a specification identify the abstraction being defined and its domain. Usually, an abstraction has a single class of defined objects, and, typically, the same symbol denotes both the abstraction and its class of objects. Thus the objects belonging to the data abstraction STACK are referred to as STACKs.

The remaining symbols introduced by the syntactic part name the operations of the abstraction and define their input/output specification. For example, operations of the stack abstraction might include:

CREATE: → STACK
PUSH: STACK × INTEGER → STACK
POP: STACK → STACK
TOP: STACK → INTEGER
EMPTY: STACK → BOOL

Some observations should be noted about this example. First, more than one domain appears in the specification. In practice, the specifications for many interesting data abstractions include more than one domain. Normally, only one of these, here STACK, is being defined; the remaining domains, here INTEGER and BOOL, and their properties are assumed to be known.

The second observation is that the group of operations can usually be partitioned into three categories. The first consists of operations that have no operands belonging to the class being defined, but which yield results in the defined class.
This includes the constants, represented as argumentless operations, such as CREATE. The second category consists of those operations, like PUSH and POP, which have some of their operands in, and yield their results in, the defined class. Those operations, TOP and EMPTY for stacks, whose results are not in the defined class make up the third category.

A third observation is that the input/output specification of an operation does not necessarily have to correspond to the way the operation would be programmed. For example, POP might be programmed in a way which removes a value from a stack, and returns both the new stack and the value.

The semantic part of the specification uses the symbols defined in the syntactic part to express the meaning of the data abstraction. Two different approaches are used in capturing this meaning: either an abstract model is provided for the class of objects and the operations are defined in terms of the model, or the class of objects is defined implicitly through descriptions of the operations.

Equipped with the idea of what a specification is, there remains the problem of devising a method of using them. The next section establishes some criteria for evaluating such methods.

2.5. Criteria for Evaluating Specification Methods

Liskov and Zilles [Liskov&Zilles,1978] have outlined a number of requirements an approach to specification should satisfy to be useful. The criteria, described below, include practical as well as theoretical considerations. The first criterion should be satisfied by any specification technique.

1) Formality. A specification method should be formal. Specifications themselves should be written in a notation which is mathematically sound. This allows the formal specification techniques to be studied mathematically, so that other interesting questions, such as the equivalence of two specifications, may be set and answered.
For those interested in program verification this is especially important. The next three criteria address the construction of specifications.

2) Constructibility. It should be possible to construct specifications without much difficulty. Two aspects of the construction process are of interest here: the difficulty of constructing a specification in the first place, and the difficulty in knowing that the specification captures the concept.

3) Comprehensibility. Properties of specifications which determine comprehensibility are size and clarity. Clear, small specifications are desirable since they are usually easier to understand than larger ones.

4) Minimality. It should be possible, using the specification method, to construct specifications which describe the properties of the concept precisely and unambiguously in a way which adds as little extraneous information as possible. In particular, a specification should say what functions a program should perform, but little, if anything, about how the functions are performed.

Flexibility and generality are issues addressed by the last two criteria.

5) Range of Applicability. There is a class of concepts associated with each specification technique which can be described in a natural and straightforward way, thus satisfying the criteria dealing with construction. Concepts outside of this class can only be defined with difficulty, if they can be defined at all. Clearly, the larger the class of concepts which may be easily described by a technique, the more useful the technique.

6) Extensibility. It is desirable that a minimal change in a concept result in a similarly small change in its specification. This criterion especially impacts the constructibility of specifications.

A position has been reached where a specification technique may be examined. The technique of interest is an algebraic one. It is the precursor to the recursive term algebra technique introduced in Chapter 3.

2.6. Algebraic Preliminaries

Algebraic ideas have shown up in connection with abstract data types. Before introducing such a specification technique it is perhaps worthwhile reviewing some algebraic fundamentals. Those already familiar with these fundamentals can move on to §2.7 with no loss of continuity.

2.6.1. Signatures

In programming terms, a signature corresponds to a collection of declarations, declaring types, constants and procedures:

INTEGER, BOOL, STACK: type
A lgebra ic Prel iminar ies Algebraic ideas have shown up in connection with abstract data types. Before intro-ducing such a specification technique it is perhaps worthwhile reviewing some algebraic fundamentals. Those already familiar with these fundamentals can move on to §2.7 with no loss of continuity. 2.6.1. Signatures In programming terms, a signature corresponds to a collection of declarations, declaring types, constants and procedures: I N T E G E R , B O O L , S T A C K : type 12 C R E A T E : function() result S T A C K P U S H : func t ion(STACK, INTEGER) result S T A C K T O P : function(STACK) result I N T E G E R P O P : function(STACK) result S T A C K E M P T Y : function(STACK) result B O O L Notice that the types are just named, nothing is said of what they consist of. Bodies of procedures are not given. They can be viewed as being like the forward refer-ences of Pascal. Constants are regarded as functions with no arguments. Henceforth, following usual mathematical practice, the word sort will be used for type, and operator for function names. More formally, a signature E , is a pair < S , H > consisting of a set of sorts S = { « i • • • « „ } , and a set of operators Q={w 1 • • • OJ„ } each with given input and out-put sorts. That is, the operators have the form S * XS. For example, the operations of the stack abstraction may include: C R E A T E : - » S T A C K P U S H : S T A C K X I N T E G E R — S T A C K T O P : S T A C K — I N T E G E R P O P : S T A C K — S T A C K E M P T Y : S T A C K — B O O L Notice this data type, a stack of integers, involves several different sorts, including truth values of sort B O O L , integers of sort I N T E G E R , and stack states of sort S T A C K . A l l things of sort *,• are lumped together into a set called the carrier of sort s,. Unfortunately, signatures alone are not enough to specify a data type since they do not fully describe what is going on. For this, an algebra is needed. 2.6.2. 
2.6.2. Algebras

When an association is made between some particular set and each sort, and some particular function and each operator, an algebra is the result. That is, an algebra is a signature together with a function taking sorts to sets and another function taking operators to functions.

An I-indexed family of sets is a function from I to sets. If NATURAL = {0, 1, 2, ...} then a function mapping 0 to {a,b}, 1 to {a,c,d}, 2 to ∅ and so on is a NATURAL-indexed family of sets of letters.

Then, more formally, if Σ is a signature, a Σ-algebra A is an S-indexed family of sets |A|, denoting the carrier of A, together with an S* × S-indexed family of functions assigning to each operator ω ∈ Ω_{u,s} a function |A|_u → |A|_s, where u ∈ S* and s ∈ S. An S-sorted Σ-algebra A is, essentially, a collection A_s of carriers, one for each s in a set S of sorts, together with a collection of operations, that is functions, among them.

The index set S for the stack example above is {INTEGER, BOOL, STACK}. An S-sorted algebra A for this sort set would have three carriers, A_INTEGER, A_BOOL, and A_STACK, and some of its operators, more precisely stated, are

PUSH: A_STACK × A_INTEGER → A_STACK
TOP: A_STACK → A_INTEGER
CREATE: → A_STACK

Each ω ∈ Ω_{u,s} is called an operation of A and is named by ω. Ω_{λ,s} is the set of names of constants of A of sort s. For example, Ω_{λ,STACK} = {CREATE}, Ω_{STACK INTEGER,STACK} = {PUSH}, Ω_{STACK,INTEGER} = {TOP}, and Ω_{u,s} = ∅ otherwise.

For the characterization of the data type to be independent of representation the algebra must be initial. Before discussing what this means, the notions of homomorphism and isomorphism should be understood.

2.6.3. Homomorphisms and Isomorphisms

Suppose there are two algebras which are rather similar, in that evaluating in one gives analogous results to evaluating in the other.
If in the first algebra evaluating an expression like plus(x, times(x,y)) with variable x bound to a and y bound to b gives result c, then evaluating the same expression in the second algebra with the corresponding bindings, x to f(a) and y to f(b), should give the corresponding result f(c). It is then said that f is a homomorphism from the first algebra to the second one.

The integers modulo n consist of a set of n elements {0, 1, ..., n-1} and an operation called addition modulo n which may be described as follows. Imagine the numbers 0 through n-1 as being evenly distributed on the circumference of a circle. To add two numbers h and k, start at h and move clockwise k additional units around the circle. h+k will be the end point. For example, the integers modulo 6 can be transformed into the integers modulo 3 by the function f,

f = ( 0 1 2 3 4 5 )
    ( 0 1 2 0 1 2 )

as may be verified by comparing their tables below. The table for addition modulo 6 is:

+ | 0 1 2 3 4 5
--+------------
0 | 0 1 2 3 4 5
1 | 1 2 3 4 5 0
2 | 2 3 4 5 0 1
3 | 3 4 5 0 1 2
4 | 4 5 0 1 2 3
5 | 5 0 1 2 3 4

Replacing each entry x by f(x) gives:

+ | 0 1 2 0 1 2
--+------------
0 | 0 1 2 0 1 2
1 | 1 2 0 1 2 0
2 | 2 0 1 2 0 1
0 | 0 1 2 0 1 2
1 | 1 2 0 1 2 0
2 | 2 0 1 2 0 1

Eliminating duplicate information, for example 2+2=1 appears four separate times in the table, gives:

+ | 0 1 2
--+------
0 | 0 1 2
1 | 1 2 0
2 | 2 0 1

The dictionary says that two things are isomorphic if they have the same structure. Consider the two groups G₁ and G₂ with their operations denoted by + and ·, respectively.

+ | 0 1 2        · | e a b
--+------        --+------
0 | 0 1 2        e | e a b
1 | 1 2 0        a | a b e
2 | 2 0 1        b | b e a

G₁ and G₂ are different but isomorphic. Indeed, if in G₁ 0 is replaced by e, 1 by a and 2 by b, then G₁ coincides with G₂. In other words, there is a one-to-one correspondence

( 0 1 2 )
( e a b )

transforming G₁ to G₂. It is called an isomorphism from G₁ to G₂. So, if two groups are isomorphic, this means there is a one-to-one correspondence between them which transforms one of the groups into the other.
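The transformation from the integers modulo 6 to the integers modulo 3 can be checked mechanically. The following sketch (Python; illustrative only) verifies the homomorphism condition for every pair of elements.

```python
# A sketch verifying that f(x) = x mod 3 is a homomorphism from the
# integers modulo 6 to the integers modulo 3.
def add_mod(n):
    # Addition on the circle of n points: start at h, move k clockwise.
    return lambda h, k: (h + k) % n

add6, add3 = add_mod(6), add_mod(3)
f = lambda x: x % 3   # the correspondence (0 1 2 3 4 5) -> (0 1 2 0 1 2)

# The homomorphism condition: f(h + k) = f(h) + f(k), each side computed
# in its own algebra, for every pair of elements.
assert all(f(add6(h, k)) == add3(f(h), f(k))
           for h in range(6) for k in range(6))

# f is not one-to-one (f(0) = f(3)), so it is a homomorphism but not
# an isomorphism.
assert f(0) == f(3)
```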
Now if G and H are any groups, it may happen that there is a function which transforms G into H although this function is not a one-to-one correspondence. In this case, the function is a homomorphism. More exactly, if G and H are groups, a homomorphism from G to H is a function f: G → H such that for any two elements a and b in G:

f(ab) = f(a)f(b)

If G and H are groups, a bijective² function f: G → H with the property that for any two elements a and b in G

f(ab) = f(a)f(b)

is called an isomorphism from G to H.

² A function f: A → B is called bijective if it is both injective and surjective. That is, if each element of A has exactly one partner in B and each element in B has exactly one partner in A.

Now, with some idea of what homomorphisms and isomorphisms are, it is possible to return to the problem of finding a unique characterization of a data type using algebras.

2.6.4. Word Algebras and Initial Algebras

For any signature there is a word algebra, with its elements for a given sort consisting of all the expressions in the signature denoting elements of that sort. For example, for the number signature:

S = {NUMBER}
Ω_{λ,NUMBER} = {ZERO, ONE}
Ω_{NUMBER NUMBER,NUMBER} = {PLUS, TIMES}
Ω_{u,s} = ∅, otherwise

the set of numbers of the word algebra would be "ZERO", "ONE", "PLUS(ZERO,ZERO)" and so on. The operations build larger expressions from smaller ones. The expressions could be reverse Polish, represented by trees, or whatever. The representation is not the issue because all these expression algebras will be isomorphic.

The key idea is that for any given interpretation of the operators there is a unique value for each expression. The interpretation is just another Σ-algebra, A, and the evaluation of the expressions is just a unique homomorphism from the word algebra to A. An algebra having this unique homomorphism property is called the initial Σ-algebra.
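The evaluation homomorphism just described can be sketched concretely. Here (Python; the nested-tuple representation is an assumption, though by the remark above any representation would do, since all the expression algebras are isomorphic) the interpretation takes the carrier to be the ordinary integers, with PLUS as addition and TIMES as multiplication.

```python
# A sketch of the word algebra for the number signature and the evaluation
# homomorphism into one interpretation, the ordinary integers.
ZERO, ONE = ("ZERO",), ("ONE",)
def plus(x, y):  return ("PLUS", x, y)
def times(x, y): return ("TIMES", x, y)

def evaluate(term):
    """The unique homomorphism from the word algebra to the integers."""
    op = term[0]
    if op == "ZERO": return 0
    if op == "ONE":  return 1
    a, b = evaluate(term[1]), evaluate(term[2])
    return a + b if op == "PLUS" else a * b

# Each expression has exactly one value under the interpretation.
assert evaluate(plus(ONE, times(ONE, ONE))) == 2
assert evaluate(times(ZERO, plus(ZERO, ONE))) == 0
```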
In other words, I is an initial Σ-algebra if and only if for any Σ-algebra A there is a unique homomorphism f: I → A. The word algebra is initial because there is just one way of creating a given expression using the operators described. Thus "TIMES(ZERO, PLUS(ZERO,ONE))" can only be created by applying the operator TIMES to the two expressions "ZERO" and "PLUS(ZERO,ONE)".

Initiality characterizes the isomorphism class of an object. This means it characterizes an object abstractly, that is, independent of representation, or only in terms of its structure.

As another example, the data type of natural numbers, NATURAL, can be characterized by the initial Σ-algebra:

S = {NATURAL}
Ω_{λ,NATURAL} = {0}
Ω_{NATURAL,NATURAL} = {SUCC}
Ω_{u,s} = ∅, otherwise

The basic idea is that every natural number and all further operations of interest upon natural numbers can be expressed in terms of just two basic ones, SUCC and 0. Initiality provides the key to unique characterization. There is no commitment to thinking of integers as strings of decimal, or binary, or Roman characters. To know the natural numbers is to know the operations occurring in them and how they are combined. Further operations are obtained by recursive definitions expressed algebraically with equations, and based on initiality.

Given all of these definitions, the next section describes how a specification for a data type is constructed.

2.6.5. Defining Specifications Using Initial Algebras

In the initial algebra approach a specification is defined to be a triple <S, Ω, E>, where S is the set of sorts, Ω is the set of operators of the type and E is a set of equations which the algebra is to satisfy. The initial algebra that satisfies only what E requires and no more is chosen.
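The earlier remark that further operations on NATURAL are obtained by recursive definitions based on initiality can be sketched as follows (Python; the term representation and the function names are illustrative assumptions).

```python
# A sketch of NATURAL as terms built from 0 and SUCC, with addition
# defined recursively on the two constructors.
ZERO = ("0",)
def succ(n): return ("SUCC", n)

def value(n):
    """The unique homomorphism into the ordinary integers."""
    return 0 if n == ZERO else 1 + value(n[1])

# A further operation defined equationally on the constructors:
#   plus(0, m)       = m
#   plus(SUCC(n), m) = SUCC(plus(n, m))
def plus(n, m):
    return m if n == ZERO else succ(plus(n[1], m))

two, three = succ(succ(ZERO)), succ(succ(succ(ZERO)))
assert value(plus(two, three)) == 5
```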
For example, the operators and equations of the stack abstraction are given below:

CREATE : → STACK
PUSH : STACK × INTEGER → STACK
POP : STACK → STACK ∪ {ERROR}
TOP : STACK → INTEGER ∪ {ERROR}

TOP( PUSH( s, i ) ) = i
POP( PUSH( s, i ) ) = s
TOP( CREATE ) = ERROR
POP( CREATE ) = ERROR

The set of equations specifies when two expressions yield the same value. If an equation contains variables, such as s and i, the equation holds whenever each variable is replaced by an expression of the correct type. For example, the second equation

POP( PUSH( s, i ) ) = s

specifies that

POP( PUSH( CREATE, 1 ) ) = CREATE, and
POP( PUSH( PUSH( CREATE, 1 ), 2 ) ) = PUSH( CREATE, 1 )

The equations can be used in combination to derive all the expressions which compute a particular value, forming the equivalence class representing that value. Using this approach, two expressions yield distinct results unless there is some sequence of applications of the equations which shows the expressions to be equal. The definitions of some data abstractions require the use of conditional equations, that is, equations which do not hold for all possible substitutions of expressions for the variables, but which hold only for substitutions which satisfy some condition. The specification below for integer sets illustrates conditional equations.

EMPTY : → SET
INSERT : SET × INTEGER → SET
REMOVE : SET × INTEGER → SET
HAS : SET × INTEGER → BOOLEAN

INSERT( INSERT( s, i ), j ) = if i = j then INSERT( s, i )
                              else INSERT( INSERT( s, j ), i )
REMOVE( INSERT( s, i ), j ) = if i = j then REMOVE( s, j )
                              else INSERT( REMOVE( s, j ), i )
REMOVE( EMPTY, j ) = EMPTY
HAS( INSERT( s, i ), j ) = if i = j then TRUE else HAS( s, j )
HAS( EMPTY, j ) = FALSE

The second equation is a good example of a conditional equation.
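The integer-set equations above can also be read as a program. The following sketch (ours, not from the thesis) realizes REMOVE and HAS directly from the equations, and realizes the intent of the first equation by discarding duplicate INSERTs; terms are ("EMPTY",) or ("INSERT", s, i):

```python
# Integer sets as INSERT/EMPTY terms, with the conditional equations
# read left-to-right as simplification rules.

def insert(s, i):
    # One consequence of INSERT(INSERT(s,i), i) = INSERT(s,i):
    # inserting an element already present has no effect.
    return s if has(s, i) else ("INSERT", s, i)

def remove(s, j):
    if s == ("EMPTY",):
        return ("EMPTY",)                     # REMOVE(EMPTY, j) = EMPTY
    _, rest, i = s
    if i == j:
        return remove(rest, j)                # i = j: drop, keep removing
    return insert(remove(rest, j), i)         # i ≠ j: REMOVE and INSERT permute

def has(s, j):
    if s == ("EMPTY",):
        return False                          # HAS(EMPTY, j) = FALSE
    _, rest, i = s
    return True if i == j else has(rest, j)   # HAS(INSERT(s,i), j)

s = insert(insert(("EMPTY",), 1), 2)
print(has(remove(s, 1), 1), has(remove(s, 1), 2))   # → False True
```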
There are two distinct outcomes that are possible when a REMOVE follows an INSERT. If the integer being REMOVEd is the same as the one being INSERTed, then that INSERT has no effect. If, however, the integer being REMOVEd is distinct from the integer INSERTed, then the REMOVE and the INSERT can be permuted.

2.6.6. Further Reading

If these brief explanations have not succeeded in their intent, the following references could be consulted for further examples and elucidation: [Meseguer&Goguen,1983], [Burstall&Goguen,1982], [Goguen,Thatcher&Wagner,1978], and [Goguen,Thatcher,Wagner&Wright,1975]. An overview of a programming language developed on these algebraic ideas will now be given.

2.7. The Programming Language OBJ

In OBJ [Goguen&Tardo,1979], algebras are the method for specification of data types. This is based on the notion that algebraic isomorphism captures independence of representation, and the idea that initial algebras embody it. OBJ resembles more traditional languages in permitting definitions of operations and then evaluation of expressions. It differs in several ways: operations are defined in modules (objects), and operator symbols have a different character in that they do not denote procedures in the usual sense, because they are defined implicitly by algebraic equations (using the initial algebra model). There is no assignment function, nor any assignable variables. There are no side effects, statements, or gotos. It is very strongly typed, supports sub-types, and handles coercions, overloaded operators and errors in a rigorous, systematic manner. The next section presents how data types, or objects, are defined in OBJ.

2.7.1. The Structure of OBJ Objects

The syntax for objects is inspired by the notation for presenting initial many sorted algebras. It is intended that an object denote a particular many sorted algebra.
An OBJ object is a means for declaring an abstract data type. There are three built in objects: INT for integers, BOOL for booleans and ID for identifiers. User defined objects begin with the keyword OBJ and end with JBO. Objects are subdivided into sort, operation and equation sections. The sorts and operators together constitute a signature. Each operator declaration indicates its argument sorts, its target sort and its syntactic form. The equation section begins with a declaration of the variables to be used, including their sorts. A skeleton of the structure of an object is:

OBJ <name> .
  SORTS <new sorts> / <old sorts>
  OK-OPS ...
  FIX-OPS ...
  ERR-OPS ...
  VARS ...
  OK-EQNS ...
  EQNS ...
  ERR-EQNS ...
JBO

The SORTS line gives the sort names involved in the object. There are three different types of operators: OK-OPS are used for ordinary situations, FIX-OPS are for recovery situations and ERR-OPS are for exceptional or error situations. The next section describes how these operators may be defined.

2.7.2. Definition of OBJ Operators

Operators are used in expressions, where an expression is either a constant operator, or an operator together with its arguments (which are also expressions). Every expression has a sort, the sort of its operator. Operators are declared in a manner commonly used in algebra:

G : INT, INT → INT
CREATE : → INT

The syntax of the operator is determined by its form to the left of the colon. Prefix form is assumed, so G would be invoked as G(2,G(4,5)). It is possible to customize the syntax of operators using underscores to indicate where the respective arguments go. For example, if an operator were defined as

PUSH _ ON _ : INT, STACK → STACK

it would be invoked as

PUSH 2 ON CREATE

Coercions are permitted with the form:

_ : S₁ → S₂

indicating S₁ is to be a sub-sort of S₂. Note that the ability to customize syntax is not only a syntactic issue.
The coercion above means that all items, but not necessarily operations, of sort S₁ are also of sort S₂, which is clearly a semantic statement. Operator overloading, that is, distinct operators with the same syntactic form, is permitted:

_ + _ : BOOL, BOOL → BOOL
_ + _ : INT, INT → INT

Specifying the signature has now been taken care of, but what of the equations? The next section describes their form.

2.7.3. The Form of OBJ Equations

Each variable used in the equations must be listed, following the keyword VARS, together with its sort, as, for example:

VARS I, J : INT ; R1, R2 : RECORD ;

Equations are pairs of expressions of the same sort separated by an equal sign, such as (G(0,J)=J). Each equation is of some particular sort, and its constituent expressions must be unambiguous. Conditional form is allowed for equations. The condition must be of sort BOOL and appears following an IF in the form:

( G(I,J) = J IF (I=0) )

The conditional equation applies only if its conditional expression evaluates to true. An example of the specification of the factorial function is:

OBJ FACTORIAL
  SORTS / INT BOOL
  OK-OPS
    F : INT → INT
  ERR-OPS
    NEG-ARG : → INT
  VARS
    I : INT
  OK-EQNS
    ( F(0) = 1 )
    ( F(I) = I * F(I-1) IF (I>0) )
  ERR-EQNS
    ( F(I) = NEG-ARG IF (I<0) )
JBO

Now with some knowledge of how specifications are built in OBJ, the next section describes evaluation.

2.7.4. Evaluation in OBJ

OBJ evaluates, that is, derives the value of, expressions by using the equations as rewrite rules. To apply a rule, a unifier must first be found. This means that the interpreter tries to find an expression for each variable symbol occurring on the left side of the rule, so that when these expressions are substituted for their respective variables, the original expression results.
If the equation is conditional, evaluating the condition after substituting the expressions of the unifier for the variables must yield true. When OBJ finds a rule with which it can unify an expression, it replaces the expression with one it obtains from the right side of the rule by substituting the expressions of the unifier for their respective variable symbols. In most cases, OBJ applies rules first to the innermost nested sub-expressions until no more rules apply, that is, until those sub-expressions are reduced, or in what is called normal form. For example, evaluation of F(3), omitting trivial steps, proceeds as:

F(3)
3 * F(2)
3 * ( 2 * F(1) )
3 * ( 2 * ( 1 * F(0) ) )
3 * ( 2 * ( 1 * 1 ) )
3 * ( 2 * 1 )
3 * 2
6

Evaluation is complicated by the error facility. It has been proven [Goguen,1977] that the above evaluation order correctly implements specifications containing error operators, provided all ERR-EQNS are tried before OK-EQNS. OBJ evaluation computes, for any given expression, a reduced expression representing its equivalence class, and this expression is taken as a representative of the entire class. For example, the reduced value of the expression F(3) is the expression 6. In visualizing the connection between OBJ's operational semantics and its initial algebra semantics, consider initial algebra semantics in terms of "word" algebras. With no equations, each word, or expression, would denote a distinct value in the intended semantic domain. The equations identify those expressions which are to have equivalent meaning in the intended semantic domain.

2.7.5. Other Features of OBJ

Hidden operators are supported by OBJ. An operator may be declared hidden by placing (HIDDEN) after its syntactic definition. Hidden operators cannot be used outside the object in which they were declared, but may be used freely within it.
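The innermost rewriting strategy of §2.7.4 can be mimicked in a few lines. The sketch below is illustrative only (ours — real OBJ matches rules by unification against arbitrary terms, and error propagation is not modeled here):

```python
# The three FACTORIAL equations applied as innermost rewrite rules
# over terms built from integers, "*" and "F".

def rewrite(term):
    """Reduce a term to normal form, innermost sub-expressions first."""
    if isinstance(term, int):
        return term                            # already in normal form
    op, *args = term
    args = [rewrite(a) for a in args]          # reduce sub-expressions first
    if op == "*":
        return args[0] * args[1]
    if op == "F":
        n = args[0]
        if n == 0:
            return 1                           # ( F(0) = 1 )
        if n > 0:
            return rewrite(("*", n, ("F", n - 1)))  # ( F(I) = I * F(I-1) IF (I>0) )
        return "NEG-ARG"                       # ERR-EQNS: ( F(I) = NEG-ARG IF (I<0) )
    raise ValueError(op)

print(rewrite(("F", 3)))     # → 6
print(rewrite(("F", -1)))    # → NEG-ARG
```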
There are several evaluation modes, among them: PERMUTING, which remembers intermediate values resulting from successful rule applications to an expression, and RUN-WITH-MEMORY, where all intermediate values for all evaluations are remembered, as though all operators were PERMUTING.

2.8. Introducing a New Technique

Now that an algebraic specification technique has been observed, along with a programming language based on this technique, Chapter 3 will evaluate this method with respect to the criteria outlined earlier. It will then supply an alternative specification technique - one based on recursive term algebras.

3. An Alternative Approach

3.0. Introduction

This chapter discusses the problems with the initial algebra approach to specification within the context established by the criteria given in Chapter 2. Then an approach based on recursive definition will be outlined, along with how this method of definition can be realized as an unambiguous context free grammar.

3.1. Evaluation of the Algebraic Approach

The algebraic method of specification of data abstraction is, essentially, non-constructive. It characterizes a type by certain properties. In review, a data type is defined by its operations only. The operations are given as mappings satisfying certain relations. For an algebraic specification the axioms are allowed to be of a very restricted form only. For example, only those function and predicate symbols are admitted which stand for explicit operations on the data type. The existential quantifier does not occur, and the axioms have the form of equations with left and right-hand sides built according to specific rules [Guttag,1975]. This method will now be evaluated with respect to the six criteria outlined in §2.5.

3.1.1. Formality and Constructibility

First off, there is no question as to the formality of the algebraic approach to the specification of data types.
Underlying it is a well developed mathematical theory. Constructibility is, however, an issue, a difficulty admitted by the designers of OBJ [Goguen&Tardo,1979]. Problems that commonly arise are presented below.

3.1.1.1. The Problem of Infinite Sets of Equations

A well known argument against non-constructive methods is that it is very difficult to find the characterizing set of axioms, or, in the case of algebraic axioms, the set of relations between the operations of the type. For an algebraic specification, it might not only be difficult but impossible to find a presentation of a type by operations and relations. The question is whether the method is generally applicable, that is, is it possible to find such a finite specification for an arbitrary data type? An example adapted from [Majster,1977] to show that this is not the case is presented. The operators for the integer stack abstraction and its equations are:

CREATE : → STACK
PUSH : STACK × INTEGER → STACK
POP : STACK → STACK ∪ {ERROR}
TOP : STACK → INTEGER ∪ {ERROR}

TOP( PUSH( s, i ) ) = i
POP( PUSH( s, i ) ) = s
TOP( CREATE ) = ERROR
POP( CREATE ) = ERROR

To allow traversal of the stack stepwise from the top to the bottom, and the ability to return to the top element from an arbitrary position, the following additional operations are required. They perform, respectively, movement down the stack by one position, yield the contents of the stack at the current position, and position to the top of the stack.

DOWN : STACK → STACK ∪ {ERROR}
READ : STACK → INTEGER ∪ {ERROR}
RETURN : STACK → STACK ∪ {ERROR}

Now a finite characterizing set of relations between sequences of operations must be found.
Needed first are¹:

POP( CREATE ) = ERROR
READ( CREATE ) = ERROR
DOWN( CREATE ) = ERROR
POP( DOWN( s ) ) = ERROR
PUSH( DOWN( s ) ) = ERROR
POP( PUSH( s, i ) ) = if PUSH( s, i ) = ERROR then ERROR else s
READ( PUSH( s, i ) ) = if PUSH( s, i ) = ERROR then ERROR else i

¹ Assume the result of an operation applied to ERROR is ERROR.

These seven relations, however, are not yet sufficient, as, for example, they do not express the fact that for a stack object consisting of n elements the (n+1) fold application of DOWN would cause an error. Hence, the relation²

(DOWN)^(k+1) (PUSH)^k ( i₁, ..., i_k ) = ERROR

is needed for k = 0,1,2,3,..., that is, infinitely many relations. Application of RETURN always causes positioning to the top element of the stack. Again, infinitely many equations of the form

RETURN (DOWN)^m (PUSH)^n ( i₁, ..., i_n ) = (PUSH)^n ( i₁, ..., i_n )

for all m ≥ 1 and m ≤ n are needed. That is, the position in the stack reached by first PUSHing n elements onto it, Figure 5(a), then moving DOWN the stack m positions (m ≤ n), Figure 5(b), followed by a RETURN, Figure 5(c), is the same as just PUSHing the n elements.

[Figure 5: Sequence of stack configurations in performing RETURN (DOWN)^m (PUSH)^n ( i₁, ..., i_n )]

The specification is still not complete. The effect of the READ operation is only determined for objects of the form PUSH( s, i ), that is, stacks that have had integers PUSHed onto them. An arbitrary stack object is

(DOWN)^q (PUSH)^r ( i₁, ..., i_r )

with q ≤ r, hence the equations

READ (DOWN)^q PUSH( s, i ) = READ (POP)^q PUSH( s, i )

for q ≥ 0 are required.

² Take (PUSH)^n ( i₁, ..., i_n ) to be shorthand for PUSH( ··· PUSH( CREATE, i₁ ) ···, i_n ).
What this says is that, given an arbitrary stack of size r, the same element is retrieved by moving DOWN the stack q times (q ≤ r) and then performing the READ as when those q elements are POPped off the stack and then READ. Three infinite sets of equations have already been produced. The problem is in determining whether any such equations have been omitted. Many other examples of specifications that require infinite sets of equations to express equivalences can be easily constructed. Infinitely many relations might not be a handicap as long as the relations could be given in a parameterized form such as

A(m,n): RETURN (DOWN)^m (PUSH)^n ( i₁, ..., i_n ) = (PUSH)^n ( i₁, ..., i_n )

Note, however, that in the theory developed for the algebraic technique the finiteness of the set of axioms is an important assumption.

3.1.1.2. Problems when the Church-Rosser Property is Absent

In the algebraic approach, as pointed out by Klaeren [Klaeren,1980], there is neither a general procedure which decides whether a certain set of algebraic axioms determines a unique operation, nor is it possible to give computation rules for the evaluation of operations. An obvious idea is to interpret the axioms as rewriting rules, as done in the OBJ language. This, however, can lead to problems, because systems of algebraic equations do not, in general, possess the Church-Rosser property [Meseguer&Goguen,1983], meaning it may be possible to get different results if the equations are applied in different ways. It is possible, however, to give sufficient restrictions on the form of the axioms [Guttag&Horning,1978] to make sure that they define a unique total operation which can be computed by using the axioms as rewriting rules.

3.1.2. Comprehensibility and Minimality

Considering the problems just mentioned, it appears the comprehensibility and minimality of the specification suffer greatly.
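By way of contrast with §3.1.1.1: the traversable stack that defeats every finite set of equations has a small, clear constructive implementation — a list plus a cursor counting DOWNs from the top. The sketch below is illustrative (names ours, not from the thesis):

```python
# A direct implementation of the stack with DOWN, READ and RETURN.
class TraversableStack:
    def __init__(self):
        self.items, self.pos = [], 0    # pos = number of DOWNs from the top

    def push(self, i):
        self.items.append(i)
        self.pos = 0

    def down(self):
        if self.pos >= len(self.items):
            return "ERROR"              # (DOWN)^(k+1) (PUSH)^k (...) = ERROR
        self.pos += 1

    def read(self):
        if self.pos >= len(self.items):
            return "ERROR"              # reading past the bottom
        return self.items[-1 - self.pos]

    def ret(self):                      # RETURN repositions to the top
        self.pos = 0

s = TraversableStack()
for i in (1, 2, 3):
    s.push(i)
s.down()
print(s.read())    # element one below the top → 2
s.ret()
print(s.read())    # back at the top → 3
```

The infinite equation families (n+1 DOWNs fail, RETURN undoes any number of DOWNs) each collapse into one bounds check or one assignment here.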
In an attempt to avoid the problems of infinite sets of equations, and to give sufficient restrictions to make the operations total, the size of the specification grows and becomes unclear. Quite often avoiding inconsistency results in somewhat unnatural specifications. The user becomes trapped into giving these details, and attention to the problem to be solved strays.

3.1.3. Range of Applicability and Extensibility

Using the algebraic approach, an uncountable domain, such as the real numbers, cannot be defined without using an uncountable set of primitive constructors. The onus is on the user not to introduce operations that lead to the generation of infinite sets of equations. In addition, if the axioms are interpreted as rewriting rules, the form of the axioms must be restricted to make sure they define a unique total operation if consistent results are expected. Overall then, the addition of a new operation to Ω may mean augmenting the set of equations considerably. Hence, extending a specification may involve much effort.

3.1.4. Conclusions of the Evaluation

There appear to be some difficulties in using non-constructive techniques for the specification of data types, to which the initial algebra approach, as a member of this class, is subject. In particular, the difficulties surrounding the construction procedure limit this technique to those well versed in algebra. The next section introduces an alternative approach, using recursive term algebras, for which it will be shown that these difficulties can be either reduced or completely eliminated.

3.2. The Recursive Term Algebra Approach

In order to get a non-constructive specification of a data type, one usually has to use some constructive description to find the relations or axioms. Constructive methods explicitly state the appearance of objects and how the operations affect them. A constructive algebraic method, using recursive term algebras, will now be investigated.
It is based on the work of Kanda and Abrahamson [Kanda&Abrahamson,1983].

3.2.1. Term Algebras versus Initial Algebras

In the initial algebra approach, a specification is defined to be the triple <S, Ω, E>, where S is a set of sorts, Ω is the set of operators of the type, and E is the set of equations which the algebra has to satisfy. To guarantee the uniqueness of the specification <S, Ω, E>, the initial algebra which satisfies E is chosen as the specified data type. This specification method is based on the philosophy that carrier sets (sets obtained as the interpretation of sorts) of a data type should be induced from the equational properties of the operations. An alternative specification method, based on an alternative philosophy, accepts only recursively defined sets. Recursive definition not only defines a set of terms but also basic operations over them. That is, it induces an algebra of terms. How do these methods compare? Consider the following recursive definition of the natural numbers, NATURAL.

Basis: 0 is a NATURAL.
Recursive Construction Rule: If n is a NATURAL then so is s(n).
Closure: Only those values that result from finitely many applications of the recursive construction rule to the basis function values are NATURALs.

This definition gives a set of terms over {0, s, (, )}. The operations of NATURAL may now be induced from the recursive definition. Notice that closure ensures that if n is a NATURAL then either it is 0 or it is s(m) for some unique NATURAL m. This property gives a predicate IS-ZERO on terms such that³:

IS-ZERO(n) = T, if n = 0
           = F, otherwise

The recursive construction rule shows how to compose and decompose terms. This gives rise to the following two operations on terms:

SUCC(n) = s(n)
PRED(n) = m, if n = s(m)
        = undefined, otherwise

An algebra over the terms of NATURAL has been obtained.
Note that partial operations like PRED⁴ are bound to show up, and consequently partial algebras. To overcome any problems this might possibly cause, it is only necessary to agree that undefinedness propagates throughout expressions. All operations of term algebras can be made total by adding undefined to each sort and making undefinedness propagation explicit. For example:

IS-ZERO(n) = T, if n = 0
           = undefined, if n = undefined
           = F, otherwise

SUCC(n) = s(n), if n ≠ undefined
        = undefined, otherwise

PRED(n) = m, if n = s(m)
        = undefined, otherwise

³ Assume recursive term algebras have the sort BOOL as a carrier.
⁴ PRED is a partial operation because it is not possible to perform PRED(0).

Term algebras canonically obtained from recursive definition, as above, will be called recursive term algebras. Compare this with the initial algebra specification below:

S = { NATURAL }
Ω_{λ,NATURAL} = { 0 }
Ω_{NATURAL,NATURAL} = { s }
Ω_{w,s} = ∅, otherwise
E = ∅

The set of natural numbers {0, s(0), s(s(0)), ...} and the successor function can be given as the interpretation of NATURAL and s, respectively. Notice that the predecessor function cannot be obtained, because the initial algebra will not give any operations not already in Ω. To get PRED in the initial algebra specification, a new initial algebra <S, Ω′, E′> must be given, as

S = { NATURAL }
Ω′_{λ,NATURAL} = { 0 }
Ω′_{NATURAL,NATURAL} = { s, p }
Ω′_{w,s} = ∅, otherwise

which includes the symbol p for the predecessor operation. This, however, is not good enough, because there are now too many terms representing the same natural number. For example, s(0), s(p(s(0))), and p(s(s(0))) all denote one. So a set of equational axioms is needed which forces different terms denoting the same object to be equal. Since the initial algebra approach only works for total algebras, the predecessor operation must be totalized with the aid of undefined elements.
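The totalized recursive term algebra above is directly executable. The following sketch is illustrative (the term representation and names are ours): terms are "0", ("s", n), or the added element undefined, and every operation propagates undefinedness explicitly:

```python
# The totalized term algebra for NATURAL, with explicit undefinedness.
UNDEF = "undefined"

def is_zero(n):
    if n == UNDEF:
        return UNDEF                 # undefinedness propagates
    return n == "0"

def succ(n):
    return UNDEF if n == UNDEF else ("s", n)

def pred(n):
    if isinstance(n, tuple) and n[0] == "s":
        return n[1]                  # PRED(s(m)) = m
    return UNDEF                     # PRED(0) and PRED(undefined)

two = succ(succ("0"))
print(pred(two))                     # → ('s', '0')
print(pred("0"))                     # → undefined
print(succ(pred("0")))               # undefinedness propagates → undefined
```

Nothing beyond the recursive definition of the terms was needed to write these operations down, which is the point of the contrast with the E′ axiom set that follows.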
The set E′ of axioms that is sufficient is:

s( undefined ) = undefined
p( 0 ) = undefined
p( undefined ) = undefined
ok( 0 ) = T
ok( s(n) ) = ok( n )
ok( undefined ) = F
IF-EQUAL( T, n, m ) = n
IF-EQUAL( F, n, m ) = m
IF-EQUAL( undefined, n, m ) = undefined
IF-EQUAL( ok(n), p(s(n)), undefined ) = IF-EQUAL( ok(n), n, undefined )
IF-EQUAL( ok(n), s(p(s(n))), undefined ) = IF-EQUAL( ok(n), s(n), undefined )

Really, all E′ is doing is characterizing the following four properties:

1) PRED(0) is undefined
2) undefinedness propagates
3) PRED( SUCC(n) ) = n
4) if n ≠ 0 then SUCC( PRED(n) ) = n

The initial algebra satisfying E′ interprets NATURAL, p and s to be the set of natural numbers, and the predecessor and successor functions, respectively. In recursive term algebra specification, one need only make sure that a recursive definition of terms provides a unique representation for each element of the intended set. Then a sufficient collection of operations follows automatically. It is well known that there are some structures, such as finite power sets, for each element of which it is impossible to provide a unique representation. A discussion on how to handle these structures can be found in §7 of [Kanda&Abrahamson,1983]. In contrast, in the initial algebra specification, enough operations for the data type to be specified have to be chosen. Then, by equational axioms, different terms are forced to denote the same object to form an equivalence class. The equations E of the specification should not be used to define operations, but only for expressing certain properties such as commutativity, associativity and so on. The set Ω should be kept as small as possible. As presented, the recursive term algebra approach might be criticized as lacking in abstractness, since the specification is dependent on term representations.
The view taken here is that abstractness is finitely establishable properties of a finitely decidable collection of finitely examinable (concrete) objects. From this viewpoint, the method is sufficiently abstract. Given that it has been decided to use recursive term algebras for the specification of data types, the next sections present how this form of definition can be realized as an unambiguous context free grammar.

3.3. Data Types as Context Free Grammars

The three basic elements of a recursive function definition are: a basis, which states that certain values are, by definition, values of the function for given arguments; a recursive construction rule, which tells how to determine other values of the function from known values; and an understanding (or statement) that the function takes on only those values that result from finitely many applications of the recursive construction rule to the basis function values. A formal grammar is a four-tuple G = (N, T, P, Σ) where:

N is a finite set of nonterminal symbols
T is a finite set of terminal symbols
N and T are disjoint, N ∩ T = ∅
P is a finite set of productions
Σ is the start symbol, Σ ∉ (N ∪ T)

Each production in P is an ordered pair of strings α, β, with α = φAψ and β = φωψ, in which ω, φ and ψ are possibly empty strings in (N ∪ T)* and A is Σ or a nonterminal letter. A production is usually written as:

α → β

Given this, a context free grammar G = (N, T, P, Σ) is a formal grammar in which all productions are of the form

A → ω,  A ∈ N ∪ {Σ},  ω ∈ (N ∪ T)* − {λ}

The grammar may also contain the production Σ → λ, where λ represents the empty string. That is, in context free grammars only one nonterminal is allowed to appear on the left hand side of a production. The right hand side of the production contains a rewrite rule that can be formed by concatenating an arbitrary number of symbols chosen from the nonterminals and terminals.
If G is a formal grammar, a string of symbols in (N ∪ T)* is known as a sentential form. If α → β is a production of G, and ω = φαψ and ω′ = φβψ are sentential forms, it is said that ω′ is immediately derived from ω in G. This relation is indicated by writing ω → ω′. If ω₁, ω₂, ..., ωₙ is a sequence of sentential forms such that ω₁ → ω₂ → ··· → ωₙ, it is said that ωₙ is derivable from ω₁. This relation is indicated by writing ω₁ →* ωₙ. The sequence ω₁, ω₂, ..., ωₙ is called a derivation of ωₙ from ω₁ according to G. The language L(G) generated by a formal grammar G is the set of terminal strings derivable from Σ:

L(G) = { ω ∈ T* | Σ →* ω }

If ω ∈ L(G), it is said that ω is a string, a sentence, or a word in the language generated by G. Given these definitions, how is recursive definition connected to context free grammars? The terminal productions can be viewed as providing the basis of the recursive definition, whereas the nonterminal production rules may be seen as the recursive construction rule. Closure follows automatically from the definition of →*. For example, the basis for the recursive definition of the natural numbers, NATURAL, is:

Basis: 0 is a NATURAL.

From this the terminal production rule

NATURAL → 0

may be built, since zero is the only basis function value. The recursive construction rule for NATURAL is:

Recursive Construction Rule: If n is a NATURAL then so is s(n).

A nonterminal production rule describes this construction process. It is:

NATURAL → s( NATURAL )

The grammar G containing these two production rules gives a set of terms over {0, s, (, )}. In other words, the language generated by G is the set of terminal strings

L(G) = { ω ∈ T* | Σ →* ω }

where T = {0, s, (, )} and Σ = NATURAL. Formally speaking then, recursive definition of sets is an unambiguous context free grammar. This has a few interesting results.
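Before turning to those results, note that the two production rules for NATURAL can be checked mechanically. The recognizer below for L(G) is an illustrative sketch (ours), mirroring the basis and the recursive construction rule:

```python
# Recognize L(G) for the grammar NATURAL → 0 | s( NATURAL ),
# over the terminal alphabet T = {0, s, (, )}.

def in_natural(w: str) -> bool:
    if w == "0":
        return True                   # terminal production: the basis
    if w.startswith("s(") and w.endswith(")"):
        return in_natural(w[2:-1])    # nonterminal production: the rule
    return False                      # closure: nothing else qualifies

print(in_natural("s(s(0))"))    # → True
print(in_natural("s(0)x"))      # → False
```

Because each string of L(G) has exactly one derivation, the recognizer never has to choose between alternatives — the operational reflection of the grammar's unambiguity.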
First, the data types are algebras over context free languages. Theoretically, an unambiguous context free grammar is important because it provides a computationally complete set of basic operations over the language it generates. This means that from the basic operations of term algebras, by iterative use of function composition, conditional definition and general recursive definition, it is possible to recursively define all partial computable functions over the terms recursively defined. If a decidable set is non-context free, there is no obvious way of providing a computational basis from a grammar of the set. Second, algebras generated by unambiguous context free grammars are not just many sorted algebras. Each nonterminal N of a grammar G corresponds to a sort of the algebra Alg(G) generated by G. The carrier set Alg(G)_N of this sort in Alg(G) is the language L(N) = { ω | N →* ω }. Therefore, if G has a production rule N₁ → N₂ then L(N₂) ⊆ L(N₁). The importance of non-context free decidable data structures should not be neglected. The next section describes how these should be handled.

3.4. Sub-Data Types

In formal language theory it is well known that any decidable set S is a decidable subset of some context free set, CF. Also, it is easy to observe that a function f: S → CF is computable if and only if it is the restriction of a computable function f: CF → CF. This indicates that the computability over S is inherited from that over CF; thus the data structure S should be treated within the structure of CF.

3.5. A Language Based on this Approach

An alternative specification technique for data structures, in which recursive definition can be realized as unambiguous context free grammars, has been given. In the next chapter, Chapter 4, this idea will be developed into a programming language, taking into account sub-data types as well.

4. The Functional Programming Language Kaviar
4.0. Introduction

The functional programming language that has been developed with context free grammars as data types has been named Kaviar. Backus-Naur Form, abbreviated as BNF, is a well known meta-language for specifying the concrete syntax of a programming language. The intent of this chapter is to present the BNF of Kaviar and explain the operation of the interpreter.

4.1. Lexical Conventions

There are five classes of tokens: identifiers, keywords, constants, strings, and separators. Each class is discussed below.

4.1.1. Identifiers

An identifier is a sequence of lower case letters and digits; the first character must be a letter. The dash - counts as a letter.

4.1.2. Keywords

The following identifiers are reserved for use as keywords, and may not be used otherwise:

type    sub-type    is-default

Other identifiers that are reserved, because they are the names of predefined functions which may not be redefined, are:

read    add    subtract    multiply    divide
eq      ne     lt          le          gt        ge

4.1.3. Constants

There are only two kinds of constants available to the user: the integer constants of the predefined type Nat, and the two constants of the predefined type Bool, T and F, for true and false.

4.1.4. Strings

A string is a sequence of characters surrounded by double quotes.

4.1.5. Separators

Blanks, tabs, newlines and comments, collectively called white space, are ignored except as they serve to separate tokens. Some white space is required to separate otherwise adjacent identifiers, keywords and constants. The characters /* introduce a comment, which terminates with the characters */. Comments may be nested and may appear anywhere in a Kaviar program.

4.2. The BNF of Kaviar

The following sections present the BNF of Kaviar in understandable chunks, along with examples and an informal explanation.

4.2.1. A Kaviar Program

A Kaviar program is a sequence of type definitions, followed by a sequence of function definitions.
<Program> ::= <Definitions> { <Invocation_Exprs> }
<Definitions> ::= <Type_Defns> <Function_Defns>

The type definitions precede function definitions only to enhance the speed of the type checking capabilities of the interpreter. Once all the definitions have been completed, the invocation expressions follow, enclosed within braces. The next section describes the definition of types.

4.2.2. Type Definitions

Context free data types as well as sub-types, which may be non-context free, are accommodated by Kaviar.

<Type_Defns> ::= <Context_Free_Types> <Sub_Data_Types>

The context free types are defined by a context free grammar, while sub-types are defined by predicates. Since sub-types are, essentially, restrictions on context free types, they follow the definition of these types to simplify ensuring the predicate operates over some previously defined context free type. Considered first will be the definition of context free types.

4.2.2.1. Context Free Data Types

Each context free type definition can be viewed as defining only one of the nonterminals Ni, 1 ≤ i ≤ k, N = { N1, N2, ..., Nk }, of a grammar G. Recall that there is no restriction on the number of alternatives a single nonterminal may have. That is, a nonterminal Ni may appear on the left hand side of arbitrarily many production rules. In Kaviar, each alternative is further named by a parser. This facilitates, when given a terminal string w derived from a nonterminal Ni, that is w ∈ { w | Ni →* w }, finding which alternative of Ni the string w was derived from. The parser names for each alternative of Ni must be distinct so that the alternative used in deriving w can be uniquely determined.

As an example, consider the familiar LISP symbolic expression, or S-expression. Its recursive definition is:

Basis: An atom is an S-expression.
Recursive Construction Rule: If s1 and s2 are S-expressions, so is ( s1 . s2 ).

Closure: Only those values that result from finitely many applications of the recursive construction rule to the basis values are S-expressions.

An equivalent context free grammar over the atoms x and y is:

s-expr → atom
s-expr → ( s-expr . s-expr )
atom → x
atom → y

As just described, each alternative of a given nonterminal, Ni, is further named by a parser. A nonterminal symbol will begin with <, followed by the nonterminal name, Ni, a colon, the parser name and then the closing >. Terminal symbols are strings and so will be enclosed within double quotes. Then, the grammar for S-expressions becomes:

<s-expr:atom> → <atom:a>
<s-expr:list> → "(" <s-expr:car> "." <s-expr:cdr> ")"
<atom:x> → "x"
<atom:y> → "y"

Since this form is lengthy, it is abbreviated by

type s-expr {
  <atom> → <atom:a>;
  <list> → "(" <s-expr:car> "." <s-expr:cdr> ")"
}
type atom {
  <x> → "x";
  <y> → "y"
}

and this is exactly the form the BNF for context free types in Kaviar generates. The interpreter converts this abbreviated form to the lengthier form. The BNF for a context free type, then, is:

<Context_Free_Types> ::= <Context_Free_Type> |
    <Context_Free_Type> <Context_Free_Types>
<Context_Free_Type> ::= type <Type_Name> { <Alternative> }
<Type_Name> ::= <Identifier>
<Alternative> ::= <Production> |
    <Production> ; <Alternative>
<Production> ::= < <Parser> > → <Rewrite_Rule>
<Rewrite_Rule> ::= <Null_Rule> | <Non_Null_Rule>
<Null_Rule> ::= λ
<Non_Null_Rule> ::= <Terminal> |
    <Nonterminal> |
    <Terminal> <Non_Null_Rule> |
    <Nonterminal> <Non_Null_Rule>
<Terminal> ::= <String>
<Nonterminal> ::= < <Type_Name> : <Selector> >
<Selector> ::= <Identifier>

This allows for an arbitrary number of context free data types.
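The conversion from the abbreviated type notation to the lengthier grammar form can be pictured with a short Python sketch. The dictionary layout below is purely illustrative; it is not Kaviar's actual internal representation, only a way of showing that the expansion is a mechanical renaming.

```python
# Sketch: expand Kaviar's abbreviated type form into full productions of
# the shape <type:parser> -> rewrite rule.  Data layout is illustrative.

def expand(types):
    """types maps a type name to a list of (parser, rewrite-rule) pairs;
    the result maps each full nonterminal <type:parser> to its rule."""
    grammar = {}
    for type_name, alternatives in types.items():
        for parser, rule in alternatives:
            grammar["<%s:%s>" % (type_name, parser)] = rule
    return grammar

# The S-expression types in abbreviated form.  Terminals are quoted
# strings; nonterminals carry a selector name, e.g. <s-expr:car>.
abbreviated = {
    "s-expr": [
        ("atom", ["<atom:a>"]),
        ("list", ['"("', "<s-expr:car>", '"."', "<s-expr:cdr>", '")"']),
    ],
    "atom": [("x", ['"x"']), ("y", ['"y"'])],
}

full = expand(abbreviated)
print(full["<s-expr:list>"])
# ['"("', '<s-expr:car>', '"."', '<s-expr:cdr>', '")"']
```

Each of the four abbreviated alternatives becomes one fully-named production, exactly as in the lengthier form shown above.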
Notice that nonterminal symbols on the right hand side of the production rule are composed of two parts: a type name and a selector. The type name must be the name of a context free type, already defined or to be defined by <Context_Free_Type>, or one of the two predefined types, Nat or Bool. The selector is an identifier. Its use will become clear when recursive functions are discussed. Selector names for all the nonterminals appearing on the right hand side of a given alternative must be distinct.

Sub-type definitions follow context free type definitions. The next section describes these types.

4.2.2.2. Sub-Data Types

Suppose Ni is a context free type and P( x : Ni ) : Bool is a defined total predicate. By R = P(Ni) a decidable subset

{ x | x ∈ L(Ni) ∧ P(x) = T }

of L(Ni) can be specified. Then R is called a sub-type of Ni. That is, a sub-type restricts a context free type Ni to only those values which satisfy the predicate P. These sub-types can be non-context free. In Kaviar, a sub-type is specified as:

<Sub_Data_Types> ::= <Sub_Data_Type> |
    <Sub_Data_Type> <Sub_Data_Types>
<Sub_Data_Type> ::= sub-type <Type_Name> ( <Context_Free_Type_Name> ) =
    <Predicate> ;
<Context_Free_Type_Name> ::= <Identifier>

This allows for any number of sub-types the user wishes to define. As mentioned already, the sub-type is a decidable subset of a context free type, hence <Context_Free_Type_Name> must be the name of one of the context free types. The BNF of the predicate that must be satisfied is given by:

<Predicate> ::= ( <Predicate> ) |
    ¬ <Predicate> |
    <Function_Name> |
    ( <Predicate> ) ∩ <Predicate> |
    ( <Predicate> ) ∪ <Predicate> |
    ¬ <Predicate> ∩ <Predicate> |
    ¬ <Predicate> ∪ <Predicate> |
    <Function_Name> ∩ <Predicate> |
    <Function_Name> ∪ <Predicate>
<Function_Name> ::= <Identifier>

That is, the predicate is an arbitrary logical formula.
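To make the role of such a formula concrete, here is a Python sketch of how a sub-type predicate could be evaluated. The combinators and the predicates less5 and even are invented for illustration; the connectives ∩, ∪ and ¬ are read as logical and, or and not over unary Bool-valued functions.

```python
# Sketch of sub-type checking: a sub-type pairs a context free type with a
# predicate built from unary Bool-valued functions combined by the
# connectives "and", "or" and "not".  All names here are illustrative.

def p_and(p, q):  return lambda x: p(x) and q(x)   # plays the role of ∩
def p_or(p, q):   return lambda x: p(x) or q(x)    # plays the role of ∪
def p_not(p):     return lambda x: not p(x)        # plays the role of ¬

# Hypothetical unary predicates over values of a numeric type.
def less5(x):  return x < 5
def even(x):   return x % 2 == 0

# sub-type lt-five( number ) = less5;
# Membership: x must be in L(number) (assumed already verified by the
# parser) and the predicate must evaluate to true.
lt_five = less5
odd_lt_five = p_and(less5, p_not(even))   # e.g. less5 ∩ ¬ even

print(lt_five(3), lt_five(7))        # True False
print(odd_lt_five(3), odd_lt_five(4))  # True False
```

Because the predicate may be any total computable test, the resulting subset need not itself be context free, which is exactly the point of sub-types.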
The functions referenced in the formula are restricted to be unary over a context free data type, being the same type the sub-type is defined over. In addition, these functions must have as their result type the predefined type Bool. This is so that the predicate itself can evaluate to true or false.

Consider the following context free type number, defined as being a natural number:

type number {
  <n> → <Nat:n>
}

and suppose the subset to be specified is

{ x | x ∈ L(number) ∧ x < 5 = T }

that is, x is a number, but is also less than five. This sub-type may be specified as:

sub-type lt-five( number ) = less5;

So the functionality of less5 must be:

less5( x : number ) : Bool

Now that the construction of types has been dealt with, a discussion of how to define functions that operate over these types appears in the next sections.

4.2.3. Function Definitions

Function definitions in Kaviar have the form:

<Function_Defns> ::= <Fcn_Defn> |
    <Fcn_Defn> <Function_Defns>
<Fcn_Defn> ::= <By_Recursion> | <By_Unification>

Functions may be defined by one of two methods: by recursion, or by unification. Each method will be discussed in turn, accompanied by an example.

4.2.3.1.
Recursive Functions

The BNF of recursive functions is:

<By_Recursion> ::= <Function_Name> ( <Typed_Variable_List> ) : <Result_Type>
    { <Case_Stmts> }
<Typed_Variable_List> ::= <Identifier> : <Type_Name> |
    <Identifier> : <Type_Name> , <Typed_Variable_List>
<Result_Type> ::= <Identifier>
<Case_Stmts> ::= <Case_Stmt> |
    <Case_Stmt> ; <Case_Stmts>
<Case_Stmt> ::= is- <Parser> ( <Identifier> ) → <Recursive_Exprs>
<Parser> ::= <Identifier>
<Recursive_Exprs> ::= <Recursive_Expr> |
    <Recursive_Expr> <Recursive_Exprs>
<Recursive_Expr> ::= <Number> |
    <String> |
    <Identifier> |
    <Truth_Value> |
    <Selector> < <Identifier> > |
    <Function_Name> ( <Rec_Par_List> )
<Rec_Par_List> ::= <Recursive_Exprs> |
    <Recursive_Exprs> , <Rec_Par_List>

Suppose a recursive function is to be defined which takes an arbitrary S-expression and reverses it. The functionality of the function would be:

reverse( s : s-expr ) : s-expr

Ideally, a <Case_Stmt> should be given for each alternative of the type of the identifier in the parameter list to take care of any derivation of that type. The type s-expr was defined as:

type s-expr {
  <atom> → <atom:a>;
  <list> → "(" <s-expr:car> "." <s-expr:cdr> ")"
}

which has two alternatives. If the value of s is an atom, reversing an atom yields itself, so the <Case_Stmt> would be:

is-atom( s ) → s

If the value of s, however, is a list, reversing a list consists of concatenating the reverse of the tail end of the list, given by the derivation of the nonterminal <s-expr:cdr>, with the reverse of the head of the list, given by the derivation of the nonterminal <s-expr:car>. Hence, the <Case_Stmt> would look like:

is-list( s ) → "(" reverse( cdr<s> ) "." reverse( car<s> ) ")"

where the selectors car and cdr select the values corresponding to the derivations of their nonterminals. The operation of selectors will be discussed in more detail in a following section.
So the complete function is:

reverse( s : s-expr ) : s-expr {
  is-atom(s) → s;
  is-list(s) → "(" reverse( cdr<s> ) "." reverse( car<s> ) ")"
}

If the function defined has more than one parameter, a <Case_Stmt> which operates over any of the identifiers could be given, but only the <Recursive_Exprs> of the first applicable parser found will be evaluated. The notion of what an applicable parser is will be discussed in a later section.

Of course, each function defined must be given a unique name. Kaviar permits no operator, or function, overloading. The identifiers present in the <Typed_Variable_List> are the only ones that may appear within the function. The type of a given identifier in this list may be a context free type name, a sub-type name or one of the predefined type names. Likewise for the result type of the function.

Before entry to the function each parameter is checked to ensure it is of the correct type. If it is not, an error will report the value at fault. If the type of an identifier is a context free type, Ni, the interpreter can determine whether the value w ∈ L(Ni) by running Earley's algorithm on it. A digression will now be taken to explain the operation of Earley's algorithm.

4.2.3.2. The Parsing Method of Earley

Earley's algorithm, developed in the early seventies, [Earley,1970], [Aho&Ullman,1972], [Graham&Harrison,1976], provides a means of determining whether a terminal string w can be derived by some context free grammar. That is, whether w ∈ L(G).

4.2.3.2.1. Conventions

The following conventions will be used to represent various symbols and strings concerned with a grammar while discussing Earley's algorithm.

(1) The lowercase letters a, b, c and d will represent terminal symbols, as will the digits 0, 1, ..., 9.

(2) The capital letters A, B, C and D represent nonterminal symbols.

(3) U, V, ..., Z represent either nonterminal or terminal symbols.

(4) The Greek letters α, β, ...
represent strings of nonterminal and terminal symbols. λ denotes the null string, and Σ represents the start symbol.

(5) u, v, ..., z represent strings of terminal symbols only.

Subscripts and superscripts do not change these conventions.

4.2.3.2.2. Terminology

Let G = ( N, T, P, Σ ) be a context free grammar and let ω = a1a2 ··· an, n ≥ 0, where aj ∈ T for 1 ≤ j ≤ n, be the input string. An item of the form:

[A → X1X2 ··· Xk • Xk+1 ··· Xm , i]

represents:

(1) A production A → X1 ··· Xm in P such that a portion of the input string which is derived from its right side is currently being scanned.

(2) A point in that production, called the metasymbol and denoted by the symbol • not in N or T, which shows how much of the production's right side has been recognized so far. The integer k can be any number ranging from zero, in which case • is the first symbol, through m, in which case it is the last. A production of the form A → α•β will be called a dotted rule.

(3) A pointer i, 0 ≤ i ≤ n, back to the position in the input at which that instance of the production was first looked for.

For each integer j, 0 ≤ j ≤ n, a set of items Ij will be constructed such that [A → α•β, i] is in Ij for 0 ≤ i ≤ j if and only if for some γ and δ, Σ →* γAδ, γ →* a1 ··· ai and α →* ai+1 ··· aj. In other words, if the dotted rule A → α•β is in Ij then work is proceeding on a potentially valid parse, in that it is known that there is a sentential form γAδ where γ →* a1a2 ··· ai. Furthermore, A → α•β indicates that α →* ai+1 ··· aj. Nothing yet is known about β or δ. That is, A → αβ could be used in some derivation that is consistent with ω up to position j.

The number of distinct items in Ij is determined by the grammar and the input string. The sequence of lists I0, I1, ..., In will be called the parse lists for the input string ω.

4.2.3.2.3. Informal Explanation of Earley's Algorithm

Earley's algorithm works on any context free grammar.
The grammar need not be in any normal form. The function of the algorithm is to form the parse lists. The operation of the algorithm is as follows. An input string a1 ··· an is scanned from left to right. As each aj is scanned, a set of items Ij is constructed which represents the condition of the recognition process at that point in the scan. In general, parse list Ij is operated on by processing the items, in order, performing one of three operations depending on the form of the item.

The predictor operation is applicable when the form of the item is [A → α•Bβ, i], that is, when there is a nonterminal to the right of the metasymbol. It causes the addition of a new item of the form [B → •γ, j] to Ij for each alternative of that nonterminal, here B. The metasymbol is placed at the beginning of the production in each new item since none of its symbols has yet been scanned. The pointer is set to j since the item was created in Ij. Thus the predictor adds to Ij all productions which might generate substrings beginning at aj+1. In other words, it indicates which rules might possibly generate the next portion of the input, since the dotted rules introduced by the predictor represent rules that could be used in the derivation: their left hand sides occur in valid sentential forms.

The scanner serves to update those dotted rules (or partial subtrees) whose expected next element is the next input symbol. In other words, the scanner is applicable just in the case the item has the form [B → α•aβ, i], that is, when there is a terminal to the right of the metasymbol. It compares that symbol with aj, and if they match the metasymbol is moved over one in the item, as in [B → αa•β, i], to indicate that that terminal symbol has been scanned; the item is then added to Ij. The pointer remains the same because the item is not being created, only updated.
The third operation, the completer, has as its role to update those productions whose expected next element is a nonterminal which generates a suffix of the input read thus far. That is, the completer is applicable to an item if its metasymbol is at the end of its production, as in [A → α•, i]. It goes to the parse list Ii indicated by the pointer and looks for all items from it that have the nonterminal on the left hand side, here A, to the right of the metasymbol, as occurs in [B → α•Aβ, k]. It moves the metasymbol over the nonterminal in these items, giving [B → αA•β, k], and adds them to Ij. Ii can be thought of as the parse list being operated on when A was being looked for. It has now been found, so for all the items in Ii which caused a search for an A, the metasymbol must be moved over it in them to show that it has been successfully scanned.

Earley's algorithm is, in effect, a top down parser in which all the parse lists generated represent possible parses carried along simultaneously.

4.2.3.2.4. Earley's Parsing Algorithm

Input: A context free grammar G = ( N, T, P, Σ ) and an input string ω = a1a2 ··· an in T*.

Output: The parse lists I0, I1, ..., In.

Method: First, construct I0 as follows:¹

[1] If Σ → α is a production in P, add [Σ → •α, 0] to I0.

Now perform steps [2] and [3] until no new items can be added to I0.

[2] If [B → γ•, 0] is in I0, add [A → αB•β, 0] for all [A → α•Bβ, 0] in I0.

[3] Suppose that [A → α•Bβ, 0] is an item in I0. Add to I0, for all productions in P of the form B → γ, the item [B → •γ, 0].

Now construct Ij, having constructed I0, I1, ..., Ij−1.

[4] Scanner. For each [B → α•aβ, i] in Ij−1 such that a = aj, add [B → αa•β, i] to Ij.

Now perform steps [5] and [6] until no new items can be added.

[5] Completer. Let [A → α•, i] be an item in Ij. Examine Ii for items of the form [B → α•Aβ, k]. For each one found, add [B → αA•β, k] to Ij.

[6] Predictor.
Let [A → α•Bβ, i] be an item in Ij. For all B → γ in P, add [B → •γ, j] to Ij.

The algorithm, then, is to construct Ij for 0 ≤ j ≤ n.

Since the number of parse lists constructed is of bounded size, it is easily shown that Earley's algorithm terminates. If the underlying grammar is unambiguous, then this algorithm can be executed in O(n²) reasonably defined elementary operations when the input is of length n. In all cases, it can be executed in O(n³) reasonably defined elementary operations when the input is of length n. For proofs of these and all other complexity figures regarding Earley's algorithm see [Aho&Ullman,1972].

¹ Before adding an item to a parse list it is important to ensure that it is not a duplicate since this could lead to unbounded growth of the list.

4.2.3.2.5. Example of the Operation of Earley's Algorithm

Consider the following grammar G for generating symbolic expressions with the productions numbered as shown:

1: S → A
2: S → ( S . S )
3: A → x
4: A → y

and let ω = ( x . y ) be the input string. Since Σ = S, the items

[S → •A, 0]   (1)
[S → •( S . S ), 0]   (2)

are the first to be added to I0. The predictor is applicable to the first of these items. Operating on (1) it produces

[A → •x, 0] and
[A → •y, 0]

I0 is now complete, so move on to construct I1. The scanner is applicable to (2) since the terminal to the right of the metasymbol matches a1 = (. The item

[S → ( •S . S ), 0]

is added to I1. Notice that the metasymbol has been moved over one in the item to indicate that the symbol has been scanned. The predictor now causes

[S → •A, 1] and   (3)
[S → •( S . S ), 1]

to be added. The predictor is once again applicable to (3) and adds

[A → •x, 1] and
[A → •y, 1]

to I1. No further items can be added to I1. In constructing I2 note that a2 = x and so the scanner adds

[A → x•, 1]

to I2. The completer is applicable to this item because the metasymbol is at the end of its production.
It goes back to the parse list indicated by the pointer, in this case I1, and looks for all items from I1 which have A to the right of the metasymbol. The only item meeting these specifications is [S → •A, 1] in I1 and so:

[S → A•, 1]   (4)

is added to I2, moving the metasymbol over A in the item. Considering (4) causes the completer to reexamine I1, this time searching for items with the metasymbol preceding S in them. It can add one more item to I2:

[S → ( S• . S ), 0]

This completes I2. A complete run of the algorithm on grammar G is given in Table I.

Table I: Parse Lists for G with ω = ( x . y )

I0: [S → •A, 0], [S → •(S.S), 0], [A → •x, 0], [A → •y, 0]
I1: [S → (•S.S), 0], [S → •A, 1], [S → •(S.S), 1], [A → •x, 1], [A → •y, 1]
I2: [A → x•, 1], [S → A•, 1], [S → (S•.S), 0]
I3: [S → (S.•S), 0], [S → •A, 3], [S → •(S.S), 3], [A → •x, 3], [A → •y, 3]
I4: [A → y•, 3], [S → A•, 3], [S → (S.S•), 0]
I5: [S → (S.S)•, 0]

It is possible to determine membership in the language by inspecting the items of the last parse list. If no item of the form [Σ → α•, 0] is in In, then ω is not in L(G).

4.2.3.2.6. Modifications Required to Earley's Parsing Algorithm

Recall that in Kaviar all production rules are named by parsers. Their form is

<A:pi> → β

Also, all nonterminals in β have selector names and have the form <B:sj>. Because of this, Earley's algorithm requires some adjustments. The algorithm that works is given below.

Input: A context free grammar G = ( N, T, P, Σ ) and an input string ω = a1a2 ··· an in T*.

Output: The parse lists I0, I1, ..., In.

Method: First, construct I0 as follows:

[1] If <Σ:pi> → α is a production in P, add [<Σ:pi> → •α, 0] to I0.

Now perform steps [2] and [3] until no new items can be added to I0.

[2] If [<B:pi> → γ•, 0] is in I0, add [<A:pj> → α<B:sk>•β, 0] for all [<A:pj> → α•<B:sk>β, 0] in I0.

[3] Suppose that [<A:pi> → α•<B:sj>β, 0] is an item in I0. Add to I0, for all productions in P of the form <B:pk> → γ, the item [<B:pk> → •γ, 0].
Now construct Ij, having constructed I0, I1, ..., Ij−1.

[4] Scanner. For each [<B:pi> → α•aβ, i] in Ij−1 such that a = aj, add [<B:pi> → αa•β, i] to Ij.

Now perform steps [5] and [6] until no new items can be added.

[5] Completer. Let [<A:pi> → α•, i] be an item in Ij. Examine Ii for items of the form [<B:pj> → α•<A:sk>β, k]. For each one found, add [<B:pj> → α<A:sk>•β, k] to Ij.

[6] Predictor. Let [<A:pi> → α•<B:sj>β, i] be an item in Ij. For all <B:pk> → γ in P, add [<B:pk> → •γ, j] to Ij.

It is now known how the interpreter determines whether an input string is generated by some context free grammar, but what if it is a sub-type? The next section describes this process.

4.2.3.2.7. Type Checking Sub-Data Types

If the type of the identifier in the <Typed_Variable_List> is a sub-type, Si, the interpreter can determine whether the value is of the correct type by first running Earley's algorithm with Σ being the name of the <Context_Free_Type_Name> the sub-type is defined over. Then, the functions in the <Predicate> are interpreted, resulting in some logical formula. If this resulting logical formula evaluates to true, the type of the value is indeed Si.

4.2.3.2.8. Interpreting a Recursive Function

If all the parameters of a function are of the correct type, the interpreter proceeds to determine the <Case_Stmt> to execute. Suppose the function reverse is invoked as:

reverse( "(" "x" "." "y" ")" )

The terminal string "(" "x" "." "y" ")" ∈ L(s-expr), so for the duration of the function the identifier s has this value.

The parser in a given <Case_Stmt> must be valid for the type of the identifier it operates over. For example, the type of s is s-expr, so the only parsers that may operate on s are those that appeared in the definition of the type s-expr. Looking back at the definition of s-expr, the available parsers are found to be atom and list.
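The recognition machinery just described can be sketched in Python. This is a simplified illustration rather than Kaviar's implementation: it ignores the parser and selector annotations, assumes the grammar has no λ-productions, and represents an item [A → α•β, i] as a tuple (lhs, rhs, dot, origin).

```python
# A minimal Earley recognizer illustrating the parse lists I_0..I_n.
# Nonterminals are the keys of the grammar dict; all other symbols are
# terminals (token strings).  Simplified: no lambda-productions handled.

def earley_recognize(grammar, start, tokens):
    """Return the parse lists I_0..I_n; the string is in the language iff
    some item [start -> alpha ., 0] appears in the last list."""
    n = len(tokens)
    lists = [[] for _ in range(n + 1)]

    def add(j, item):
        if item not in lists[j]:          # never add a duplicate item
            lists[j].append(item)

    for rhs in grammar[start]:            # step [1]
        add(0, (start, rhs, 0, 0))

    for j in range(n + 1):
        i = 0
        while i < len(lists[j]):          # lists may grow while processing
            lhs, rhs, dot, origin = lists[j][i]
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:        # predictor
                    for alt in grammar[sym]:
                        add(j, (sym, alt, 0, j))
                elif j < n and tokens[j] == sym:   # scanner
                    add(j + 1, (lhs, rhs, dot + 1, origin))
            else:                         # completer
                for (l2, r2, d2, o2) in lists[origin]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(j, (l2, r2, d2 + 1, o2))
            i += 1
    return lists

# The S-expression grammar of the running example.
G = {
    "s-expr": [("atom",), ("(", "s-expr", ".", "s-expr", ")")],
    "atom":   [("x",), ("y",)],
}

lists = earley_recognize(G, "s-expr", ["(", "x", ".", "y", ")"])
accepted = any(lhs == "s-expr" and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in lists[-1])
print(accepted)   # True: ( x . y ) is in L(G)
```

Running the sketch on the example input reproduces the contents of Table I, and membership is read off the last parse list exactly as described above.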
If the type of the identifier is a sub-type, the applicable parsers are those of the context free type the sub-type is defined over. For example, if an identifier had type lt-five, which is defined over the context free type number, the only applicable parser would be n.

If the type of the identifier is the predefined type Bool, the user has two parsers available for use. They are true and false, just as if the type Bool had been defined as:

type Bool {
  <true> → T;
  <false> → F
}

No parsers are provided for type Nat because there are an infinite number of them.

The interpreter steps through the <Case_Stmts> of the function one by one until the applicable one is found. How is a <Case_Stmt> determined applicable? First, the <Parser> of the <Case_Stmt> is extracted; call it parser. Then the alternative with this parser name is extracted from the type definition. This might be

<type:parser> → β

Then, if an item of the form [<type:parser> → β•, 0] is found on the last item list, In, for the value of the identifier the parser is operating over, then the alternative <type:parser> → β was the one the value was derived from. This <Case_Stmt> is then termed applicable. The interpreter would then go on to execute the <Recursive_Exprs> of this <Case_Stmt>. Otherwise, the interpreter would move on to the next <Case_Stmt> to determine if it is applicable. If the interpreter exhausts all of the <Case_Stmts> without finding one to be applicable, an error will result.

Supposing an applicable parser is found, the next section describes how its <Recursive_Exprs> are evaluated by the interpreter.

4.2.3.2.9. Interpreting Recursive Expressions

There are six types of recursive expressions. The expressions occurring are evaluated in sequence and the results concatenated together. The final result must be of type <Result_Type>. Numbers of type Nat, truth values of type Bool and strings require no interpretation.
The interpretation of an identifier is the value assigned to it on entry to the function.

Selectors must operate on the same identifier as the parser, and can only be the ones present in the alternative of the parser. For example, the alternative list of type s-expr is:

<list> → "(" <s-expr:car> "." <s-expr:cdr> ")"

On entry to reverse, the value of s was set to "(" "x" "." "y" ")". The interpreter can determine that this value was derived from the alternative <list>, so the only selectors that may be active are car and cdr, since they are the ones that appear in the definition of the alternative. Recall, each nonterminal on the right hand side of the production rule has the form:

< <Type_Name> : <Selector> >

The <Selector> may be used as a function that operates over the value of the identifier the parser was applicable to. The effect of a selector, as used in the following extract from the function reverse

is-list( s ) → "(" reverse( cdr<s> ) "." reverse( car<s> ) ")"

is to select, from the value of the identifier, the subtree corresponding to the derivation of the nonterminal. The type of the subtree will, of course, be <Type_Name>. For example, the derivation tree of the value of s is:

                 <s-expr:list>
                       |
    "(" <s-expr:car> "." <s-expr:cdr> ")"
             |                |
         <atom:a>         <atom:a>
             |                |
            "x"              "y"

The selector returns the value of the subtree. So car<s> = "x" and cdr<s> = "y".

If the expression is a function invocation, its parameters are first evaluated and then the function is called. The result of the function is then concatenated to the partial result. When all of the expressions on the right hand side of the arrow have been evaluated, the resulting terminal string is checked to see if it is of type <Result_Type>; if so, it is returned, otherwise an error is reported.

The evaluation of reverse( "(" "x" "." "y" ")" ) would then proceed as follows. The parser is-list(s) would be found applicable.
Its expressions are:

"(" reverse( cdr<s> ) "." reverse( car<s> ) ")"

The string "(" becomes the partial result. The parameter to reverse, cdr<s>, is evaluated, yielding "y", so reverse( "y" ) is invoked. It returns "y" as a result. This result is concatenated onto the partial result, the partial result now becoming "(" "y". Continuing in this fashion, the final result is the terminal string "(" "y" "." "x" ")", which is of type s-expr.

Now that the operation of recursive functions is understood, the next sections describe the second type of function definition - by unification.

4.2.3.3. Unification Functions

Functions may, alternatively, be defined by unification. The BNF for these types of functions is:

<By_Unification> ::= <Function_Name> ( <Type_Name> ) : <Result_Type>
    { <Unif_Clauses> }
<Unif_Clauses> ::= <Clause> |
    default → <Unif_Exprs>
<Clause> ::= <Schema> → <Unif_Exprs> |
    <Schema> → <Unif_Exprs> ; <Clause> |
    <Schema> → <Unif_Exprs> ; default → <Unif_Exprs>
<Unif_Exprs> ::= <Unif_Expr> |
    <Unif_Expr> <Unif_Exprs>
<Unif_Expr> ::= <Number> |
    <String> |
    <Identifier> |
    <Truth_Value> |
    <Function_Name> ( <Unif_Par_List> )
<Unif_Par_List> ::= <Unif_Exprs> |
    <Unif_Exprs> , <Unif_Par_List>
<Schema> ::= <String> |
    <Identifier> |
    <String> <Schema> |
    <Identifier> <Schema>

Unification can be intuitively looked at as pattern matching. In Kaviar, unification is only one way, meaning variables may only occur in the <Schema> and not in what it is being matched against. All unification functions are unary, since otherwise it would be impossible to tell what the <Schema> of a given <Clause> should be unified against. The next section describes how to build a <Schema> so that the interpreter will be able to successfully unify expressions.

4.2.3.4.
Finding a Unifier

Finding a unifier consists of finding an expression for each identifier symbol occurring in the <Schema> so that when these expressions are substituted for their respective identifiers, the value of the original parameter results. To do this successfully, the <Schema> must mimic the form of the alternatives of the type. Ideally, there should be a <Clause> for each alternative of the type so that all possible derivations of the type may be handled. The <Schema> is built by specifying a string for every terminal symbol in the <Rewrite_Rule> and an identifier for every nonterminal symbol.

The definition of S-expressions was given as:

type s-expr {
  <atom> → <atom:a>;
  <list> → "(" <s-expr:car> "." <s-expr:cdr> ")"
}

A unification function which reverses arbitrary S-expressions would have the functionality:

reverse( s-expr ) : s-expr

so the schemas of the clauses should mimic the alternatives of s-expr, since this is the type of the parameter. The right hand side of the alternative <atom> has only one nonterminal, so the schema would consist of only one identifier. Reversing an atom yields itself, so the <Clause> would be:

atom → atom

The right hand side of the second alternative, <list>, contains terminal and nonterminal symbols. Working from left to right, simply copy terminal symbols and assign nonterminal symbols distinct identifiers. So for <list> the <Schema> would be:

"(" s1 "." s2 ")"

Reversing a list requires concatenating the reverse of the tail end of the list with the head of the list. Thus the complete function would be:

reverse( s-expr ) : s-expr {
  atom → atom;
  "(" s1 "." s2 ")" → "(" reverse( s2 ) "." reverse( s1 ) ")"
}

The interpreter finds a unifier by going through the <Unif_Clauses> of the function one by one, trying to find which <Schema> the value of the parameter can be unified against. The alternative the parameter value was derived from is what the <Schema> is compared to.
For example, the alternative the terminal string "(" "x" "." "y" ")" was derived from is:

<list> → "(" <s-expr:car> "." <s-expr:cdr> ")"

Identifiers can be unified against nonterminal symbols, and terminal strings must match exactly. If unification can take place, the interpreter proceeds to evaluate the <Unif_Exprs> to the right of the arrow, binding any identifiers in the <Schema> to the value of the subtree the nonterminal they stand for derives. It makes sense that the only identifiers that can appear in the <Unif_Exprs> are those that appeared in the <Schema>.

If the reserved word default is found instead of a <Schema>, the <Unif_Exprs> to the right of the arrow will be evaluated immediately. If no <Schema> can be unified against, the interpreter will indicate failure.

Assuming a unifier is found, the interpreter goes on to evaluate the unification expressions. The next section describes this.

4.2.3.5. Evaluating Unification Expressions

The unification expressions are evaluated in the exact same manner as recursive expressions, except for the fact that the selector functions cannot be used.

Once all the types and functions are defined, there must be some way of using all these definitions. The next section describes the invocation expressions.

4.2.4.
Invocation Expressions

The BNF of the invocation expressions is:

<Invocation_Exprs> ::= <Invoc_Expr> |
    <Invoc_Expr> ; <Invocation_Exprs>
<Invoc_Expr> ::= NL |
    <String> |
    <Identifier> |
    read ( <Typed_Variable_List> ) |
    <Function_Name> ( <Parameter_List> )
<Parameter_List> ::= <Parameters> |
    <Parameters> , <Parameter_List>
<Parameters> ::= <Parameter> |
    <Parameter> <Parameters>
<Parameter> ::= <String> |
    <Number> |
    <Identifier> |
    <Truth_Value> |
    <Function_Name> ( <Parameter_List> )

Input may be prepared for a Kaviar program by placing sentential forms in a data file, each one terminated by the character. The values may be read from the data file by issuing a

read( x1 : t1, ..., xn : tn )

invocation expression. The value read will be associated with the identifier xi if its type is indeed ti. For example, the data file for the read statement:

read( s1 : s-expr, s2 : s-expr )

might look like:

"(" "x" "." "y" ")"
"y"

When a string is given as an invocation expression, it will simply be written to standard output. Similarly, when an identifier is given, the value associated with it by some read statement will be written to standard output. When NL is encountered, a newline will be written to standard output.

As in interpreting recursive and unification expressions, parameters to a function are first evaluated, if necessary, and then the function is invoked. Upon return, the result of the function will be written to standard output.

4.2.5. Built in Functions

There are several built in functions available to the user. The basic arithmetic functions all have two parameters of type Nat with result type Nat. They are:

add( x, y )       /* x + y */
subtract( x, y )  /* x - y */
multiply( x, y )  /* x * y */
divide( x, y )    /* x / y */

Division is integer division in which the fractional part is truncated; the remainder when x is divided by y is discarded, and is zero when y divides x exactly.
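As a sketch of the intended semantics, the arithmetic built-ins can be modelled in Python over nonnegative integers; this is an illustration of the truncation behaviour, not the interpreter's code.

```python
# Sketch of Kaviar's built-in arithmetic on Nat, assuming nonnegative
# integer values.  divide truncates the fractional part of the quotient.

def add(x, y):      return x + y
def subtract(x, y): return x - y
def multiply(x, y): return x * y
def divide(x, y):   return x // y   # integer division, fraction truncated

print(divide(7, 2))   # 3: the fractional part of 3.5 is truncated
print(divide(6, 3))   # 2: y divides x exactly, nothing is discarded
```

On nonnegative operands Python's // operator truncates exactly as described, so divide(7, 2) yields 3 and divide(6, 3) yields 2.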
The other built in functions take two arguments of type Nat and produce a Bool result. They are:

eq( x, y ) /* x = y */
ne( x, y ) /* x ≠ y */
le( x, y ) /* x ≤ y */
lt( x, y ) /* x < y */
ge( x, y ) /* x ≥ y */
gt( x, y ) /* x > y */

4.3. How to Run a Kaviar Program

A Kaviar source program is expected to come from standard input. All output is directed to standard output. There are three options available to the user.

The -q option removes double quotes from values the invocation expressions write to standard output. This makes the output look tidier and would be used when the user is sure the program is working correctly.

The -a option stops sub-type checking. In normal circumstances the interpreter verifies that something is a sub-type by evaluating a predicate. When this option is given it will bypass evaluating the predicate. Once a program is fully debugged this option might be used to enhance the speed of the interpreter.

And finally there is the -d option. If the Kaviar program has any read statements in it the interpreter expects that the user has prepared a data file. The name of this file is to be given immediately after the d, as in: -dname

4.4. Implementation Restrictions and Details

The following are restrictions which have been arbitrarily set. To change these limits, the constants need only be changed and the interpreter recompiled.

No more than the first SIG_CHARS (12) characters of an identifier are significant, although more may be used. At most MAX_CF_TYPES (50) context free types and MAX_SUB_TYPES (50) sub-types may be defined in any one source program. Also, at most MAX_FCN (50) functions, by either method, may be defined. The maximum length of a string is MAX_STR_LEN (256) counting the opening and closing quotes. There is currently no way of including double quotes within a string.
Also, the character % should not be used in strings since the Unix routine printf is used for output and the % introduces a conversion specification. The interpreter does no checking for overflow in evaluating the built in functions. Behaviour of the interpreter in these cases is system dependent.

Although the symbol → has been used throughout the BNF for an arrow, the implementation recognizes an arrow as the concatenation of the characters = and >. Similarly, the symbols ∨ and ∧ have been used for logical or and and, respectively, but the implementation recognizes the characters | and & for these connectives.

4.5. Evaluating this Approach

Having introduced an alternative method for data abstraction, along with a programming language using this method, it is time to evaluate it. Chapter 5 performs this evaluation and provides a comparison with the initial algebra approach.

5. Evaluation of the Recursive Term Algebra Approach

5.0 Introduction

As promised, this chapter will evaluate the recursive term algebra approach for specifying data types. It will follow the criteria given in §2.5 and show its superiority over the initial algebra approach by means of examples.

5.1. Formality and Constructibility

The formality of the recursive term algebra approach for specifying data types cannot be disputed. Constructing specifications is a straightforward task. One need only translate a recursive definition to an unambiguous context free grammar by the means described in §3.3.

Once again, consider the integer stack abstraction. In Kaviar, this might be defined as:

type stack {
  <error> → "error";
  <empty> → "empty";
  <not-empty> → <list:data>
}

The alternative <error> is introduced for error handling. The contents of a non-empty stack are described by the type list:

type list {
  <int> → <Nat:n>;
  <list-of-int> → <Nat:hd> <list:tl>
}

One other type is needed to describe the values that the data portion of a non-empty stack may take.
It is:

type data {
  <empty> → "empty";
  <list> → <list:list-of-int>
}

The basic operations of a stack are:

CREATE : → STACK
POP : STACK → STACK ∪ {ERROR}
PUSH : STACK × INTEGER → STACK

In Kaviar these operations would be written as functions. The CREATE operation creates an empty stack. This function may be written as:

create( Bool ) : stack {
  default → "empty"
}

The PUSH operation might be written as:

push( s : stack, i : Nat ) : stack {
  is-error( s ) → s;
  is-empty( s ) → i;
  is-not-empty( s ) → i data<s>
}

As usual, the result of an operation applied to ERROR is ERROR. PUSHing a value onto an empty stack yields a stack containing that single value. If the stack is not empty, PUSHing a value onto it corresponds to prepending the value to the stack.

The POP operation could be written as:

pop( s : stack ) : stack {
  is-error( s ) → s;
  is-empty( s ) → "error";
  is-not-empty( s ) → take-tl( data<s> )
}

take-tl( l : list ) : data {
  is-int( l ) → "empty";
  is-list-of-int( l ) → tl<l>
}

If the stack is empty it is not possible to POP a value off it, so an error results. If the stack is not empty, POPping a stack with only one element yields an empty stack, whereas if the stack has more than one element, POP takes the tail of it.

In §3.1.1.1 it was shown that the addition of the operations

DOWN : STACK → STACK ∪ {ERROR}
READ : STACK → INTEGER ∪ {ERROR}
RETURN : STACK → STACK ∪ {ERROR}

which, respectively, move down the stack by one position, yield the contents of the stack at the current position, and reposition to the top of the stack, led to the generation of infinite sets of equations. To define these operations in Kaviar, the stack abstraction must first be modified to include the position information.
This might be done as:

type stack {
  <error> → "error";
  <empty> → "empty" "|" <Nat:pos>;
  <not-empty> → <list:data> "|" <Nat:pos>
}

where the vertical bar separates the elements of the stack from the position information. This change affects the definitions of the basic operations. CREATing an empty stack must also set the position information to zero, as in:

create( Bool ) : stack {
  default → "empty" "|" 0
}

A PUSH operation places an element on the top of the stack and must now also increment the position information so that it points to this new top element. The modifications required are:

push( s : stack, i : Nat ) : stack {
  is-error( s ) → s;
  is-empty( s ) → i "|" 1;
  is-not-empty( s ) → i data<s> "|" add( pos<s>, 1 )
}

Similarly, POP removes the top element from the stack and so must decrement the position information to point to the new top element. Thus POP must be modified as follows:

pop( s : stack ) : stack {
  is-error( s ) → s;
  is-empty( s ) → "error";
  is-not-empty( s ) → take-tl( data<s> ) "|" subtract( pos<s>, 1 )
}

Now that the stack abstraction includes the position information, definition of the operations DOWN, READ and RETURN may proceed. DOWN could be implemented as:

down( s : stack ) : stack {
  is-error( s ) → s;
  is-empty( s ) → "error";
  is-not-empty( s ) → set-ptr( eq( pos<s>, 1 ), data<s>, pos<s> )
}

set-ptr( b : Bool, l : list, n : Nat ) : stack {
  is-true( b ) → "error";
  is-false( b ) → l "|" subtract( n, 1 )
}

If the pointer is at the bottom of the stack already, moving down results in an error. Otherwise the position information is decremented. The introduction of DOWN, however, requires additional changes to PUSH and POP to take care of the fact that

POP( DOWN( s ) ) = ERROR, and
PUSH( DOWN( s ), i ) = ERROR

So before PUSHing or POPping it is necessary to check that the position information points to the top of the stack. If it does not, an error must result.
The changes needed are:

push( s : stack, i : Nat ) : stack {
  is-error( s ) → s;
  is-empty( s ) → i "|" 1;
  is-not-empty( s ) → test-top( eq( count( data<s>, 0 ), pos<s> ),
                                i data<s> "|" add( pos<s>, 1 ) )
}

pop( s : stack ) : stack {
  is-error( s ) → s;
  is-empty( s ) → "error";
  is-not-empty( s ) → test-top( eq( count( data<s>, 0 ), pos<s> ),
                                take-tl( data<s> ) "|" subtract( pos<s>, 1 ) )
}

test-top( b : Bool, s : stack ) : stack {
  is-true( b ) → s;
  is-false( b ) → "error"
}

RETURNing the pointer to the top of the stack simply requires setting the position information to point to the top of the stack, or equivalently to the number of elements in the stack. A function which implements RETURN might be written as:

return( s : stack ) : stack {
  is-error( s ) → s;
  is-empty( s ) → "error";
  is-not-empty( s ) → data<s> "|" count( data<s>, 0 )
}

count( l : list, n : Nat ) : Nat {
  is-int( l ) → add( n, 1 );
  is-list-of-int( l ) → count( tl<l>, add( n, 1 ) )
}

The function READ could be written in a similar fashion. Thus using the recursive term algebra approach it is possible to compute invocation expressions such as:

{ down( down( push( create( T ), 4 ) ) );
  return( down( push( push( create( T ), 5 ), 7 ) ) ) }

In fact, where in the initial algebra approach infinitely many equations such as

(DOWN)^(k+1) (PUSH)^k ( i_1, ..., i_k ) = ERROR, for k = 0, 1, 2, ...

and

RETURN (DOWN)^m (PUSH)^n ( i_1, ..., i_n ) = (PUSH)^n ( i_1, ..., i_n ), for all m ≥ 1, m ≤ n

often arise but are not permitted, using the recursive term algebra approach they are computable.

From this example, it is evident that the recursive term algebra approach provides a simpler and more effective means of defining an algebra.

5.2. Comprehensibility and Minimality

In the initial algebra approach, adding a new operation requires the equational axioms to be reworked so that different terms are forced to denote the same object, forming the equivalence classes.
Hence the addition of an operation causes the specification to grow, reducing both the comprehensibility and minimality of the specification as defined in §2.5.

In contrast, adding a new operation to operate on a term algebra specification does not disturb the specification at all. It may, however, affect some of the other functions defined, as DOWN affected PUSH and POP. This, however, will be true in any specification technique. Because adding new operations does not affect the specification, it remains the same size and stays comprehensible and minimal.

5.3. Range of Applicability

Applications most natural to this specification technique are those involving symbolic manipulation. Examples might include symbolic integration or differentiation, transformation of grammars, and so on.

5.4. Extensibility

Suppose a specification had to be changed. If the changes caused new basis values to be added to the recursive definition, this would correspond to the addition of more terminal production rules in the grammar. Likewise, if the recursive construction rule were extended, the result would be the addition of a new nonterminal production rule in the grammar. Also, changing an existing specification to carry more information, as the position information was added to the stack abstraction, can be done without much effort. Thus, extending the specification is a simple task. In the initial algebra approach this involves rewriting the equational axioms, which can involve considerable work.

5.5. Possible Research

Given that it has been shown that the recursive term algebra approach is a viable method for the specification of data types, Chapter 6 proposes some extensions and an interesting application of the method.

6. Further Research and Conclusions

6.0 Introduction

This chapter proposes some extensions to the recursive term algebra approach for the specification of data types. Also, an interesting application of this method is presented.
Some conclusions then follow these proposals.

6.1. Possible Extensions to Kaviar

It might be interesting to investigate and implement the following ideas.

6.1.1. Parameterized Types

By introducing variables over types, that is, over nonterminals, it is possible to define parameterized types. For example, if S is a type variable then:

type s-expr( S ) {
  <atom> → S;
  <list> → "(" <s-expr( S ):car> "." <s-expr( S ):cdr> ")"
}

defines a type of parameterized symbolic expressions. It might be suitable to define such types using two level grammars. Definition of parameterized types would allow for the introduction of polymorphic functions. A function such as reversal of an S-expression could now be defined as:

reverse( s : s-expr( S ) ) : s-expr( S ) {
  is-atom( s ) → s;
  is-list( s ) → "(" reverse( cdr<s> ) "." reverse( car<s> ) ")"
}

making it applicable to all S-expressions, not just, say, to S-expressions whose atoms are numeric.

6.1.2. N-ary Unification Functions

The current implementation of Kaviar requires functions defined by unification to be unary only. By introducing additional syntax to accommodate n-ary functions the usefulness of these functions might be increased. Semantic issues become more complicated, however.

6.2. A Possible Application of this Method

Given that data types are equivalent to context free grammars, logic programming provides, in the logic grammar notation, a simple and convenient means of specifying and, because of the procedural interpretation of Horn clause logic, implementing data types. Logic grammars were first introduced by Colmerauer [Colmerauer,1978]. His metamorphosis grammar, strictly more powerful than a context free grammar, is a collection of rewriting rules which can be mechanically translated into Horn clauses, hence into a logic program.
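The translation idea above can be sketched outside the logic-programming setting. The following Python stand-in (an illustration only: the tuple encoding of terms and the is_list spelling are assumptions of this sketch) shows how the S-expression grammar rule <list> → "(" <s-expr:car> "." <s-expr:cdr> ")" mechanically yields recognizer and selector functions, over which the polymorphic reverse of §6.1.1 is then definable:

```python
# Recognizers and selectors induced by the s-expr grammar. Terms are
# encoded as: an atom is any non-tuple value; a <list> term is a pair
# (car, cdr). This encoding is an assumption of the sketch.

def atom(term):
    # Recognizer for the <atom> alternative.
    return not isinstance(term, tuple)

def is_list(term):
    # Recognizer for the <list> alternative ("list" is a Python
    # builtin, hence the is_ prefix).
    return isinstance(term, tuple) and len(term) == 2

def car(term):
    # Selector for the <s-expr:car> field of a <list> term.
    return term[0]

def cdr(term):
    # Selector for the <s-expr:cdr> field of a <list> term.
    return term[1]

def reverse(s):
    # The polymorphic reverse: swap car and cdr recursively. It works
    # for atoms of any type, mirroring the parameterized s-expr(S).
    if atom(s):
        return s
    return (reverse(cdr(s)), reverse(car(s)))
```

For instance, reverse((("x", "y"), "z")) yields ("z", ("y", "x")), regardless of whether the atoms are strings or numbers.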
Subsequent work has added several different logic grammar formalisms, and a modification of the definite clause grammars [Abramson,1984] may be used to specify context free data types. A logic program can translate the grammatical notation into Horn clauses, generating the selector functions car and cdr, and the parsers atom and list for S-expressions. These functions and predicates may be combined by the user to write other functions over S-expressions. The logic grammar formalism may also be used to define sub-types. Using these ideas, a typed logic programming language may be imposed over a typeless one to aid the reliability of large scale logic programs. A type free logic programming language ideally suited to this is Prolog.

6.3. Conclusions

The initial algebra approach has been proposed as a method for the specification of data types. It was found, however, that it is often quite difficult to construct specifications using this technique. This spawned the idea that recursive term algebras should be used instead. The algebras generated by recursive definition were found to be expressible as unambiguous context free grammars. A functional programming language was then developed around this idea. Evaluation of the recursive term algebra technique showed it to be superior to the initial algebra technique.

Bibliography

[Abramson,1984]. Abramson, H., "Typing Definite Clause Translation Grammars and the Logical Specification of Data Types as Unambiguous Context Free Grammars," Proceedings of the International Conference on Fifth Generation Computer Systems, Tokyo, Japan, November 6-9, 1984.

[Aho&Ullman,1972]. Aho, A.V. and Ullman, J.D., The Theory of Parsing, Translation and Compiling, Volume 1: Parsing, pp. 320-330, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1972.

[Burstall&Goguen,1982]. Burstall, R.M.
and Goguen, J.A., "Algebras, Theories and Freeness: An Introduction for Computer Scientists," University of Edinburgh, Department of Computer Science, Internal Report CSR-101-82, February 1982.

[Colmerauer,1978]. Colmerauer, A., "Metamorphosis Grammars," in Natural Language Communication with Computers, ed. Bolc, L., Springer-Verlag, 1978.

[Earley,1970]. Earley, J., "An Efficient Context-Free Parsing Algorithm," Communications of the ACM, vol. 13, no. 2, pp. 94-102, February 1970.

[Goguen,1977]. Goguen, J.A., "Abstract Errors for Abstract Data Types," Proceedings of the IFIP Working Conference on the Formal Description of Programming Concepts, pp. 21.1-21.32, August 1977.

[Goguen&Tardo,1979]. Goguen, J.A. and Tardo, J.J., "An Introduction to OBJ: A Language for Writing and Testing Algebraic Program Specifications," Specifications of Reliable Software Conference Proceedings, Cambridge, Massachusetts, April 1979.

[Goguen,Thatcher&Wagner,1978]. Goguen, J.A., Thatcher, J.W., and Wagner, E.G., "Initial Algebra Approach to Specification, Correctness and Implementation of Abstract Data Types," in Current Trends in Programming Methodology, ed. Yeh, R., vol. 4, pp. 80-140, Prentice-Hall, Englewood Cliffs, New Jersey, 1978.

[Goguen,Thatcher,Wagner&Wright,1975]. Goguen, J.A., Thatcher, J.W., Wagner, E.G., and Wright, J.B., "Abstract Data Types as Initial Algebras and the Correctness of Data Representations," Proceedings of the Conference on Computer Graphics, Pattern Recognition, and Data Structure, pp. 89-93, sponsored by UCLA Extension in participation with the IEEE Computer Society and in cooperation with the ACM Special Interest Group on Computer Graphics, Los Angeles, California, May 14-16, 1975.

[Graham&Harrison,1976]. Graham, S.L. and Harrison, M.A., "Parsing of General Context-Free Languages," in Advances in Computers, ed. Yovits, M.C., vol. 14, pp. 77-139, Academic Press, New York, New York, 1976.

[Guttag,1975].
Guttag, J.V., "The Specification and Application to Programming of Abstract Data Types," Computer Systems Research Group report CSRG-59, University of Toronto, 1975.

[Guttag&Horning,1978]. Guttag, J.V. and Horning, J.J., "The Algebraic Specification of Abstract Data Types," Acta Informatica, vol. 10, pp. 27-52, 1978.

[Kanda&Abrahamson,1983]. Kanda, A. and Abrahamson, K., "Data Types as Term Algebras," University of British Columbia, Department of Computer Science, Technical Report 83-2, March 1983.

[Klaeren,1980]. Klaeren, H.A., "An Abstract Software Specification Technique Based on Structural Recursion," ACM SIGPLAN Notices, vol. 15, no. 3, pp. 28-34, March 1980.

[Liskov&Zilles,1978]. Liskov, B.H. and Zilles, S., "An Introduction to Formal Specifications of Data Abstractions," in Current Trends in Programming Methodology, ed. Yeh, R., vol. 1, pp. 1-32, Prentice-Hall, Englewood Cliffs, New Jersey, 1978.

[Majster,1977]. Majster, M.E., "Limits of the Algebraic Specification of Abstract Data Types," ACM SIGPLAN Notices, vol. 12, no. 10, pp. 37-42, October 1977.

[Meseguer&Goguen,1983]. Meseguer, J. and Goguen, J.A., "Initiality, Induction and Computability," Computer Science Laboratory, Computer Science and Technology Division, SRI International, CSL Technical Report 140, Menlo Park, California, December 1983.