UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

An expressive language Whaley, Paul William 1975

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
831-UBC_1975_A6_7 W43_4.pdf [ 3.82MB ]
Metadata
JSON: 831-1.0051771.json
JSON-LD: 831-1.0051771-ld.json
RDF/XML (Pretty): 831-1.0051771-rdf.xml
RDF/JSON: 831-1.0051771-rdf.json
Turtle: 831-1.0051771-turtle.txt
N-Triples: 831-1.0051771-rdf-ntriples.txt
Original Record: 831-1.0051771-source.json
Full Text
831-1.0051771-fulltext.txt
Citation
831-1.0051771.ris

Full Text

AN EXPRESSIVE LANGUAGE by PAUL WILLIAM WHALEY B . S c , U n i v e r s i t y of B r i t i s h Columbia A THESIS SUBMITTED IU PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE i n the Department of Computer Science We accept t h i s t h e s i s as conforming to the r e q u i r e d standard The University of British Columbia March, 1975 In presenting th is thes is in pa r t i a l fu l f i lment of the requirements for an advanced degree at the Un ivers i ty of B r i t i s h Columbia, I agree that the L ibrary sha l l make it f ree ly ava i l ab le for reference and study. I fur ther agree that permission for extensive copying of th is thes is for scho la r ly purposes may be granted by the Head of my Department or by h is representat ives . It is understood that copying or pub l i ca t ion of th is thes is fo r f inanc ia l gain sha l l not be allowed without my wri t ten permission. Depa rtment The Univers i ty of B r i t i s h Columbia Vancouver 8, Canada i Abstract A programming language system which enables a programmer to create his own constructs for the description of an algorithm i s presented. The syntactic and semantic design and some d e t a i l s of the implementation are discussed and the language i t s e l f i s described. For the introduction of new constructs by the programmer, the language system (called "Tove") provides a parsing language for syntactic description and a set of basic procedures for semantic description. The combination of these two enable the programmer to define new procedures, control structures, and operators. F i n a l l y , some problems i n the design and implementation are discussed, together with some thoughts about future work on such systems. i i TABLE OF CONTENTS CHAPTER 1. I n t r o d u c t i o n 1 1.1. S y n t a c t i c Design 1 1.1.1. Syntax Of Tove ........3 1.1.2. T r a d i t i o n a l S y n t a c t i c Methods 4 1. 1.2. 1. FORTRAN .5 1.1.2.2. Algol60 7 1.1.2.3. LISP 7 1.1.2.4. Algol68 8 1.1.2.5. P a s c a l And Sue 9 1.2. Semantic Design 9 1.2.1. P r o c e d u r a l O r g a n i z a t i o n .....10 1.2.2. Scope Conventions ..........11 1.2.3. Values And Data Types ...........................13 1.2.4. C o n t r o l S t r u c t u r e s ...14 CHAPTER 2. User's Guide ....................................15 2.1. Overview Of Tove .....15 2.2. The Scanner .............................17 2.3. The P a r s i n g Language ......19 2.4. The P a r s i n g Of Expre s s i o n s .26 2.5. B a s i c Procedures 28 2.5.1. A r i t h m e t i c Procedures ........................... 28 2.5.2. L o g i c a l Procedures 29 2.5.3. S t r i n g Procedures 29 2.5.4. C e l l Procedures .................................31 TABLE OF CONTENTS i i i 2.5.5. Atom Procedures 33 2.5.6. Conversion Procedures ........................... 33 2.5.7. Comparison Procedure 35 2.5.8. Input-Output Procedures .....36 2.5.9. Quoting Procedures .38 2.5.10. Sequencing Procedures 40 2.5.11. D e c l a r a t i o n Procedures ......................... 43 2.5.12. E v a l u a t i o n Procedures 44 2.5.13. E r r o r Procedure 45 2.5.14. System Procedures ........46 2.6. E r r o r P r o c e s s i n g .................................... 48 CHAPTER 3. Some D e t a i l s Of The Implementation ..............51 3.1. The Scanner .52 3.2. The Parser And T r a n s l a t o r 52 3.2.1. P a r s i n g Method ........52 3.2.2. I n t e r n a l R e p r e s e n t a t i o n Of Programs ............. 55 3.3. The I n t e r p r e t e r 57 3.3.1. Block E x i t And Repeat 59 3.4. Table Management ......60 CHAPTER 4. Conclusi o n s ...........63 4.1. Problems And P o s s i b l e S o l u t i o n s 63 4.1.1. The Scanner ..................................... 63 4.1.2. The Parser And T r a n s l a t o r 64 4.1.3. The I n t e r p r e t e r 67 4.2. Some Successes 67 B i b l i o g r a p h y 69 APPENDIX I . SAMPLE RUN OF TOVE .............71 iv S e v e r a l people have c o n t r i b u t e d towards the design and implementation of t h i s system and towards the p r e p a r a t i o n of t h i s t h e s i s . S p e c i f i c a l l y , my s u p e r v i s o r s Alan B a l l a r d and Ray R e i t e r must be thanked f o r moral support and f o r i d e n t i f i c a t i o n of problems and sugg e s t i o n s f o r improvements i n both the system and the t h e s i s . Alan B a l l a r d and Mark Duflont a l s o have worked very hard improving the SUE System Language used i n the implementation, and a l s o have provided many engaging d i s c u s s i o n s , some of which were r e l e v a n t to the work presented here. I would a l s o l i k e t o thank my parents f o r t h e i r support and encouragement and the N a t i o n a l Research C o u n c i l of Canada for t h e i r f i n a n c i a l support. 1 CHAPTER ONE INTRODUCTION This chapter presents the motivation and design of the syntax and semantics of the Tove system, and some of the previous work i t i s based on. 1 ,1 . SYNTACTIC DESIGN The purpose of any syntax i n a programming language (or in any language) i s to provide a framework for communication between the writer and the readers of the language. A programming language must s a t i s f y the needs of three groups: the programmers writing i n the language, the people reading the programs and the compilers reading and t r a n s l a t i n g the language. That programmers have to be able to read the language i s becoming more appreciated, es p e c i a l l y as the size of programs grow. A very large program cannot be contemplated as a single whole i n a l l i t s i n t r i c a c y , so that a programmer attempting to modify such a large program must be able to read and comprehend parts of i t , as i t i s expressed i n the programming language. Even during the i n i t i a l development of a program modifications are made to i t , so a l l successful programmers must be able to read and comprehend both t h e i r own and other programmers' programs. The necessity for programs to be read and understood by compilers (or interpreters) restr a i n s programming languages, 1.1. SYNTACTIC DESIGN SYNTACTIC DESIGN 2 both s y n t a c t i c a l l y and semantical ly , S p e c i f i c a l l y , very f l e x i b l e and complex syntax in a language w i l l force the compiler to be large and probably also slow. Thus the economic l i m i t a t i o n s on the compiler for a programming language force the syntax of the language to be vas t ly simpler than the syntax cf a natura l language. Since the wr i ter , readers, and compiler of a language must be able to decide and agree exactly what the program w i l l do, a programming language must be much more precise than a natura l language. This also necessitates a s impler , less ambiguous syntax than that of a natural language. Viewed in th i s manner the problem of designing a syntax for a programming language involves a trade-off between the comfort of the three groups who make demands on the syntax of a language. When the programmer i s writ ing in the language he would l i k e a consise , short syntax rather than a verbose and redundant one which w i l l cause him to do a l o t more writ ing or say everything twice. The programmer reading the language wants a c lear syntax with enough redundancy so that he w i l l be able to understand the program with less e f f o r t . The compi ler ' s needs are a formal and precise syntax that i s e a s i l y analysed by a parser which i s small and fa s t . 1.1 . SYNTACTIC DESIGN SYNTACTIC DESIGN 3 1.1.1. Syntax Of Toye The major g o a l of the s y n t a c t i c design of Tove i s the p r o v i s i o n of f a c i l i t i e s which w i l l produce a more readable and w r i t a b l e language. The f a c i l i t i e s are provided by a scheme which allows programmer s p e c i f i c a t i o n of the syntax of h i s program. He i s able t o choose a n a t u r a l syntax which he and ( h o p e f u l l y ) other r e a d e r s w i l l be a b l e to understand. L i k e any other freedom, i f t h i s s y n t a c t i c freedom i s not accepted r e s p o n s i b l y then chaos can r e s u l t . S p e c i f i c a l l y i t i s p o s s i b l e , and indeed e a s i e r , to w r i t e unreadable programs. Thus the w r i t e r of a Tove program must bear i n mind the r e s p o n s i b i l i t y he has t o r e a d e r s of the program, i n c l u d i n g h i m s e l f . The scheme i s s u f f i c i e n t l y r e s t r i c t e d so t h a t the compiler w i l l be able to parse programs without undue d i f f i c u l t y . The s y n t a c t i c s p e c i f i c a t i o n scheme c o n s i s t s of a p a r s i n g language, which i s used by the programmer to d e s c r i b e the syntax he wishes to use to the system and to readers of h i s programs. T h i s p a r s i n g language i s converted to an i n t e r n a l form which i s i n t e r p r e t e d by the p a r s e r , c a u s i n g the p a r s e r to parse the programmer's c o n s t r u c t s . The a c t u a l p a r s i n g method used i s a r e c u r s i v e descent p a r s e r p a t t e r n e d a f t e r the top down operator precedence methods of V. B. P r a t t . [ P r a t ] Included i n the p a r s i n g language are the c a p a b i l i t i e s to s p e c i f y t h a t a procedure has an a r b i t r a r y number of arguments, that some arguments may be omitted, and t h a t p a r t i c u l a r d e l i m i t e r s (such as " e l s e " ) are to be used between arguments. The a b i l i t y to s p e c i f y p r i o r i t i e s of o p e r a t o r s has not been i n c l u d e d , although 1.1.1. SYNTAX OF TOVE SYNTACTIC DESIGN 4 i t i s believed that i t could have been without major changes. The exclusion i s due to the lack of time for implementation and also because there are few operators that have standard, wel l -agreed p r i o r i t i e s . 1. 1.2. T r a d i t i o n a l Syntact ic Methods The design of programming languages has been much studied i n the past twenty years and many techniques have been developed to reconc i l e the c o n f l i c t i n g demands. Often methods have been developed to allow two of the three groups to be accomodated at a small expense to the t h i r d . A good example of th i s i s the in t roduct ion of paragraphers. A paragrapher i s a part of the software system supporting the programming language which produces a l i s t i n g of the program with a standard indentat ion and format. This provides very good redundancy for a reader of the program since he can r e l y on the v i s u a l cues of the indenting to guide him i n understanding the program. I t a lso does not cost the writer any e f f o r t . In fact s ince paragraphing i s of such importance to r e a d a b i l i t y that i t must be done, the prov i s ion of automatic paragraphers frees the writer from t h i s task. Like everything i n l i f e i t i s not without i t s cost , and i n t h i s case the software system grows larger and a b i t slower. The saving i n e f for t to the writer and readers, who are people, now almost c e r t a i n l y outways the costs of an automatic paragrapher so that any complete programming system should now provide automatic paragraphing, The inc lus ion of a comment f a c i l i t y i n a programming 1.1.2. TRADITIONAL SYNTACTIC METHODS SYNTACTIC DESIGN 5 language provides a medium for the writer to speak d i r e c t l y to h i s readers in natura l language, without also speaking to the compiler . As such, comment f a c i l i t i e s are very useful and i t i s inconceivable that any modern programming language should be without a good comment f a c i l i t y . However, i f the language i s such that i t cannot be understood by a reader without extensive comments, then the writer w i l l be forced to write the program twice, once for the compiler and again, in the comments, for his readers. Thus comments are necessary for a readable language, but they are not s u f f i c i e n t i n themselves to make a language readable. 1.1.2.1. FORTRAN Ear ly i n the h i s tory of language design, the balance lay to the s ide of making the compilers small and fast at some expense to both writers and readers of the languages. This was because programs were smal l , both machine time and space were expensive, few of the current techniques used for construct ing compilers were known, programming costs were low, and the increase in programmer product iv i ty which could be gained by the use of more convenient languages for the programmers was greatly underestimated. These factors lead to the developement of assemblers and languages l i k e FORTRAN. [FORT] The FORTRAN programming language uses a card-oriented syntax. It i s unusual i n that i t ignores a l l blanks in statements. Each input card i s divided into three f i e l d s : columns 1 to 5 for statement numbers, column 6 for i n d i c a t i o n of 1.1.2.1. FORTRAN SYNTACTIC DESIGN 6 c o n t i n u a t i o n cards, and columns 7 through 72 f o r the statements. Statements are recog n i z e d by an add-hoc parser. The language i s not s u i t e d to p a r s i n g by any other methods s i n c e the l a c k of keywords can r e q u i r e a great d e a l of look-ahead before i t can be determined whether a symbol i s used as a v a r i a b l e or as a pseudo-keyword. For example, i t i s p o s s i b l e to have an array c a l l e d FORMAT or IF. The syntax c o n t a i n s s e v e r a l anomalies such as the i m p o s s i b i l i t y of a compiler d e t e c t i n g a missing comma i n a DO statement. Since v a r i a b l e s can be used without d e c l a r i n g them, a m i s s p e l t v a r i a b l e r e s u l t s i n the a l l o c a t i o n of a new v a r i a b l e , not i n a complaint from the compiler. The language seems not to have enough redundancy to enable the compiler to d e t e c t common w r i t i n g e r r o r s . F u r t h e r to these problems, the r e s t r i c t i o n of l a b e l s t o only numbers, the r e s t r i c t i o n of the l e n g t h of i d e n t i f i e r s t o only s i x c h a r a c t e r s , and the lack of reasonable c o n t r o l s t r u c t u r e s l e a d very q u i c k l y to unreadable programs. The major advances of FORTRAN over assemblers are the i n t r o d u c t i o n s of i n f i x n o t a t i o n f o r the b u i l t - i n a r i t h m e t i c o p e r a t o r s and s u b r o u t i n e s . The i n f i x n o t a t i o n g i v e s FORTRAN i t s name, which stands f o r FORmula TRANslator. F u n c t i o n s and su b r o u t i n e s which are d e f i n e d by the user are c a l l e d by a p r e f i x n o t a t i o n c o n s i s t i n g of the name i d e n t i f y i n g the procedure, f o l l o w e d by the parameters of the procedure enclosed i n parentheses. 1.1. 2. 1. FORTRAN SYNTACTIC DESIGN 7 1. 1.2.2, Algol60 With the developement of Algol60 [AL60], s e v e r a l techniques f o r e f f i c i e n t l y implementing languages which are more programmer-oriented were i n t r o d u c e d and p u b l i c i z e d . These techniques i n c l u d e the formal s y n t a c t i c d e s c r i p t i o n s and t h e i r subsequent i n c o r p o r a t i o n i n t o compiler generators. Alg6160 r e p r e s e n t s a great s y n t a c t i c advance over languages such as FORTRAN. The syntax of Algol60 i s d e s c r i b e d i n a uniform, f o r m a l i z e d manner and so the language i s s u i t e d to machine generation of p a r s e r s . T h i s leads immediately to the l a c k of s e r i o u s anomalies, with the e x c e p t i o n of the " d a n g l i n g e l s e " problem. Keywords are r e c o g n i z e d by typographic conventions and so cannot be confused with i d e n t i f i e r s . L a b e l s can be c h a r a c t e r i d e n t i f i e r s and i d e n t i f i e r s are no l o n g e r r e s t r i c t e d to s h o r t l e n g t h s . The b u i l t - i n o p e r a t i o n s have reasonable syntax, i n c l u d i n g i n f i x o p e r a t o r s and some reasonable c o n t r o l s t r u c t u r e s . Procedures d e f i n e d by the programmer are c a l l e d with the same s t y l e of syntax as t h a t used by FORTRAN. 1.1.2.3. LISP LISP [ L I S P ] i s s y n t a c t i c a l l y very d i f f e r e n t from e i t h e r FORTRAN or ALGOL60 s i n c e i t has no i n f i x syntax. A l l c o n s t r u c t i o n s are r e p r e s e n t e d by p r e f i x syntax which c o n s i s t s of a l e f t p a r e n t h e s i s , the atom i d e n t i f y i n g the procedure to be c a l l e d , and the arguments of the procedure enclosed by a r i g h t p a r e n t h e s i s . T h i s s y n t a c t i c convention i s very simple, e a s i l y parsed, and can be used f o r both programs and data s i n c e i t 1.1.2.3. LISP SYNTACTIC DESIGN 8 corresponds c lo se ly with the i n t e r n a l representation that i t i s t rans la ted i n t o . Its main drawbacks are i t s many parentheses and i t s cumbersomeness when unary or binary procedures are used extensively (as i n ar i thmetic c a l c u l a t i o n s ) . Unless a LISP program i s paragraphed by machine and c a r e f u l l y read, a misplaced parenthesis can be d i f f i c u l t to detect , sometimes not showing up u n t i l the program i s very thoroughly tested. Despite these problems the syntax of LISP remains elegant, s imple, and uniform. 1.1.2.U. Algol68 With the large decreases in computing machine costs and the simultaneous r i s e s i n programmer costs , recent languages have been designed even more for ease of writ ing and reading and less fo r ease of compilat ion. Algol68 [AL68] i s an example cf such a language. I t i s to be hoped that these newer languages are much eas ier to read and write while only a l i t t l e more cost ly to compile. The syntax of Algol68 i s patterned c lo se ly af ter that of Algol60, i n that i t has a formal descr ip t ion and uses typographic conventions to d i s t ingu i sh programmer-defined i d e n t i f i e r s from the terminals of i t s syntax. Algol68, however, provides an operator dec larat ion mechanism whereby the user can define i n f i x procedures, in addi t ion to the usual pref ix procedures. Algol68 i s c e r t a i n l y a very complex language to parse. Ramer [Rame] describes the e s s e n t i a l technique used to process 1.1 .2 .4. ALG0L68 SYNTACTIC DESIGN 9 such syntac t ic construct ions as operator dec lara t ions . Before the Algol68 program can be completely parsed, the compiler must f i r s t scan the ent i re text of the program and i d e n t i f y and process a l l operator declarat ions and a l l occurrences of the declared operators. Once t h i s has been done, the text can be parsed by formal syntac t i c methods. The language requires a multiple-pass parser because the dec larat ion of an operator may follow i t s use i n the program text . 1.1 . 2 . 5 . Pascal And Sue Pascal [Pasc] and e spec i a l ly i t s de r iva t ive , the Sue System Language [SUE] are good examples of recent languages which were designed s p e c i f i c a l l y for ease of reading. They are very easy to read and understand and they are also f a i r l y easy to compile. However they are r e l a t i v e l y d i f f i c u l t to wri te , c h i e f l y because of the strong compile-time type-checking which requires large amounts of information i n the s p e c i f i c a t i o n of types. 1 . 2 . SEMANTIC DESIGN Semantically the Tove language i s very unsophist icated, and most c lo se ly resembles LISP. The major goal of the semantic design was s i m p l i c i t y , both i n the f a c i l i t i e s provided and in t h e i r implementation. Much of t h i s s i m p l i c i t y i s a necessary r e s u l t of the l i m i t e d time ava i l ab le for implementation. Tove has only the very simplest of data types and only elementary operations on them. The s tructures i t has are very s i m i l a r to LISP CONS c e l l s and atoms. I t s cont ro l s tructures are again 1 .2. SEMANTIC DESIGN SEMANTIC DESIGN 10 simple but the u n r e s t r a i n e d jump ("go to") i s not i n c l u d e d . These c o n t r o l s t r u c t u r e s are patt e r n e d l a r g e l y a f t e r those of the Sue System Language, which i s i t s e l f patterned a f t e r P a s c a l . The scope conventions of Tove and the r e s u l t i n g run-time environmental mechanisms are i n l a r g e p a r t borrowed from BCPL. [ BCPL ] 1 . 2 . 1 . P r o c e d u r a l O r g a n i z a t i o n Included i n the language i s the a b i l i t y o f a programmer to d e f i n e c o n t r o l procedures which permit the c o n t r o l of the e v a l u a t i o n of t h e i r arguments. In essence, the programmer can write h i s own c o n t r o l s t r u c t u r e s . T h i s i s another f e a t u r e of the language t h a t the program w r i t e r must take care with l e s t he render the program incomprehensible. T h i s i n c l u s i o n enables the language t o be completely procedured, meaning t h a t any program i n the language i s j u s t a c o l l e c t i o n of procedure c a l l s , some being c a l l s to p r e - d e f i n e d b a s i c procedures and others to procedures d e f i n e d i n the program. Thus the semantics of the language can be d e s c r i b e d by the semantics of these b a s i c procedures and procedure i n v o c a t i o n and r e t u r n . T h i s i s e s s e n t i a l l y the o r g a n i s a t i o n of languages such as LISP. The language i s a l s o l i k e LISP i n i t s e x p r e s s i o n o r i e n t a t i o n , t h a t i s every c o n s t r u c t i s an ex p r e s s i o n and r e t u r n s a value. 1 . 2 . 1 . PROCEDURAL ORGANIZATION SEMANTIC DESIGN 11 1 .2 .2. Scope Conventions The scope conventions of a language determine the modularity of procedures written i n that language. With good scope rules a procedure w i l l only have access to a small set of var iab les and the programmer w i l l not have to worry about other uses of a var iab le name outside that set of va r i ab le s . Scope i s such an important a id to modularity, and therefore a l so to r e a d a b i l i t y and w r i t a b i l i t y , that i t s i n c l u s i o n was considered e s s e n t i a l . However i t was decided that the overhead of maintaining complete environment chains or displays as in the usual Algo l implementations [ALim] should be avoided by using the BCPL approach. Accordingly there are two kinds of va r i ab le s : permanent and t rans ient . Permanent var iables are s t a t i c a l l y a l loca ted and have the usual Algo l scope rules (but of course not the usual l i f e t i m e s ) . Procedures behave as permanent var iables and therefore can be declared ins ide other procedures, g iv ing nested environments as i n A l g o l . Transient var iab les are a l loca ted on the same stack that i s used to maintain procedure invocat ion information (so that recurs ion i s p o s s i b l e ) . They are therefore a l located dynamically at procedure invocat ion time. They can be referenced only ins ide the body of the procedure that they are declared i n . Parameter var iab les are t rans ient var iab le s . The important s i m p l i f i c a t i o n i s the l i m i t a t i o n of the references to t rans ient var iables which means that the environment of a procedure i s only i t s own stack frame, which i s establ i shed whenever the procedure i s c a l l e d . Related to scope i s the question of whether the language i s 1 .2 .2 . SCOPE CONVENTIONS SEMANTIC DESIGN 12 compiled or i n t e r p r e t e d . The c o r r e c t managing of r e f e r e n c e s to v a r i a b l e s r e q u i r e s t h a t the i n t e r n a l r e p r e s e n t a t i o n of the program c o n t a i n s r e f e r e n c e s to the v a r i a b l e s themselves, and not to the names of the v a r i a b l e s , s i n c e a name may re p r e s e n t d i f f e r e n t v a r i a b l e s i n d i f f e r e n t p a r t s of the program. Thus the i d e n t i f y i n g of d i s t i n c t v a r i a b l e s and r e f e r e n c e s to them must be done d u r i n g the process of c o n v e r t i n g the program to an i n t e r n a l r e p r e s e n t a t i o n which can be i n t e r p r e t e d . T r a d i t i o n a l l y a language processor has been c a l l e d a compiler i f i t produced machine code, that i s , i f i t produced an i n t e r n a l r e p r e s e n t a t i o n of the program which c o u l d be executed by a machine. Recent usage has extended the term c o m p i l a t i o n to i n c l u d e t r a n s l a t i o n i n t o a r e p r e s e n t a t i o n which can be executed by a h y p o t h e t i c a l machine which i s implemented i n software. The Tove t r a n s l a t o r produces an i n t e r n a l r e p r e s e n t a t i o n which i s t r e e - s t r u c t u r e d and so i s u n l i k e the common machine i n s t r u c t i o n stream. During t r a n s l a t i o n i t processes v a r i a b l e and procedure d e c l a r a t i o n s and r e s o l v e s the r e f e r e n c e s to names of v a r i a b l e s to p a r t i c u l a r v a r i a b l e s . The i n t e r n a l r e p r e s e n t a t i o n i s i n t e r p r e t e d by a p a r t of the language system. I t i s l e f t to the reader to decide whether the Tove processor i s an i n t e r p r e t e r , a com p i l e r , or both. 1 . 2 . 2 . SCOPE CONVENTIONS SEMANTIC DESIGN 13 1.2.3. Values And Data T vp_.es One of the important aspects of modern programming language development has been the development of data type schemes that allow a g r e a t d e a l of type checking to be done a t compile time. I t was f e l t t h a t time d i d not permit the i n c o r p o r a t i o n of such schemes i n t o the the language, so a run time type checking scheme has been used. Every value i s represented at run time not o n l y by an encoding of i t s value but a l s o by an encoding of i t s type, so the a p p r o p r i a t e n e s s of o p e r a t i o n s on the value can be checked. The simple data types managed by the language are numbers and s t r i n g s . For s i m p l i c i t y of implementation, the data s t r u c t u r i n g methods of LISP were used to provide compound data types. There i s one important d i f f e r e n c e between the LISP CONS c e l l s and Tove c e l l s : i n LISP a c e l l c o n s i s t s of two p o i n t e r s , each p o i n t i n g to another c e l l or an atom, while i n Tcve a c e l l c o n s i s t s of a value (a number, s t r i n g , c e l l or atom), and a p o i n t e r t o another c e l l . Thus Tove has true number values and not numeric atoms as LISP. L i k e w i s e an atom (which i s used synonymously with v a r i a b l e ) has a value and not a p o i n t e r to a value. One of the l a r g e advantages of using c e l l s t r u c t u r e s as data items i s t h a t they are a l s o used f o r the i n t e r n a l r e p r e s e n t a t i o n of programs, t h e r e f o r e r e q u i r i n g only one s e t of management r o u t i n e s f o r both f u n c t i o n s . The same i s true of the c h a r a c t e r s t r i n g v a r i a b l e s , s i n c e t h e i r management r o u t i n e s are a l s o used f o r the v a r i a b l e (atom) names. 1.2.3. VALUES AND DATA TYPES SEMANTIC DESIGN 14 1 . 2 . 4 . Control Structures The basic c o n t r o l s t ructures provided by the language were chosen because of the i r modularity and ease of implementation. Included are the return from a procedure, the repeat or ex i t of a l a b e l l e d block, the i f - t h e n - e l s e , and the a b i l i t y to step through a l i s t of arguments. The replacement of the "goto" by the ex i t and repeat leads to a much simpler implementation, since the transfer involved e i ther re s ta r t s or completes the evaluat ion of a block which must be act ive (currently being evaluated) at the time of the t rans fer . The a b i l i t y to step through a l i s t of arguments i s provided s ince the syntax provides a means whereby a procedure may have an a r b i t r a r y number of arguments. The stepping i s much l i k e a LISP MAP function i n that the same expression i s evaluated for each part of the argument l i s t . 1 . 2 . 4 . CONTROL STRUCTURES 15 CHAPTER TWO DSER»S GUIDE T h i s s e c t i o n c o n t a i n s a d e s c r i p t i o n of the language. I t i s hoped that the d e s c r i p t i o n presented i s s u f f i c i e n t l y d e t a i l e d to enable the reader t o use the language. A run of the Tove system i s appended to t h i s t h e s i s , p r o v i d i n g a l a r g e example of the use of the system. 2,1. OVERVIEW OF TOVE Tove i s an e x p r e s s i o n o r i e n t e d and run time typed language. I t i s e x p r e s s i o n o r i e n t e d i n t h a t every c o n s t r u c t r e t u r n s some value, much l i k e LISP or ALGOL68. In f a c t , l i k e LISP, every c o n s t r u c t i s a procedure c a l l . In Tove the fundamental components are numbers, s t r i n g s , atoms, c e l l s , and procedures. A value i s e i t h e r some number, s t r i n g , atom or c e l l , or the undefined value. Numbers and s t r i n g s are simply t h a t : a number i s an i n t e g e r (within the range -32768 to 32767 c u r r e n t l y ) and a s t r i n g i s a s e r i e s of zero or more c h a r a c t e r s ( c u r r e n t l y up to a maximum of 255) . An atom i s an e n t i t y which has some name and which has a v a l u e . The name i s a symbol, as d e s c r i b e d i n the s e c t i o n on the scanner, and i s used t o r e f e r to the atom. Atoms are l i k e the v a r i a b l e s of most a l g o r i t h m i c languages. Atoms are c r e a t e d and given a name during c o m p i l a t i o n by d e c l a r i n g them. They must be d e c l a r e d t e x t u a l l y before they are r e f e r r e d t o . There are t h r e e 2.1. OVERVIEW OF TOVE OVERVIEW OF TOVE 16 kinds of atoms: permanent, t r a n s i e n t and procedure atoms. T r a n s i e n t atoms can be r e f e r r e d to only i n s i d e the procedure t h a t they are d e c l a r e d i n . They are a l l o c a t e d v i a a stack d i s c i p l i n e so t h a t they are undefined a t the be g i n i n g of e v a l u a t i o n of t h e i r procedure and may have a d i f f e r e n t value f o r each a c t i v e i n v o c a t i o n of the procedure. Permanent atoms are a l l o c a t e d when they are d e c l a r e d so t h a t they have only one value and r e t a i n i t r e g a r d l e s s of procedure c a l l s . They can be r e f e r r e d t o only i n s i d e the procedure i n which they are d e c l a r e d and any procedures whose d e c l a r a t i o n s are enclosed i n that procedure, as i n the u s u a l ALGOL scope convention. There i s a s p e c i a l permanent atom c a l l e d the n i l atom which i n d i c a t e s the absence of an atom. Procedure atoms are l i k e permanent atoms i n t h e i r scope convention, but they are d i s t i n g u i s h e d by having a procedure body as a value. note t h a t two atoms may have the same name, but onl y i f they are d e c l a r e d i n d i f f e r e n t procedures. In Tove there are two kinds of procedure: value procedures and c o n t r o l procedures. A value procedure e v a l u a t e s i t s arguments before the e v a l u a t i o n of the procedure body whereas a c o n t r o l procedure e v a l u a t e s i t s arguments during the e v a l u a t i o n of the procedure body, when the corresponding formal parameters are e v a l u a t e d . In e i t h e r case e v a l u a t i o n of arguments i s done i n the environment of the procedure c a l l and never i n the environment of the procedure body. Value procedures g e n e r a l l y perform some computations on the values of t h e i r arguments and r e t u r n some val u e . C o n t r o l procedures are u s u a l l y used to 2.1. OVERVIEW OF TOVE OVERVIEW Of TOVE 17 govern the order of e v a l u a t i o n of t h e i r arguments. For example, among the b a s i c procedures a d d i t i o n (+) i s a value procedure while ' i f * i s a c o n t r o l procedure. Procedures are c r e a t e d and given a name during c o m p i l a t i o n by d e c l a r a t i o n s which d e s c r i b e the syntax of the procedure's c a l l (invocation) and the semantics of the procedure's e v a l u a t i o n . The s y n t a c t i c p a r t i s s p e c i f i e d by a p a r s i n g language statement. These statements are d i s c u s s e d i n the s e c t i o n on the p a r s i n g language. The semantic meaning of the procedure i s an e x p r e s s i o n which i s the body of the procedure. A c e l l i s an e n t i t y which i s composed of two p a r t s . The f i r s t part i s a value; the second i s another c e l l . T h i s second p a r t i s o f t e n c a l l e d the "next c e l l " of the c e l l . C e l l s are c r e a t e d during e v a l u a t i o n by a b a s i c procedure and are r e c l a i m e d by the language system when they can no longer be accessed. There i s a s p e c i a l c e l l c a l l e d the n i l c e l l which i n d i c a t e s the absence of a c e l l . 2.2. THE SCANNER The scanner i s t h a t p a r t of the i n p u t r o u t i n e s which d i v i d e s the c h a r a c t e r stream i n p u t i n t o tokens. A token i s a symbol, a number, a s t r i n g , or a comment. No token can extend over more than one l i n e of the i n p u t stream. Numbers are a s e r i e s of one or more d i g i t s . They are t h e r e f o r e always p o s i t i v e i n t e g e r s . A s t r i n g i s any s e r i e s of c h a r a c t e r s e x c l u d i n g the q u o t a t i o n mark (") surrounded by q u o t a t i o n marks. While i n s i d e a s t r i n g , two q u o t a t i o n marks are i n t e r p r e t e d by 2.2. THE SCANNER THE SCANNER 18 the scanner as a s i n g l e q u o t a t i o n mark, not the end of the s t r i n g . I f the r e i s no c l o s i n g q u o t a t i o n mark bef o r e the end of the input l i n e , then an e r r o r i s announced and the s t r i n g i s stopped. A comment i s the percent c h a r a c t e r (%), not i n s i d e a s t r i n g , f o l l o w e d by any c h a r a c t e r s up t o the end of the i n p u t l i n e c o n t a i n i n g the percent. Number tokens and s t r i n g tokens are converted to number va l u e s and s t r i n g v a l u e s while comment tokens are ig n o r e d . A symbol can be e i t h e r any a l p h a b e t i c c h a r a c t e r f o l l o w e d by a s e r i e s of zero o r more a l p h a b e t i c c h a r a c t e r s or d i g i t s , or i t can be a s e r i e s of one or more s p e c i a l c h a r a c t e r s . The alphabet i s A through Z plus the underscore (_). A s p e c i a l c h a r a c t e r i s any c h a r a c t e r which i s not a d i g i t , not i n the alphabet, and not a q u o t a t i o n mark nor a blank. For example: Numbers: 1 0 31415 S t r i n g s : ""«" «A""B" " A B N C H X J (<*>$+1 •» Symbols: a a6 a_b _6 +- ( 0 C -> Comments: % e v e r y t h i n g past the % i s a comment % t h i s i s a l l comment *'%:;<=""!L pl u s junk The scanner i n t e r p r e t s : (1+a_6)* ( - 3 H "boo""") as the tokens: ( symbol 1 number 2. 2. THE SCANNER THE SCANNER 19 + symbol a 6 symbol )* symbol (- symbol 3 number I I symbol boo" s t r i n g symbol 2.3. THE PARSING LANGUAGE The p a r s i n g language i s t h a t p a r t of Tove which i s used to present d e s c r i p t i o n s of syntax to the system. Each statement i n the p a r s i n g language d e s c r i b e s the syntax of a c a l l of a procedure and the formal parameters of the procedure. The syntax of a p a r s i n g language statement i s d e s c r i b e d i n the f o l l o w i n g , using ALGOL68 s y n t a c t i c n o t a t i o n s . Any t e r m i n a l w r i t t e n i n c a p i t a l l e t t e r s i n d i c a t e s t h a t the t e r m i n a l must be t h a t symbol (e.g. the t e r m i n a l EXPR must be the symbol EXPR). In t h i s d e s c r i p t i o n of the p a r s i n g language the words "argument" and "parameter" are not used synonymously. An argument i s a p a r t i c u l a r a c t u a l argument of a c a l l of a procedure whereas a parameter i s the formal parameter used i n the d e f i n i t i o n of the procedure. The purpose of the syntax i s to determine the a s s o c i a t i o n (or binding) between the a c t u a l arguments of a c a l l and the formal parameters of the procedure c a l l e d . 2.3. THE PARSING LANGUAGE THE PARSING LANGUAGE 20 P a r s i n g statement : l e f t syntax , d e f i n e d symbol , r i g h t syntax. The d e f i n e d symbol of the p a r s i n g statement i s the name of the procedure whose syntax i s being d e s c r i b e d . L e f t syntax : EXPR , formal symbol ; . The l e f t syntax of a procedure d e s c r i b e s the l e f t argument of the procedure. I f i t i s empty, then the procedure has no l e f t argument. I f i t i s non-empty then the l e f t argument must be an e x p r e s s i o n and the procedure i s e i t h e r b i n a r y or p o s t f i x . The formal symbol becomes the name of a t r a n s i e n t atom which i s the formal parameter corresponding to the a c t u a l l e f t argument of a c a l l of the procedure. For example: foo i s a procedure with no parameters p a r s i n g statement: foo p o s s i b l e c a l l : foo fooey i s a procedure with one l e f t parameter p a r s i n g statement: EXPR a fooey p o s s i b l e c a l l : 3 fooey parameter a i s a s s o c i a t e d with argument 3 Right syntax : r i g h t syntax element , r i g h t syntax ; . The r i g h t syntax i s a p o s s i b l y empty s e r i e s of r i g h t syntax elements. I t d e s c r i b e s the syntax of the r i g h t arguments of the procedure. The r i g h t arguments are a 2.3. THE PARSING LANGUAGE THE PARSING LANGUAGE 21 concatenation of the arguments d e s c r i b e d : by the r i g h t syntax elements, i n the same order as these elements. Right syntax element : EXPR , formal symbol; The corresponding argument i s an e x p r e s s i o n . The formal symbol becomes the name of a t r a n s i e n t atom which i s the formal parameter corresponding to the argument. For example: Foo i s a procedure with one r i g h t parameter p a r s i n g statement: foo EXPR b p o s s i b l e c a l l : foo "boo" parameter b i s a s s o c i a t e d with argument "boo" Fun i s a b i n a r y procedure p a r s i n g statement: EXPR a fun EXPR b p o s s i b l e c a l l : 7 fun "too" a i s a s s o c i a t e d with 7 and b with " too" Right syntax element : PERHANENT_ATOM , formal symbol ; The argument i s a symbol which becomes the name of a permanent atom. The c o m p i l a t i o n of the procedure c a l l causes the c r e a t i o n of the permanent atom. The formal symbol becomes the name of a t r a n s i e n t atom which i s the formal parameter corresponding to the argument. The formal parameter w i l l have as value the permanent atom c r e a t e d . T h i s syntax element i s i n c l u d e d mainly f o r the d e s c r i p t i o n of b a s i c procedures. 2.3. THE PARSING LANGUAGE THE PARSING LANGUAGE 22 For example: p a r s i n g statement: foo PERMANENT_ATOM bar p o s s i b l e c a l l : f oo x27 x27 i s made the name of a permanent atom and bar i s i a s s o c i a t e d with t h i s permanent atom Right syntax element : TRAflSIENT_ATOM , formal symbol ; T h i s element i s j u s t l i k e the PERMANENT_ATOM case, except that the argument becomes the name of a t r a n s i e n t atom r a t h e r than a permanent one. Again, t h i s syntax element i s i n c l u d e d mainly f o r the d e s c r i p t i o n of b a s i c procedures. For example: p a r s i n g statement: f i d o TRANSIENT_ATOM b i f p o s s i b l e c a l l : f i d o l e n l e n i s made the name of a t r a n s i e n t atom and b i f i s a s s o c i a t e d with t h i s permanent atom Right syntax element : PROCEDURE_DEFINITION , formal symbol , stop symbol ; T h i s syntax element i s intended o n l y f o r d e s c r i b i n g the b a s i c procedures which are used to d e c l a r e procedures, and should not normally be used by the user. I t i n d i c a t e s t h at the argument w i l l be a d e c l a r a t i o n of a procedure. The a c t u a l argument must be a p a r s i n g statement, f o l l o w e d by the symbol IS, then an e x p r e s s i o n and the stop symbol of 2.3. THE PARSING LANGUAGE THE PARSING LANGUAGE 23 j t h i s syntax element. The p a r s i n g statement d e s c r i b e s the i syntax of a c a l l of the procedure and the e x p r e s s i o n i s the body of the procedure. The fo r m a l symbol becomes the name of a t r a n s i e n t atom which i s the formal parameter corresponding to the argument. T h i s atom has as value the procedure atom c r e a t e d by the d e c l a r a t i o n . For example: pa r s i n g statement: proc PROCEDURE_DEFINITION proc_atom endproc p o s s i b l e c a l l : proc EXPR a foo IS a - 2 endproc foo i s made the name of a procedure atom, with syntax EXPR a foo and body a - 2 . Proc_atom i s a s s o c i a t e d with t h i s procedure atom. Right syntax element : SEXPR , formal symbol , stop symbol ; T h i s syntax element p r o v i d e s a convenient way of i n p u t i n g p a r t i c u l a r c e l l s t r u c t u r e s . I t should be noted t h a t these s t r u c t u r e s may not correspond to v a l i d i n t e r n a l r e p r e s e n t a t i o n s of ex p r e s s i o n s and so g e n e r a l l y the s-expr e s s i o n argument should not be ev a l u a t e d . See the d e s c r i p t i o n of the p e r i o d (,) quoting procedure, The argument i s an s- e x p r e s s i o n . An s - e x p r e s s i o n i s a •number, s t r i n g , or symbol, or i t i s a s e r i e s of zero or more s-e x p r e s s i o n s enclosed i n parentheses. I f an s-ex p r e s s i o n i s a symbol which i s the name of an atom then the s - e x p r e s s i o n i s i n t e r n a l l y r e presented by that atom. I f the symbol i s not the name of an atom then i t becomes 2.3. THE PARSING LANGUAGE THE PASSING LANGUAGE 24 the name of a (newly-created) permanent atom. I f the s-expre s s i o n i s a s e r i e s of s-expressions then the s-e x p r e s s i o n i s i n t e r n a l l y the c e l l l i s t of those s-e x p r e s s i o n s . I f the s-exp r e s s i o n i s e n t i r e l y empty then i t i s i n t e r n a l l y the n i l atom. The stop symbol i s used to i n d i c a t e the end of the s - e x p r e s s i o n . Examples: p a r s i n g statement: p r i n t _ t r e e SEXPR t r e e e n d p r i n t _ t r e e p o s s i b l e c a l l : p r i n t _ t r e e (a b ( + foo 19) " s t r i n g " ) e n d p r i n t _ t r e e Tree's value i s a c e l l l i s t of atom a, atom b, a c e l l l i s t of atom +, atom f o o , and number 19, and s t r i n g " s t r i n g " . p o s s i b l e c a l l : p r i n t _ t r e e e n d p r i n t _ t r e e i Tree's value i s the n i l atom. p o s s i b l e c a l l : p r i n t _ t r e e ( ) e n d p r i n t _ t r e e Tree's value i s the n i l c e l l . p o s s i b l e c a l l : p r i n t _ t r e e boo e n d p r i n t _ t r e e Tree's value i s the atom boo. ght syntax element : OPTIONAL , o p t i o n a l symbol , r i g h t syntax , ENDOPTIONAL ; T h i s c o n s t r u c t i o n i n d i c a t e s t h at the argument may be omitted from a c a l l of the procedure. I f the o p t i o n a l symbol of the syntax element i s present i n a c a l l then the arguments d e s c r i b e d i n the r i g h t syntax enclosed by the ENDOPTIONAL must a l s o be present. The omission of the 2 . 3 . THE PARSING LANGUAGE THE PARSING LANGUAGE 25 o p t i o n a l symbol i n d i c a t e s the omission of those arguments. Formal parameters of omitted arguments have as value the undefined value. For example: p a r s i n g statement: do EXPR index OPTIONAL by EXPR s t e p _ s i z e ENDOPTIONAL EXPR loop enddo p o s s i b l e c a l l : do i by 1 i n s i d e _ l o o p enddo Index i s bound to i , s t e p _ s i z e to 1, and loop to i n s i d e _ l o o p . p o s s i b l e c a l l : do i inside__loop enddo Index i s bound to i , s t e p _ s i z e to undefined, and loop to i n s i d e _ l o o p . ght syntax element : LIST , formal symbol , r i g h t syntax , ENDLIST , se p a r a t o r symbol , stop symbol ; T h i s syntax element p r o v i d e s a method of d e f i n i n g a procedure which i s to have an a r b i t r a r y number of arguments. The argument corresponding t o t h i s syntax element i s a l i s t of one or more p a r t s . The p a r t s a l l have the same syntax and t h i s syntax i s d e s c r i b e d by the r i g h t syntax enclosed by the ENDLIST. The s e p a r a t o r symbol must be between p a r t s of the argument l i s t . The stop symbol i n d i c a t e s the end of the l i s t and so must be the l a s t symbol of the argument. The f o r m a l symbol becomes the name of a t r a n s i e n t atom which i s the f o r m a l parameter corresponding to the l i s t . The •eachpart* r o u t i n e , which 2.3. THE PARSING LANGUAGE THE PARSING LANGUAGE 26 w i l l be d e s c r i b e d i n the semantic s e c t i o n , i s used to a s s o c i a t e each part of the argument l i s t with the corresponding formal parameters of the r i g h t syntax enclosed by ENDLIST. i For example: p a r s i n g statement: s e l e c t LIST cases EXPR c a s e p a r t ENDLIST , e n d s e l e c t p o s s i b l e c a l l : s e l e c t e a s e l , case2, case3 e n d s e l e c t Cases i s bound to a i n t e r n a l r e p r e s e n t i o n o f the l i s t e a s e l , case2 and case3. Casepart i s bound t o the undefined v a l u e . See the d e s c r i p t i o n of the eachpart r o u t i n e f o r a way to bind c a s e p a r t to each of e a s e l , case2 and case3. Right syntax element : d e l i m i t e r symbol . The argument i s the d e l i m i t e r symbol. 2.4. THE PARSING OF EXPRESSIONS Tove a l l o w s the user to d e f i n e i n f i x o p e r a t o r s , so there must be some way of determining which o p e r a t o r s have precedence. The problem can be s t a t e d as: gi v e n the phrase o p e r a t o r l operand ope r a t o r 2 , where o p e r a t o r l has a r i g h t operand and operator2 has a l e f t operand, how do we decide whether to a s s o c i a t e the operand with o p e r a t o r l or operator2 ? In other words, i s the phrase grouped as ( o p e r a t o r l operand) operator2 or i s i t o p e r a t o r l (operand operator2) ? In such cases of c o n f l i c t i n 2.4. THE PARSING OF EXPRESSIONS THE PARSING OF EXPRESSIONS 27 Tove the operand always a s s o c i a t e s with the l e f t operator (the f i r s t g r o u p i n g ) , assuming t h a t the operand i s as s h o r t a s t r i n g of the i n p u t as p o s s i b l e . T h i s leads to a l e f t to r i g h t e v a l u a t i o n of e x p r e s s i o n s with no precedence. For example given a syntax of EXPR a + EXPR b, EXPR c * EXPR d f and ( EXPR e ) we have: 4 + 2 * 3 i s the same as ( 4 + 2 ) * 3 and e v a l u a t e s to 18 4 * 2 + 3 i s the same as (4 * 2) + 3 and e v a l u a t e s t o 11 4 + ( 2 * 3 ) e v a l u a t e s to 10 4 * ( 2 + 3 ) e v a l u a t e s t o 20 Note t h a t when an e x p r e s s i o n i s an i n t e r i o r argument, ( i . e . the corresponding syntax element i s not the f i r s t or l a s t element i n the p a r s i n g statement) then t h a t argument i s the l o n g e s t v a l i d e x p r e s s i o n t h a t i t can be. I t w i l l g e n e r a l l y c o n t i n u e u n t i l a d e l i m i t e r (such as ") ") or an atom which has no l e f t argument. I f an argument's corresponding syntax element i s the l a s t element of the par s i n g statement then the argument i s as s h o r t as i t can be. Thus i n the f i r s t example the argument a s s o c i a t e d with b i s only 2 and not 2 * 3 s i n c e t h i s i s a rightmost argument. In the t h i r d example the argument a s s o c i a t e d with e i s 2 * 3 and not 2 s i n c e t h i s argument's corresponding syntax element i s not the l a s t i n the p a r s i n g statement (the syntax element " ) " f o l l o w s i t ) . Note a l s o t h a t i n t h i s t h i r d example the argument a s s o c i a t e d with b i s (2 * 3) s i n c e t h i s i s the s h o r t e s t v a l i d p o s s i b i l i t y . 2.4. THE PARSING OF EXPRESSIONS BASIC PROCEDURES 28 2.5. BASIC PROCEDURES The b a s i c procedures d e s c r i b e d i n t h i s part of the user's guide are b u i l t - i n procedures which provide the fundamental components of the semantics of the language. The meaning of any procedure i s u l t i m a t e l y r e s o l v a b l e to some combination of these b a s i c procedures. The d e s c r i p t i o n of the procedures i s d i v i d e d i n t o s e c t i o n s based on a c l a s s i f i c a t i o n of the procedures. 2.5.1. A r i t h m e t i c Procedures Tove pro v i d e s the f i v e common b i n a r y o p e r a t i o n s : a d d i t i o n , s u b t r a c t i o n , m u l t i p l i c a t i o n , d i v i s i o n , and modulo (remainder or r e s i d u e ) . These are represented by the symbols -, *, /, and mod r e s p e c t i v e l y . A l l of these are s y n t a c t i c a l l y b i n a r y procedures. The unary negation i s provided by the p r e f i x procedure represented by the symbol underscore (_). A l l a r i t h m e t i c i s performed on s i x t e e n b i t numbers. I f the operands are not numbers or i f the r e s u l t s overflow the s i x t e e n b i t s (32767 t o -32768) then an e r r o r i s r e p o r t e d . Examples: 4 + 2 * 3 e v a l u a t e s to 18 4 - 2 * 3 e v a l u a t e s to 6 1 0 / 4 e v a l u a t e s t o 2 11 mod 4 e v a l u a t e s to 3 10 e v a l u a t e s to -10 2.5.1. ARITHMETIC PROCEDURES BASIC PROCEDURES 29 2.5 .2 . Log ica l Procedures The basic l o g i c a l procedures provided are or, and, exclus ive or and l o g i c a l negation. These are represented by the symbols | , S , xor, and -1 r e spec t ive ly . The f i r s t three are binary procedures while the l a t t e r i s pos t f ix . A l l these procedures operate bi t-wise on the sixteen b i t s of t h e i r arguments, which must be numbers, and produce numbers. Examples: 5 & 3 i s 1 7 | 3 i s -5 3 & 1 | 5-i i s -6 2 .5 .3 . S t r ing Procedures For manipulating s t r i n g s , basic procedures are provided which combine two s t r i n g s , d iv ide a s t r i n g taking the f i r s t or l a s t par t , extract a character from a s t r i n g , and give the length of a s t r i n g . For the descr ip t ion of some of these procedures, we define the index of a character i n a s t r i n g to be a number g iv ing the pos i t ion of the character in the s t r i n g , s t a r t i n g with zero for the f i r s t character . EXPR a | | EXPR b The concatenation procedure returns a s t r i n g which i s the characters of the value of the f i r s t argument followed by the characters of the value of the second argument. For example: 2 .5 .3 . STRING PROCEDURES BASIC PROCEDURES 30 " h i " || "there •t i s " h i t h e r e " I I n | | "empty" i s "empty I I EXPR a <| EXPR b T h i s procedure r e t u r n s a s t r i n g which i s the c h a r a c t e r s of the value of the f i r s t argument which are before the c h a r a c t e r whose index i s the value of the second argument. The l e n g t h of the s t r i n g r eturned i s t h e r e f o r e always the value of the second argument. ! For example: "abcdef" <| 3 i s "abc" "abcdef <| 6 i s "abcdef" "abc" <| 0 i s "" "abc" <| 4 i s an e r r o r EXPR a >| EXPR b T h i s procedure r e t u r n s a s t r i n g which i s the c h a r a c t e r s of the value of the f i r s t argument which i n c l u d e and f e l l o w the c h a r a c t e r whose index i s the value of the second argument. when combined with the <| procedure any s u b s t r i n g can be e x t r a c t e d . Examples: "abcdef" >| 3 i s "def" "abcdef" >| 6 i s "" "abc" >| 0 i s "abc" "abcdef" >| 3 <| 2 i s "de" "abcdef" <| 3 >| 2 i s " c " "ab" >| 3 i s an e r r o r 2.5.3. STRING PROCEDURES BASIC PROCEDURES 31 EXPR a char EXPR b T h i s procedure r e t u r n s a s t r i n g which i s the c h a r a c t e r of the value of the f i r s t argument whose index i s the value of the second argument. ' For example: "abc" char 2 i s " c " "ab" char 0 i s " a " "ab" char 2 i s an e r r o r l e n EXPR a The le n g t h procedure r e t u r n s the le n g t h of the value of i t s argument. For example: l e n "abc" i s 3 l e n "" i s 0 l e n 3 i s an e r r o r 2.5.4. C e l l Procedures EXPR a c r e a t e EXPR b T h i s procedure r e t u r n s a new c e l l whose value i s the value of the f i r s t argument and whose next c e l l i s t h a t given by the value of the second argument. EXPR a o T h i s procedure r e t u r n s the value of the c e l l which i s the value of a. 2.5.4. CELL PROCEDURES BASIC PROCEDURES 32 EXPR a S> T h i s procedure r e t u r n s the next c e l l p a r t of the c e l l which i s the value of a. EXPR a = 0 EXPR b The value p a r t of the c e l l which i s the value of b i s mod i f i e d t o the value of a. The modified c e l l i s r e t u r n e d . EXPR a =a> EXPR b The next c e l l p a r t of the c e l l which i s the value of b i s mod i f i e d t o the c e l l which i s the value of a. The modified c e l l i s r e t u r n e d . (| LIST a EXPR b ENDLIST , |) T h i s procedure produces a l i s t of c e l l s . There i s one c e l l f o r each part of the l i s t a. Each c e l l has as value part the value of the corresponding p a r t of the l i s t a f and has as next p a r t the c e l l c o r r e s p o n d i n g t o the next p a r t of l i s t a. The next c e l l p art of the l a s t c e l l i s the n i l c e l l . The procedure r e t u r n s the f i r s t c e l l . Examples: (assume t h a t the atom n i l has as value the n i l c e l l . ) 1 c r e a t e n i l c r e a t e s a c e l l with value 1 and next c e l l n i l . 1 c r e a t e n i l <3 i s 1 1 c r e a t e n i l S> i s the n i l c e l l 2 = 0 (1 c r e a t e n i l ) O i s 2 2 =<S (1 c r e a t e n i l ) 3> i s the n i l c e l l 2.5.4. CELL PROCEDURES BASIC PROCEDURES 33 1 c r e a t e n i l =3> (2 c r e a t e n i l ) a!> i s a c e l l with value 1 and next c e l l n i l (| 1, 2, "ab", x |) i s a l i s t of c e l l s with values 1, 2, "ab", and the value of x (| "a", 2, 5, |) a> §> O i s 5 2.5.5. Atom Procedures EXPR a -> EXPR b The only o p e r a t i o n a v a i l a b l e s p e c i f i c a l l y f o r atoms i s the assignment of a value t o an atom. T h i s assignment procedure a s s i g n s the value of a to the atom b. Note that b, the second argument, i s not ev a l u a t e d and must be an atom. The assignment i s t h e r e f o r e l e f t t o r i g h t . The assignment procedure r e t u r n s the value of a which was assigned. For example: 3 -> x i s 3 and the value of x i s 3 3 -> x -> y i s 3 and the value s of x and y are 3 2.5.6. Conversion Procedures EXPR a t o s t r i n g T h i s procedure c o n v e r t s the value of a i n t o a s t r i n g . If the v a l u e of a i s a s t r i n g , t h a t s t r i n g i s r e t u r n e d . I f i t i s a number, the s t r i n g r e t u r n e d i s the a p p r o p r i a t e s e r i e s of d i g i t s , 2.5.6. CONVERSION PROCEDURES BASIC PROCEDURES 34 p o s s i b l y with a l e a d i n g minus s i g n (-). I f i t i s an atom, the s t r i n g r eturned i s the symbol which i s the name of the atom. I f i t i s a p o i n t e r then an e r r o r i s announced. For example: 3 t o s t r i n g i s "3" 15 t o s t r i n g i s "-15" "a" t o s t r i n g i s "a" • x* t o s t r i n g i s "x" (see the d e s c r i p t i o n of the quote (') procedure) EXPR a tonumber The value of a i s converted i n t o a number. I f i t i s a s t r i n g , then i s must be of the form: zero or more blan k s , p o s s i b l y a minus s i g n (-), zero or more blanks, one or more d i g i t s , the end of the s t r i n g or some c h a r a c t e r which i s not a d i g i t (0 t o 9) . For example: " 3270" tonumber i s 3270 " - 3" tonumber i s -3 "15a" tonumber i s 15 2.5.6. CONVERSION PROCEDURES BASIC PROCEDURES 35 2.5.7. Comparison Procedure EXPR a :: EXPR b The comparison procedure i s the main way to compare two value s of the same type. I f the value of a i s not of the same type as the value of b, then an e r r o r i s announced. I f the value of a i s g r e a t e r than the value of b, the number four i s r e t u r n e d . I f equal, then two i s r e t u r n e d and i f l e s s then one i s r e t u r n e d . Numbers are compared i n the usual a r i t h m e t i c manner. S t r i n g s are compared by the u s u a l a l p h a b e t i c o r d e r i n g : f i r s t compare the two s t r i n g s f o r the le n g t h of the s h o r t e r , i f they are egual then the s h o r t e r s t r i n g i s l e s s than the l o n g e r . . Atoms and c e l l s have no o r d e r i n g and t h e r e f o r e they are e i t h e r equal or unequal. I f they are eq u a l , two i s r e t u r n e d , i f not then f i v e i s r e t u r n e d . Examples: 1 :: 3 i s 1 4 :: -2 i s 4 "a" :: "ab" i s 1 "b" :: "ab" i s 4 "" :: " « i s 1 x :: x i s 2 1 c r e a t e n i l :: (1 c r e a t e n i l ) i s 5 2.5.7. COMPARISON PROCEDURE BASIC PROCEDURES 36 2.5.8. I l l£ut-Out£ut Procedures EXPR a p r i n t The value of a i s converted to a s t r i n g (unless i t i s a c e l l ) and appended to the c u r r e n t output b u f f e r . The b u f f e r i s not w r i t t e n u n t i l the newline procedure i s c a l l e d , or the b u f f e r i s f u l l (255 c h a r a c t e r s ) . The o r i g i n a l (unconverted) value of a i s r e t u r n e d , newline T h i s procedure causes the c u r r e n t output b u f f e r to be w r i t t e n and r e t u r n s the empty s t r i n g , parser T h i s procedure reads the s h o r t e s t p o s s i b l e e x p r e s s i o n from the i n p u t and r e t u r n s i t s i n t e r n a l r e p r e s e n t a t i o n , , read_sexpr An s - e x p r e s s i o n i s read from the input and i t s i n t e r n a l r e p r e s e n t a t i o n i s r e t u r n e d . For the d e f i n i t i o n of an s-e x p r e s s i o n see the d e s c r i p t i o n of s-expr arguments i n the s e c t i o n on the p a r s i n g language. I f the r e a d i n g causes a new inp u t l i n e t o be read, and i t i s read from a t e r m i n a l , the p r e f i x c h a r a c t e r i s changed to a p e r i o d (.). The s- e x p r e s s i o n must end with a p e r i o d , read 2.5.8. INPUT-OUTPUT PROCEDURES BASIC PROCEDURES 37 The next l i n e of the in p u t i s r e t u r n e d , as a s t r i n g . T h i s procedure always causes the reading of a l i n e . I f the l i n e i s read from a t e r m i n a l , the p r e f i x c h a r a c t e r i s changed to a minus s i g n (-). EXPR a wr i t e The value of a must be a s t r i n g . I t i s w r i t t e n as the next l i n e of output. readfrom EXPR a A f t e r c a l l i n g t h i s procedure, a l l i n p u t i s read from the inp u t f i l e s p e c i f i e d by the value of a, When the end of the input f i l e i s reached, i n p u t w i l l once again be read from the d e f a u l t i n p u t f i l e (which i s SCARDS). Any l i n e s ' r e a d from the f i l e w i l l be echoed to the user (unless the corresponding s t a t u s switch i n d i c a t e s otherwise - see the d e s c r i p t i o n of the s t a t u s procedure). I f the value of a i s a s t r i n g , then that i s the name of the f i l e to be read from. I f the value of a i s a number between 0 and 19, then i t i s the MTS l o g i c a l u n i t number to be read from. I f the value of a i s -1, then the in p u t w i l l be read from SCARDS, Any other value of a i s an e r r o r . The value of a i s r e t u r n e d . w r i t e t o EXPR a A f t e r c a l l i n g t h i s procedure, l i n e s w r i t t e n using the write procedure are w r i t t e n to the wr i t e f i l e s p e c i f i e d by the value of a. E r r o r messages and output produced by p r i n t and newline 2.5.8. INPUT-OUTPUT PROCEDURES BASIC PROCEDURES 38 are always w r i t t e n to SPRINT, I f the value of a i s a s t r i n g , then t h a t i s the name of the f i l e to be w r i t t e n t o . I f the value of a i s a number between 0 and 19, then i t i s the MTS l o g i c a l u n i t number to be w r i t t e n to. I f the value of a i s -1 or -2, then l i n e s w i l l be w r i t t e n to SPRINT and SPUNCH, r e s p e c t i v e l y . Any other v a l u e of a i s an e r r o r . The value c f a i s r e t u r n e d . Examples: i n p u t : (1 p r i n t ; " boo" p r i n t ; 2 p r i n t ; newline ; 3 p r i n t ) output: 1 boo2 : 3 in p u t : read : abc"def 95 h i the value read i s the s t r i n g of c h a r a c t e r s : abc"def % h i i n p u t : (read write) the next input l i n e w i l l be echoed 2.5,9. Quotinc] ££2£§^ures These two procedures r e t u r n t h e i r arguments without e v a l u a t i n g them. « EXPR a • ; T h i s procedure r e t u r n s the i n t e r n a l r e p r e s e n t a t i o n of the ex p r e s s i o n which i s the argument corres p o n d i n g t o a . Note t h a t 2.5.9. QUOTING PROCEDURES BASIC PROCEDURES 39 the argument must be a v a l i d expression, but i s not evaluated by t h i s procedure. This routine should be used to introduce values which are subsequently to be evaluated. (See the descr ip t ion of the eva l rout ine . ) For example: * newline * i s the atom named newline 1 2 ' i s the number 2 * 1 + 2 ' i s the i n t e r n a l representation of 1 • 2 * x 1 i s a syntax error i f x i s not the name of an atom .,SEXPR a . This procedure returns the i n t e r n a l representat ion of the s-expression which i s the argument corresponding to a. This should be the usual way of introducing par t i cu l a r c e l l s tructures or atoms which are to be used i n c e l l s t ructures . For example: , newline . i s the atom newline . x . i s the atom x. I f x i s not an atom then i t i s made a permanent one . ( 1 + 2 ) . i s a c e l l l i s t of the number 1, atom and the number 2 2 .5 .9 . QUOTING PROCEDURES BASIC PROCEDURES 40 2.5.10. Sequencing Procedures These procedures a f f e c t the order of e v a l u a t i o n of e x p r e s s i o n s . stop T h i s procedure s t o p s the Tove system and r e t u r n s the user to MTS. EXPR a r e t u r n The value of a i s re t u r n e d as the value of the procedure body i n which the c a l l of r e t u r n i s made. For example, i f procedure dumb has as body the e x p r e s s i o n : (1 r e t u r n ; x) then the value of procedure dumb i s always 1 and x i s never ev a l u a t e d i n the procedure body. i f EXPR a then EXPR b e l s e EXPR c e n d i f The i f procedure e v a l u a t e s a and based on t h a t value i t r e t u r n s the value of e i t h e r b or c. Only one of b or c i s ever e v a l u a t e d . The t h i r d argument, c, i s evaluated when the value of a i s one o f : the number zero (0), the empty s t r i n g {""), the n i l c e l l , the n i l atom, or the undefined value. In a l l other cases the second argument, b, i s e v a l u a t e d . For example: i f 0 then 1+2 e l s e 4 e n d i f i s 4 i f 1 then 3+2 e l s e 2+2 e n d i f i s 5 (: TRANSIENT_ATOM a : EXPR b : EXPR C :) 2.5.10. SEQUENCING PROCEDURES BASIC PROCEDURES 41 This procedure defines a l abe l l ed block. ; The t rans ient atom associated with a becomes the name of the block. The argument c must be the same atom as a. The second argument, b, i s c a l l e d the body of the block. The value of b i s returned. EXPR a ex i t EXPR b This procedure returns the value of a as the value of the block l a b e l l e d by the atom which i s the argument b. The l a b e l l e d block must s t i l l be a c t i v e : that i s , t h i s procedure must only be ca l l ed during the evaluation of the body of the block. repeat EXPR b This procedure re s ta r t s the evaluat ion of the body of the block l abe l l ed by the atom which i s the argument associated with b. Examples: (: b lockl : 1 ex i t b l o c k l ; 2 : b lockl :) i s 1 (: block2 : 1 p r i n t ; repeat block2 : block2 :) i s an i n f i n i t e loop, p r i n t i n g 1 at each r e p e t i t i o n ( LIST a EXPR b ENDLIST ; ) Each of the arguments i n the LIST are evaluated i n turn ( lef t to r ight) and the value of the l a s t i s returned. In the present implementation of the parser , the parentheses may be omitted from an enclosed argument. 2 .5.10. SEQUENCING PROCEDURES BASIC PROCEDURES 42 For example: (:x: f i r s t ; second :x:) i s the same as (:x: ( f i r s t ; second ) :x : ) EXPR a eachpart EXPR b endeachpart The eachpart procedure i s used i n procedures with LIST parameters. The argument a must be a LIST parameter atom of a procedure. , For each part of the LIST argument corresponding to a, the arguments in that part of the l i s t are bound to t h e i r corresponding formal parameters and the argument b (of eachpart) i s evaluated. The l a s t value of b i s returned. For example: I f put i s defined as: put LIST p u t l i s t EXPR putone EXPR puttwo ENDLIST , endput then the body of put might be: p u t l i s t eachpart putone p r i n t ; puttwo p r i n t endeachpart I f put i s c a l l e d as: put 1 "boo", " h i " 2 endput then the output would be: 1boohi2 For each part of the LIST the arguments i n that part (1 and "boo" i n the f i r s t part , " h i " and 2 i n the second) are bound to putone and puttwo (respectively) and putone p r i n t ; puttwo pr in t i s evaluated. 2,5.10, SEQUENCING PROCEDURES 2.5. 1 1 Declaration BASIC PROCEDURES Procedures 43 tran TRANSIENT_ATOM a This procedure declares i t s argument (which must be a symbol) to be a t rans ient atom. I t returns the atom. For example: tran x makes x a t rans ient atom and returns i t perm PERMANENT_ATOM a This procedure declares i t s argument . (which must be a symbol) to be a permanent atom. It returns the atom. For example: perm y makes y a permanent atom and returns i t proc PR0CEDURE_DEFINITI0N a endproc This procedure defines a value procedure. It returns the procedure atom defined, For example: proc EXPR a ++ EXPR b i s a + b * 2 endproc defines the procedure atom +* to have syntax EXPR a ++ EXPR b and body a + b * 2 . I t returns the atom nproc PROCEDURE_DEFINITION a endnproc This procedure i s the same as the proc procedure except that i t defines a cont ro l procedure. Control procedures d i f f e r from value procedures i n that they evaluate t h e i r arguments when 2.5.11. DECLARATION PROCEDURES BASIC PROCEDURES UU the c orresponding formal parameters are e v a l u a t e d . Value procedures e v a l u a t e a l l t h e i r arguments before t h e i r body i s e v a l u a t e d . In e i t h e r case e v a l u a t i o n of arguments i s done i n the environment of the c a l l of the procedure, never i n the environment of the body of the procedure. For example: I f put i s d e f i n e d as: proc put EXPR a EXPR b endput i s a p r i n t ; b p r i n t endproc then: put 1 newline endput ; newline produces as output a blank l i n e (assuming the output b u f f e r i s i n i t i a l l y empty) f o l l o w e d by a l i n e c o n t a i n i n g the d i g i t But i f put i s d e f i n e d as: nproc put EXPR a EXPR b endput i s a p r i n t ; b p r i n t endnproc then: put 1 newline endput ; newline produces as output a l i n e c o n t a i n i n g the d i g i t 1 (assuming the output b u f f e r i s i n i t i a l l y empty) then a blank l i n e . 2 . 5 . 1 2 . E v a l u a t i o n Procedures These two procedures provide the user access to the i n t e r p r e t e r . EXPR a e v a l The e v a l procedure a l l o w s the user to e v a l u a t e an i n t e r n a l 2 . 5 . 1 2 . EVALUATION PROCEDURES BASIC PROCEDURES 45 representation of a program. The value of a i s treated as an i n t e r n a l representat ion of a program and i s evaluated. The r e s u l t i n g value i s returned. For example: . ( + 1 4 ) . eva l i s 5 ' 1 + 4 • eval i s 5 EXPR a applyto EXPR b The value of a must be an atom. The value of that atom i s returned. I f the atom i s a procedure atom, the arguments for the evaluation of the procedure body are given by the value of b. This value of b must be a c e l l l i s t of the arguments. The arguments are evaluated as they would be i f the procedure were c a l l e d normally. For example: , + . a p p l y t o . ( 2 3 ) . i s 5 2 .5 .13. Error Procedure EXPR a error The value of a i s pr inted as an error message and error mode i s entered. Refer to the sect ion descr ib ing error process ing. The s t r i n g printed i s returned i f the evaluation i s re s tar ted . 2.5.13. ERROR PROCEDURE BASIC PROCEDURES 46 2.5.14. System Procedures These four procedures provide the user some access to the i n t e r n a l information maintained by Tove. EXPR a status EXPR b The status procedure provides a means for the user to af fect the information pr inted by Tove during i t s processing. S p e c i f i c a l l y the user can tes t or set status switches. The value of b gives the status switch to be accessed and the value of a i s what the switch i s to be set to . The former value of the status switch i s returned. Current ly the status switches, the meaning of t h e i r values and t h e i r i n i t i a l values are: status 0 : 0 - don't p r in t the value of top l e v e l expressions 1 - p r i n t the value of top l e v e l forms i n i t i a l l y 1 status 1 : 0 - don't echo l i n e s read during a readfrom 1 - echo l i n e s read during a readfrom i n i t i a l l y 1 EXPR a type Type returns a number which i s an encoding of the type of the value of a. S p e c i f i c a l l y : 0 - undefined value 1 - number 2 - s t r i n g 2.5.14. SYSTEM PROCEDURES BASIC PROCEDURES 47 3 - c e l l 4 - t r a n s i e n t atom 5 - parameter atom 6 - permanent atom 7 - value procedure atom 8 - c o n t r o l procedure atom EXPR a p a r s e _ t a b l e EXPR b T h i s procedure r e t u r n s i n f o r m a t i o n about p a r s i n g . The value of b must be a number which determines the kind of i n f o r m a t i o n r e t u r n e d , and what the value of a i s used f o r : value of b value of a i n f o r m a t i o n returned 1 an atom one i f the atom has a l e f t parameter, e l s e zero 2 an atom the parse index (a number) of f u r t h e r p a r s i n g i n f o r m a t i o n f o r the atom. Zero i f there i s no f u r t h e r i n f o r m a t i o n an atom one i f the atom has a l e f t parameter, e l s e zero a parse index an i n t e r n a l code number r e p r e s e n t i n g the parse kind a parse index a number or a s t r i n g g i v i n g 2.5.14. SYSTEM PROCEDURES BASIC PROCEDURES 48 f u r t h e r parse i n f o r m a t i o n EXPR a i n t e r n a l I n t e r n a l r e t u r n s an i n t e r n a l r e p r e s e n t a t i o n of the value of a. I f i t i s a procedure atom, then the i n t e r n a l r e p r e s e n t a t i o n of the body of the procedure i s r e t u r n e d . I f i t i s a formal parameter of a c o n t r o l procedure, then the corresponding argument (unevaluated) i s r e t u r n e d . In a l l other cases the value of a i s r e t u r n e d . For example: I f peek i s d e f i n e d as: nproc peek EXPR a i s body endnproc and peed i s c a l l e d as: peek some_atom then i n the body of peek the e x p r e s s i o n : 'a* i n t e r n a l i s the atom some_atom but a i s the value of the atom some_atom 2.6. ERROR PROCESSING When an e r r o r i s detected by the Tove system, an a p p r o p r i a t e e r r o r message i s p r i n t e d . I f the user i s read i n g i n p u t from a "readfrom" f i l e , (see the d e s c r i p t i o n of the readfrom procedure) then Tove attempts to continue e x e c u t i o n . In a l l other cases the user i s placed i n e r r o r mode. The e r r o r mode processor changes the p r e f i x c h a r a c t e r to a plus s i g n (+), 2.6. ERROR PROCESSING EBHOR PROCESSING 49 and reads and i n t e r p r e t s simple commands from the user, A command c o n s i s t s only of the symbol which i s the name of the command and may be a b b r e v i a t e d . Any input a f t e r the command name and before the end of the l i n e i s ignored, A d e s c r i p t i o n of the p r e s e n t l y implemented commands f o l l o w s . Parse T h i s command causes the l a s t complete parse to be p r i n t e d , i n a r e p r e s e n t a t i o n of the i n t e r n a l format used f o r e x p r e s s i o n s . Symbols The symbol t a b l e i s p r i n t e d . Atoms The atom t a b l e i s p r i n t e d , Stack The stack used by the i n t e r p r e t e r f o r e v a l u a t i o n i s p r i n t e d . Dump Some i n t e r n a l t a b l e s are w r i t t e n to u n i t zero, which must be a s s i g n e d t o a s e q u e n t i a l f i l e . R e s t a r t Attempt to recover from the e r r o r . T h i s causes a r e t u r n to the computation which r e p o r t e d the e r r o r . I f recovery from an 2.6. ERROR PROCESSING ERROR PROCESSING 50 e r r o r i s attempted during p a r s i n g , then i n gen e r a l the user must r e - e n t e r the l a s t l i n e from the p o i n t of e r r o r . I f recovery i s attempted during e v a l u a t i o n a d e f a u l t value f o r the o f f e n d i n g computation i s u s u a l l y assumed. Top A r e t u r n i s made to the top l e v e l parse e v a l u a t e loop, abandoning the c u r r e n t parse or e v a l u a t i o n . Quit A r e t u r n i s made to HTS, i n such a way t h a t the Tove system can be r e s t a r t e d . i 2.6. ERROR PROCESSING 51 CHAPTER THREE SOME DETAILS OF THE IMPLEMENTATION In t h i s chapter some of the d e t a i l s of the present implementation of the language system are presented. P a r t i c u l a r emphasis i s given to those parts of the implementation which are thought to be of i n t e r e s t . The system i s implemented i n the SUE System Language by severa l major modules. The scanner converts the input characters into a token stream. The parser and t rans la tor are combined i n a s ing le module which converts the token stream into an i n t e r n a l representat ion r e f l e c t i n g the s tructure of the expressions, manages scope entry and e x i t , and processes dec lara t ions . The in terpre ter i s responsible for the evaluation of the i n t e r n a l representations and includes routines for each of the basic procedures. In addi t ion to these modules there are a number of miscellaneous routines such as the error processor and management rout ines for a l l of the various i n t e r n a l tab les . There are several of these tables and areas, each of which i s associated with some component of the language. These include the symbol tab le , atom tab le , parse tab le , eval stack, s t r i n g area, and the c e l l area. 3. SOME DETAILS OF THE IMPLEMENTATION THE SCANNER 52 3.1. THE SCANNER The scanner i s divided into two parts. The f i r s t - l e v e l scanner reads the character input and converts i t to a stream of numbers, strin g s , symbols, and comments. This part of the scanner uses well-known techniques and w i l l not be further described. The second-level scanner i s responsibel for skipping comments and for converting symbols into corresponding atoms. This conversion of symbols i s done using a l i n k i n the symbol table which indicates the atom i n the current scope which has the symbol as name. These l i n k s are managed by the parser-t r a n s l a t o r . The second-level scanner therefore produces a stream of tokens consisting of numbers, strings, and atoms. In the case of atoms, the scanner also returns the symbol which represented the atom. This i s done since the translator w i l l have to know the symbol which was found i f i t was in the context of a declaration or a delimiter. 3.2. THE PARSER AND TRANSLATOR 3.2.1. Parsing Method The parser i s implemented as a recursive parsing i n t e r p r e t e r . The parsing code which i t interprets i s an i n t e r n a l representation of the parsing statements. Each atom has associated with i t some parsing information which i s the representation of i t s syntax. When the parser i s c a l l e d i t i s passed a goal which 3.2.1. PARSING METHOD THE PARSER AND TRANSLATOR 53 i n d i c a t e s whether the parser should parse a simple e x p r e s s i o n or a complex one. A simple e x p r e s s i o n i s one which i s the s h o r t e s t v a l i d e x p r e s s i o n s t a r t i n g at the c u r r e n t token while a complex e x p r e s s i o n i s the longest such v a l i d e x p r e s s i o n . Simple e x p r e s s i o n s are rightmost arguments while complex e x p r e s s i o n s are i n t e r i o r arguments, as e x p l a i n e d i n the user's guide s e c t i o n on the p a r s i n g of e x p r e s s i o n s . I f the p a r s e r i s pa r s i n g a simple e x p r e s s i o n , then i t parses the c u r r e n t token and r e t u r n s the i n t e r n a l r e p r e s e n t a t i o n o b t a i n e d . The p a r s i n g of the c u r r e n t token proceeds by f i r s t examining the token. I f the token i s a number, a s t r i n g , or a n i l a r y atom (one with no parameters), then i t s i n t e r n a l r e p r e s e n t a t i o n i s r e t u r n e d . I f i t i s an atom with more complex syntax, then the pars e r i n t e r p r e t s the p a r s i n g code f o r that syntax, which w i l l parse a l l tokens i n v o l v e d i n the syntax of the atom. During t h i s p a r s i n g the i n t e r n a l r e p r e s e n t a t i o n of the c a l l of the atom i s constructed.and atom and procedure d e c l a r a t i o n s are processed. For an EXPR argument of the atom, the parser i s c a l l e d r e c u r s i v e l y to parse the e x p r e s s i o n . For PERMANENT_ATOM and TRANSIENT_ATOM arguments the next symbol i s read and i s made the name of a new atom. The symbol t a b l e entry f o r t h i s symbol i s ad j u s t e d to i n d i c a t e t h a t i t i s the name of the atom. For PROCEDDRE_DEFINITION arguments, the p a r s i n g statement which f o l l o w s the c u r r e n t token i s converted to p a r s i n g code and a s s o c i a t e d with the procedure atom being d e c l a r e d . T h i s procedure atom i s d e c l a r e d at the c u r r e n t scope, then a new 3.2 . 1 . PARSING METHOD THE PARSER AND TRANSLATOR 54 scope corresponding to the new procedure i s entered. The formal parameters of the new procedure are d e c l a r e d i n the new scope, then the parser i s c a l l e d to parse the body of the procedure. L a s t l y the new scope i s c l o s e d and the o r i g i n a l scope r e -a c t i v a t e d . SEXPR arguments are parsed by a simple r e c u r s i v e r o u t i n e which r e t u r n s the i n t e r n a l r e p r e s e n t a t i o n of the argument. For OPTIONAL arguments the next token i s examined to see i f i t i s the " o p t i o n a l symbol" s p e c i f i e d i n the syntax element f o r the argument. I f i t i s then the p a r s i n g i n t e r p r e t a t i o n c o n t i n u e s s e q u e n t i a l l y i n the p a r s i n g code. Otherwise a l l arguments up to the end of the o p t i o n a l part are set to the undefined value and p a r s i n g c o n t i n u e s a f t e r the end of the o p t i o n a l p a r t . LIST arguments r e q u i r e a l o o p i n g s t r u c t u r e i n the p a r s i n g code. Since a LIST argument must c o n s i s t of at l e a s t one occurrence of the sub-arguments corresponding to the " r i g h t syntax" p o r t i o n of the LIST syntax element, the parser parses t h a t s e c t i o n f i r s t . At the end of the p a r s i n g code f o r t h i s there i s code which causes the parser to check f o r the "s e p a r a t o r symbol" and the "stop symbol". I f the " s e p a r a t o r symbol" i s the next token then the scanner i s c a l l e d f o r the f o l l o w i n g token and the parser backs up i n the p a r s i n g code to again check f o r the " s e p a r a t o r symbol". I f the next token i s not the "s e p a r a t o r symbol" then i t i s compared with the "stop symbol". I f i t i s the "stop symbol" then the p a r s i n g continues with the syntax element f o l l o w i n g the LIST one. I f i t i s not 3.2 . 1 . PARSING METHOD THE PARSER AND TRANSLATOR 55 the "stop symbol" and at l e a s t one "separator symbol" was found, then the p a r s e r branches back to the p a r s i n g code f o r the " r i g h t syntax" of the LIST syntax element. I f no s e p a r a t o r symbol was found an e r r o r i s announced. For d e l i m i t e r syntax elements the c u r r e n t token i s checked to see i f i t i s the symbol s p e c i f i e d . I f i t i s not an e r r o r i s announced. The p a r s i n g of complex e x p r e s s i o n s r e q u i r e s t h a t the parser c o n t i n u e a f t e r p a r s i n g the c u r r e n t token. In t h i s case the pa r s e r examines the next token ( a f t e r the p a r s i n g of the c u r r e n t token has read a l l tokens a s s o c i a t e d with the c a l l of the c u r r e n t token) to see i f i t i s an atom which has a l e f t operand. I f i t i s , then the c u r r e n t parse i s the l e f t argument f o r t h i s atom and the process r e p e a t s by the p a r s i n g of t h i s atom. Otherwise the parser r e t u r n s the c u r r e n t parse. 3.2.2. I n t e r n a l R e p r e s e n t a t i o n Of Programs The t r a n s l a t o r which i s p a r t of the parser module produces a p r e f i x c e l l l i s t r e p r e s e n t a t i o n of the e x p r e s s i o n s which comprise programs. T h i s p r e f i x r e p r e s e n t a t i o n i s very s i m i l a r t o t h a t used by LISP. Any e x p r e s s i o n i s r e p r e s e n t e d by a l i s t of a c e l l f o r the procedure atom which i s c a l l e d f o l l o w e d by a c e l l f o r each of i t s f o r m a l parameters. /** needs mention of number and s t r i n g e x p r e s s i o n s */ I f an argument i s an e x p r e s s i o n which i s a c a l l t o a procedure with parameters and so w i l l not f i t i n a s i n g l e c e l l , i t i s r e p r e s e n t e d by another c e l l l i s t and the f o r m a l parameter's c e l l ' s value p a r t i s the f i r s t c e l l of 3.2.2. INTERNAL REPRESENTATION OF PROGRAMS THE PARSER AND TRANSLATOR 56 t h a t l i s t . I f an argument i s a LIST argument then i t i s rep r e s e n t e d by a c e l l l i s t which begins with a c e l l c o n t a i n i n g a s p e c i a l atom i n d i c a t i n g the LIST argument and a c e l l c o n t a i n i n g the number of sub-arguments i n each p a r t of the LIST. The r e s t of the c e l l l i s t i s a c e l l f o r each sub-argument i n the e n t i r e LIST argument. The c e l l s i n the argument l i s t of the c a l l e d procedure which correspond t o fo r m a l parameters of sub-arguments of a LIST argument a l l c o n t a i n the undefined value. For example: p a r s i n g language statement: s e l e c t LIST cases EXPR c a s e p a r t ENDLIST , e n d s e l e c t p o s s i b l e c a l l : s e l e c t 2 + 3, "b" || "", c e l l _ v a l u e <a e n d s e l e c t i n t e r n a l r e p r e s e n t a t i o n ( i n s-exp r e s s i o n form): ( s e l e c t ( _ l i s t _ 1 ( + 2 3 ) ( || "b" «" ) ( <a c e l l _ v a l u e ) ) _undef_ ) p a r s i n g language statement: s e l e c t LIST cases EXPR s e l e c t p a r t then EXPR c a s e p a r t ENDLIST , e l s e EXPR d e f a u l t e n d s e l e c t p o s s i b l e c a l l : s e l e c t a < b then a + b, a > b then a e l s e b e n d s e l e c t i n t e r n a l r e p r e s e n t a t i o n : ( s e l e c t ( _ l i s t _ 2 3.2.2. INTERNAL REPRESENTATION OF PROGRAMS THE PARSER AND TRANSLATOR 57 ( < a b ) ( • a b ) ( > a b ) a ) _undef_ _undef_ b ) 3.3. THE INTERPRETER The i n t e r p r e t e r i s a simple s t a c k - o r i e n t e d r e c u r s i v e e v a l u a t o r . I t i s given a value of the language and r e t u r n s i t s e v a l u a t i o n . Numbers and s t r i n g s e v a l u a t e to themselves, and so i f the i n t e r p r e t e r i s passed a number or a s t r i n g i t merely r e t u r n s i t . I f the i n t e r p r e t e r i s passed an atom, i t i s examined to see i f i t i s a permanent atom ( i n c l u d i n g procedure atoms) or a t r a n s i e n t atom ( i n c l u d i n g f ormal parameter atoms). I f the atom i s permanent then i t s value part i s e v a l u a t e d . I f t h i s value p a r t i s a value of the language, (that i s a number, s t r i n g , atom, or c e l l ) then i t i s r e t u r n e d as the value of the atom. T h i s value p a r t can a l s o be an i n d i c a t i o n of one of the b a s i c procedures or a programmer-defined procedure. In e i t h e r of these cases the i n t e r p r e t e r e v a l u a t e s the i n d i c a t e d procedure with no arguments. I f the procedure has parameters then an e r r o r i s announced, B a s i c procedures are evaluated by c a l l i n g the i n d i c a t e d b u i l t - i n r o u t i n e . Programmer-defined procedures are e v a l u a t e d by f i r s t s e t t i n g the c u r r e n t 3.3. THE INTERPRETER THE INTERPRETER 58 environment p o i n t e r and s a v i n g i t s o l d value on the stack, a l l o c a t i n g space on the stack f o r the t r a n s i e n t atoms of the procedure, and then e v a l u a t i n g the c e l l which i s the body of the procedure. A f t e r t h i s the environment and stack are r e s e t and the value found f o r the body i s r e t u r n e d . I f the i n t e r p r e t e r i s passed a t r a n s i e n t atom or a formal parameter atom, then the atom c o n t a i n s , i n p l a c e of i t s value p a r t , a number which i s i t s o f f s e t on the stack from the c u r r e n t environment p o i n t e r . A c c o r d i n g l y the value part of the atom i s obtained by combining the o f f s e t and the c u r r e n t environment p o i n t e r and r e t r i e v i n g the value from the stack. T h i s value i s then evaluated i n the same manner as the value part of a permanent atom. To e v a l u a t e a c e l l the i n t e r p r e t e r r e q u i r e s t h a t the c e l l be a v a l i d i n t e r n a l r e p r e s e n t a t i o n of an e x p r e s s i o n , as o u t l i n e d i n the s e c t i o n on i n t e r n a l r e p r e s e n t a t i o n s of programs. If the value p a r t of the c e l l i s a number or a s t r i n g , then t h a t number or s t r i n g i s r e t u r n e d . I f the value p a r t of the c e l l i s another c e l l , then an e r r o r i s announced as t h i s cannot occur i n a v a l i d i n t e r n a l r e p r e s e n t a t i o n of a program. I f the value p a r t i s an atom, then t h a t atom's value p a r t i s obtained and evaluated as o u t l i n e d f o r the e v a l u a t i o n of atoms. In t h i s case, however, i f the atom's value part i n d i c a t e s a procedure, then the procedure's arguments are the value p a r t s of the c e l l s which are the next p a r t of the o r i g i n a l c e l l . I f the procedure i s a value procedure, these value p a r t s are e v a l u a t e d and t h e i r values placed on the stack so t h a t they 3.3. THE INTERPRETER THE INTERPRETER 59 are the i n i t i a l v alues of the for m a l parameter atoms. T h i s i s done before the environment p o i n t e r i s changed when a programmer-defined procedure i s e v a l u a t e d , or before the b u i l t -i n r o u t i n e i s c a l l e d when a b a s i c procedure i s eva l u a t e d . I f the procedure to be invoked i s a c o n t r o l procedure, then the arguments are not eva l u a t e d . To ensure that: the c o r r e c t environment w i l l be used when an argument i s e v a l u a t e d i n a programmer_defined c o n t r o l procedure, those arguments t h a t are atoms or c e l l s are marked as c o n t r o l procedures. The e v a l u a t i o n of numbers, s t r i n g , and the undefined value cannot be a f f e c t e d by the environment, whenever the value part of an atom i n d i c a t e s t h a t i t i s a c o n t r o l argument, the e v a l u a t i o n of t h a t p a r t causes the c u r r e n t environment p o i n t e r to be stacked and the p r e v i o u s environment t o become the c u r r e n t one. The atom or c e l l which i s the argument i s then evaluated, the c u r r e n t environment i s r e s t o r e d , and the value found f o r the argument i s ret u r n e d , 3 , 3 , 1 , Block E x i t And Repeat The c o n t r o l s t r u c t u r e s c o n s i s t i n g of block e x i t and block repeat are much e a s i e r t o implement than the u n r e s t i c t e d "go t o " . When a l a b e l l e d block i s e n t e r e d , the l a b e l atom of the block i s given a value which i s a stack p o i n t e r i n d i c a t i n g the block r o u t i n e a c t i v a t i o n e n t r y . An e x i t which i s given the l a b e l as i t s argument pops the stack back to t h i s a c t i v a t i o n entry and causes the i n t e r p r e t e r to r e t u r n from the c a l l of the block. A repeat a l s o pops the stack back to t h i s a c t i v a t i o n 3 .3 .1 , BLOCK EXIT AND REPEAT THE INTERPRETER 60 entry but the causes the i n t e r p r e t e r to c a l l the block a g a i n . Thus the s t r u c t u r e d e x i t and repeat r e q u i r e l a b e l s to be only stack p o i n t e r s , not program counters as r e q u i r e d by the go to. An i n t e r e s t i n g consequence of the use of e x i t and repeat i n place of the go to i s t h a t the i n t e r p r e t e r has no need of the concept of a program counter, s i n c e only the go to r e q u i r e s t h i s concept. 3.4. TABLE MANAGEMENT The i n t e r n a l t a b l e s of the system are managed by s e v e r a l r o u t i n e s , some of which are s p e c i f i c to i n d i v i d u a l t a b l e s . A l l the t a b l e s can be dyn a m i c a l l y expanded by the system, except the symbol t a b l e . Whenever a t a b l e or area i s expanded, e r r o r mode i s f i r s t entered with an e r r o r message i n d i c a t i n g the t a b l e being expanded. T h i s a l l o w s the user to abandon the c u r r e n t computation and the expansion, or to continue. See the user's guide s e c t i o n on e r r o r mode p r o c e s s i n g . Each t a b l e i s accessed by a p o i n t e r to the begining of the t a b l e and the index of the entry d e s i r e d . T h e r e f o r e t a b l e expansion can be accomplished by o b t a i n i n g a space l a r g e r than the c u r r e n t s i z e of the t a b l e from the o p e r a t i n g system, copying the i n f o r m a t i o n from the o l d space to the new one, then changing the base p o i n t e r t c i n d i c a t e the new space. T h i s a l s o a l l o w s the t a b l e i n d i c e s t o be halfwords r a t h e r than f u l l w o r d s , r e s u l t i n g i n some economy of storage space. The symbol t a b l e i s not dynam i c a l l y expanded because i t i s accessed by a hashing r o u t i n e to look up symbols as they are 3 . 4 . TABLE MANAGEMENT TABLE MANAGEMENT 61 read by the scanner. I t c o u l d be expanded dynamically, but t h i s would r e q u i r e the re-hashing of a l l e n t r i e s , r a t h e r than copying, from the o l d space to a new one. The atom and parse t a b l e s are not g a r b a g e - c o l l e c t e d . When one of these runs out of space, i t i s expanded. The e v a l stack i s a l s o expanded when i t overflows. The s t r i n g area i s where the a c t u a l c h a r a c t e r s of s t r i n g s are s t o r e d , as w e l l as the c h a r a c t e r r e p r e s e n t a t i o n s of symbols (and t h e r e f o r e the names of atoms). In the t a b l e s and areas of the system a c h a r a c t e r s t r i n g i s represented by a length (between zero and 255) and the index i n t o the s t r i n g area of the f i r s t c h a r a c t e r of the s t r i n g . When the s t r i n g area runs out of space, a new s t r i n g area the same s i z e as the o l d one i s obtained from the o p e r a t i n g system and only a c t i v e s t r i n g s are copied from the o l d area to the new one,, The a c t i v e s t r i n g s are found by a garbage c o l l e c t o r ( a c t u a l l y a non-garbage c o l l e c t o r ) which scans a l l the t a b l e s which have s t r i n g area i n d i c e s i n them. The i n d i c e s are changed to i n d i c a t e the p o s i t i o n s of the c h a r a c t e r s t r i n g s i n the new s t r i n g area. The s u b s t r i n g i n g procedures of the language do not make c o p i e s of the c h a r a c t e r s t r i n g s they operate on, so t h a t they r e t u r n a s t r i n g l ength and an index which p o i n t s i n t o the same c h a r a c t e r s t r i n g as the s t r i n g index from which the s u b s t r i n g was taken. T h i s i m p l i e s t h a t the copying from the o l d s t r i n g area t o the new may i n c r e a s e the s i z e of the a c t i v e a r e a , so p r o v i s i o n must be made to expand the new area d u r i n g the copy. F o r t u n a t e l y t h i s i s not d i f f i c u l t and onl y r e q u i r e s the o b t a i n i n g of a "new new" s t r i n g 3.4. TABLE MANAGEMENT TABLE MANAGEMENT 62 area and the d i r e c t copy of the new area i n t o the new new area, s i n c e e v e r y t h i n g i n the new area i s a c t i v e . Another s o l u t i o n to t h i s problem i s th a t chosen by XPL [XPL], which a l s o does s u b s t r i n g s without copying. When the XPL s t r i n g area overflows, a l l the a c t i v e d e s c r i p t o r s are s o r t e d by the order of t h e i r i n d i c e s i n t o the s t r i n g area. The c o m p a c t i f i c a t i o n r o u t i n e can then copy an a c t i v e s t r i n g only once and a d j u s t a l l i n d i c e s i n t o i t c o r r e c t l y . The c e l l area i s maintained by a mark-sweep garbage c o l l e c t o r l i k e t h a t of most LISP implementations, using a f r e e l i s t and no compaction. However, s i n c e the parser, r o u t i n e s keep some c e l l i n d i c e s on the SUE stack t h a t the garbage c o l l e c t o r cannot a c c e s s , i t i s not always s a f e to garbage c o l l e c t . A c c o r d i n g l y , when the r o u t i n e r e s p o n s i b l e f o r a l l o c a t i n g c e l l s r e a l i z e s t h a t there are very few f r e e c e l l s l e f t , i t re q u e s t s a garbage c o l l e c t i o n . The c o l l e c t i o n i s not done u n t i l i t i s safe to do so. I f the c e l l area i s completely exhausted and another c e l l i s to be a l l o c a t e d , then the c e l l area i s expanded (and not garbage c o l l e c t e d ) , Of course the c e l l area i s a l s o expanded i f the garbage c o l l e c t o r does not f i n d enough f r e e c e l l s . 3.4. TABLE MANAGEMENT 63 CHAPTER FOUR CONCLUSIONS This chapter presents some conclusions about the design and implementation of the Tove system: i t s successes and f a i l u r e s , how i t can be improved, and some thoughts for future work. 4.1. PROBLEMS AND POSSIBLE SOLUTIONS It i s to be hoped that the i d e n t i f i c a t i o n of some of the mistakes of t h i s system which fol lows w i l l enable others to avoid these p i t f a l l s . 4.1.1. The Scanner The most obvious flaw i n Tove and the most annoying to the user i s the crude methods used by the scanner to break character s t r ing s in to tokens. S p e c i f i c a l l y the rules regarding the formation of symbols are p a r t i c u l a r l y simple-minded, often requi r ing the user to in se r t blanks i n places he may not expect, e spec i a l ly i f he i s an experienced programmer. E s s e n t i a l l y the rules used do not correspond to common prac t i ce , leading to confusions and a lack of r e a d a b i l i t y . One might argue that the programmer could learn to l i v e with these r e s t r i c t i o n s and accept them as a pr ice to be paid for f l e x i b i l i t y , but such arguments sound too much l i k e those of the bad t a i l o r described by Weinberg. The t a i l o r produced m i s - f i t t i n g su i t s but was ca re fu l to in s t ruc t his c l i e n t s on how they could hunch over and 4.1.1. T H E SCANNER PROBLEMS AND POSSIBLE SOLUTIONS 64 limp to make the s u i t s look as though they were c o r r e c t l y t a i l o r e d . The d i f f i c u l t y of the scanning problem i s t h a t more s o p h i s t i c a t e d r u l e s f o r s y l l a b i c a t i o n (breaking the c h a r a c t e r stream i n t o tokens) that would be more n a t u r a l , would a l s o be more complicated. Probably the best s o l u t i o n i s to d i v i d e the s p e c i a l symbols i n t o d i f f e r e n t c l a s s e s and s t a t e the s y l l a b i c a t i o n r u l e s i n terms of these c l a s s e s . For example, the parentheses should always be t r e a t e d as separate symbols s i n c e t h i s corresponds to common usage. 4. 1.2. The Parser And T r a n s l a t o r Another problem which causes severe d i f f i c u l t i e s i n r e a d i n g the language i s the n e c e s s i t y to process s y n t a c t i c d e c l a r a t i o n s ( p a r s i n g statements) while p a r s i n g the program. T h i s must be done by both the parser of the system and anyone attempting to read the language. without t h i s one cannot t e l l d e l i m i t e r s from b i n a r y , p r e f i x , s u f f i x , or n i l a r y o p e r a t o r s , I f unconventional syntax i s used i n a program, the program can be incomprehensible to a reader. E s s e n t i a l l y the problem i s t h a t too much of the t r a n s l a t o r ' s work must be performed during the p a r s i n g . A p o s s i b l e s o l u t i o n t o t h i s problem which should a l s o a i d the s y l l a b i c a t i o n problem, i s to use typographic conventions to d i s t i n g u i s h symbols i n the v a r i o u s s y n t a c t i c c a t e g o r i e s . For example, a convention c o u l d d i s t i n g u i s h symbols which are d e l i m i t e r s from those t h a t are o p e r a t o r s by i n s i s t i n g t h a t d e l i m i t e r s be a s e r i e s of zero or more a l p h a b e t i c c h a r a c t e r s 4 .1.2. THE PARSER AND TRANSLATOR PROBLEMS AND POSSIBLE SOLUTIONS 65 followed by a co lon , comma, or semicolon. Parentheses could always be used as c o r r a l de l imi te r s , that i s , they could be used to enclose any r i g h t syntax which i s more complex that a simple expression. S i m i l a r l y one could require that symbols composed of s p e c i a l characters be used for binary or unary operators and alphabetic symbols be used only for n i l a r y operators. E s s e n t i a l l y these requirements are that the name i t s e l f of an object describes the syntac t ic propert ies of that object . Indeed i t i s f e l t that with some such typographic conventions i t should be possible to produce a parser which i s capable of parsing programs without concurrently processing syntac t i c dec larat ions . This i s h ighly des irable because i t would imply better readab i l ib ty and also a cleaner s tructure for the system. I t would also mean that a multi-pass system would be pos s ib le , in which the declarat ion of a procedure need not precede i t s use. This type of system i s considered to be very useful and convenient for top-down programming techniques and would a lso enable the implementation of a program developement f a c i l i t y such as that presented by DuMont. [DuMo] Another weakness of the current parser i s i t s dismal e r ror detect ion and recovery. I t w i l l announce a syntac t i c error without reading beyond the f i r s t symbol i n e r r o r , but i t may return from some recurs ive invocat ions af ter looking at the symbol i n error and before announcing the e r ror . This s i tua t ion i s s i m i l a r to that of some bottom-up parsers which do some reductions before announcing a syntax e r r o r . I t i s caused, i n part , by the incomplete knowledge of v a l i d lookahead symbols 4.1.2. THE PARSER AND TRANSLATOR PROBLEMS AND POSSIBLE SOLUTIONS 66 which i s i n turn d i r e c t l y a t t r i b u t a b l e t o the incremental nature of the processing of the p a r s i n g language — each p a r s i n g language statement i s t r a n s l a t e d and i n t e r p r e t e d independently of o t h e r p a r s i n g language statements. In i n t e r p r e t i n g a r i g h t -most e x p r e s s i o n , the p a r s e r does not know i f a lookahead symbol i s i n v a l i d or p a r t of some other p a r s i n g statement. In t h i s case the parser must r e t u r n to i t s c a l l e r and, p o s s i b l y a f t e r other r e t u r n s , the lookahead symbol w i l l be accepted or r e j e c t e d . The d i f f i c u l t y i s t h a t i f i t i s r e j e c t e d , one would l i k e to r e s t a r t the p a r s i n g with a new lookahead symbol, but i n the i n c a r n a t i o n of the p a r s e r t h a t f i r s t saw the e r r a n t symbol, so t h a t the p a r s i n g w i l l continue as though the symbol i n e r r o r was never t h e r e . One way to achieve t h i s i s to maintain the p o s i t i o n of the top of the p a r s i n g stack a t the time of the l a s t read of a symbol. I f an e r r o r i s found i n the lookahead symbol, then i t should be p o s s i b l e to r e s t o r e the p a r s i n g stack and hence the s t a t e of the parser to t h a t of the time of the l a s t read of a symbol, thus r e v e r t i n g t o the d e s i r e d i n c a r n a t i o n of the p a r s e r . C l e a r l y t h e r e i s a need f o r more work to be done on s y n t a c t i c e r r o r r e c o v e r y . 4 .1.2. THE PARSER AND TRANSLATOR PROBLEMS AND POSSIBLE SOLUTIONS 6 7 4.1.3. The I n t e r p r e t e r L a r g e l y due to the c o n s i d e r a b l e r e l i a n c e on previous semantic designs, as o u t l i n e d i n the i n t r o d u c t i o n , the i n t e r p r e t e r has very few problems. One of the remaining problems i s an o v e r s i g h t r e l a t i n g t o the a b i l i t y to a s s i g n v a l u e s t o atoms. As c u r r e n t l y c o n s t i t u t e d , the system i s not able to c o r r e c t l y a s s i g n a value t o a t r a n s i e n t atom which i s passed i n t o a procedure, s i n c e the assignment procedure does not perform the assignment i n the c o r r e c t environment. T h i s c o u l d be c o r r e c t e d by making the assignment procedure check the value i t i s about t o change. I f t h a t value i s a c o n t r o l procedure argument which i s an atom, then the assignment should proceed to t h a t atom's value i n the previous environment, and continue i n t h i s manner u n t i l i t reached a value which was not an atomic c o n t r o l argument. T h i s would enable the w r i t i n g of such c o n t r o l s t r u c t u r e s as f o r - l o o p s which increment or change a v a r i a b l e t h a t they are passed. 4 . 2 . SOME SUCCESSES Despite the problems i n the implementation of the c u r r e n t system, the use of a pa r s i n g language to i n c r e m e n t a l l y d e s c r i b e the syntax of the language i s f e l t t o be q u i t e s u c c e s s f u l . Much p r o f i t a b l e work has been done on the study of syntax d e s c r i b e d by the BNF n o t a t i o n s and i t i s f e l t t h a t f u t u r e work i n the study of other syntax d e s c r i p t i o n languages w i l l prove p r o f i t a b l e . The semantic d e s i g n i s c o n s i d e r e d to be l a r g e l y s u c c e s s f u l , 4.2.,SOME SUCCESSES SOME SUCCESSES 68 p a r t i c u l a r l y the d e c i s i o n to use block e x i t and repeat and the c o n t r o l procedures. The s i m p l i c i t y of implementation of the block e x i t and r e p e a t , combined with t h e i r encouragement of c l e a r e r programs are powerful arguments f o r t h e i r replacement of the "go t o " . The i n c l u s i o n of c o n t r o l procedures i s thought to be a very s u c c e s s f u l method f o r i n c o r p o r a t i n g c o n t r o l s t r u c t u r e s i n t o the p r o c e d u r a l o r g a n i s a t i o n of the system. On the s u b j e c t of complete c o m p i l a t i o n i n t o c o n v e n t i o n a l machine code, i t i s f e l t t h a t o n l y the e v a l u a t i o n procedures (eval and a p p l y ) , the system procedures, and some of the i n p u t procedures r e l y on the i n t e r p r e t i v e nature of the implementation. There are some worries about the a b i l i t y to e f f i c i e n t l y compile c o n t r o l procedures i n t o more c o n v e n t i o n a l machine code, but i f the c o n t r o l procedures are non-recursive and s m a l l , i t should be p o s s i b l e to very e f f i c i e n t l y compile them " i n - l i n e " , t h a t i s i n t o the same machine code u n i t as t h e i r c a l l s , with one copy of the c o n t r o l procedure code f o r each c a l l of the c o n t r o l procedure. T h i s i s the same method t h a t i s used i n c o n v e n t i o n a l a l g o r i t h m i c languages to compile t h e i r b u i l t - i n c o n t r o l s t r u c t u r e s . I f a c o n t r o l procedure does not meet these c r i t e r i a , methods s i m i l a r to those used f o r A l g o l "call-by-name" parameters can be used to compile the procedure. C e r t a i n l y there i s nothing i n the s y n t a c t i c design of the language that p r e c l u d e s an i n c o r p o r a t i o n of d e c l a r a t i o n s of the types of v a r i a b l e s and procedures and compile-time type-checking. T h i s i m p l i e s t h a t a s i m i l a r language would be amicable to complete c o m p i l a t i o n i n t o machine code. 4.2. SOME SUCCESSES 69 BIBLIOGRAPHY [ALim] R a n d e l l , B., and L, S. R u s s e l l , ALGOL 60 IffiElSSSrit a t i o n , Academic Press, 1964. [ AL60 ] Naur, P. (ed) Revised Rejaort on the A lcjorithmic Language ALGOL 60, Communications of the ACM, Volume 6~, Number "T7~96 37 [AL68] van Wijngaarden, A, (ed), B. J . M a i l l o u x , J . E. L. Peck, and C. H. A. Koster, Rejjort on the A l g o r i t h m i c Language ALGOL 68, Numerische Mathematik, Volume 14,~1969.~ [BCPL] Richards, M., The BCPL Reference Manual, P r o j e c t Mac Memorandum M-352-1, Massachusetts I n s t i t u t e of Technology, February 1968. [DuMo ] DuMont, M., A Program Development F a c i l i t y , M. Sc. T h e s i s , Department of Computer Scie n c e , U n i v e r s i t y of B r i t i s h Columbia, September 1974. [FORT] H e i s i n g , H. P., Report of American Standards A s s o c i a t i o n , ASA Committee X3, Communications of the ACM,"volume 7, 19647 BIBLIOGRAPHY 70 [ L I S P ] McCarthy, J . , Recursive F u n c t i o n s of Symbolic Expressions and t h e i r C a l c u l a t i o n by_ Machine, Part I, Communications of the ACM, Volume 3, Number 4, A p r i l 1960. , P. W. Abrahams, D. J . Edwards, • T. P. Hart, and M. I . L e v i n , LISP J . 5 Programmer 1§ Manual, MIT Press, Cambridge, 1965. [Pasc ] B i r t h , N. The Programming Language PASCAL, Acta I n f o r m a t i c a , Volume 1, 1971. [ P r a t ] P r a t t , V. R. , Top Down Operator Precedence, Conference Record of the ACM Symposium on P r i n c i p l e s of Programming Languages, Boston, Massachusetts, October 1973. [ Rame ] Ramer, D. R., C o n s t r u c t i o n of LR (k) P a r s e r s with A££lication t o ALGOL 68, M. Sc. T h e s i s , Department of Computer Science, U n i v e r s i t y of B r i t i s h Columbia, August 1973. [Sue] C l a r k , B. L., The SUE System Language User's Guide ( l®lis§d), Department of Computer Science, U n i v e r s i t y o f ~ B r i t i s h Columbia, A p r i l 1974. Atwood, J . W. (ed), P r o j e c t SUE St a t u s Report, T e c h n i c a l Report CSRG 11, Computer Systems Research Group, U n i v e r s i t y of Toronto, A p r i l 1972. [XPL] McKeeman, W. M., J . J . Horning, and D. B. Wortman, A Compiler Generator, P r e n t i c e - H a l l , 1970. BIBLIOGRAPHY 71 APPENDIX I SAMPLE RUN OF TOVE The f o l l o w i n g i s a sample run of Tove, demonstrating some of i t s f e a t u r e s . * R COMPILER T=2 PAR=HIGH_WATER STACK=3P * READFROM "EXAMPLES" * * STR "EXAMPLES" * % SAMPLE PROCEDURES * * % SOME COMPARISON PROCEDURES * * PROC EXPR A > EXPR B IS * A :: B 5 4 ENDPROC * * ATOM 61 > * * PROC EXPR A < EXPR B IS * A :: B & 1 ENDPROC * * ATOM 64 < * * PROC EXPR A = EXPR B IS * A :: B S 2 ENDPROC * * ATOM 67 = * PROC EXPR A >= EXPR B IS * A :: B & 6 ENDPROC * * ATOM 70 >= * •PROC EXPR A <= EXPR B IS * ATOM 73 <= * * % FACTORIAL PROCEDURE * % USES RECURSION * * PROC ! EXPR N IS * IF N <= 0 * THEN 1 * ELSE N * ! (N-1) * ENDIF * ENDPROC I. SAMPLE RUN OF TOVE 72 * ATOM 76 ! * * % TOWERS OF HANOI PROCEDURE * * PROC HANOI ( EXPR N , EXPR S , EXPR I , EXPR D ) IS * IF N <= 0 THEN 0 RETURN * ELSE HANOI ( N-1,S,D,I) ; * "MOVE "|| (N TOSTRING) ||" FROM "|| S ||" TO " * || D PRINT; * NEWLINE; * HANOI (N-1,I,S,D) * ENDIF * ENDPROC # * ATOM 78 HANOI * * % S-EXPRESSION PRINTER. * % — TAKES ANY VALUE AND PRINTS IT IN S-EXPRESSION FORM. * * PROC EXPR VAL PUT_SEXPR IS * ~ * PERM CELL TYPE; * 3 -> CELL~TYPE; % CODE INDICATING VALUE IS A CELL * ~ * % PROCEDURE TO PUT BLANKS INTO OUTPUT BUFFER * % USED TO INDENT * PROC PUT_BLANKS EXPR HOW MANY IS * (:PUT_LOOP: * IF HOW MANY > 0 * THEN -" " PRINT; * HOW_MANY - 1 -> HOW_MANY; * REPEAT PUT_LOOP * ELSE "" * ENDIF * :PUT_LOOP:) * ENDPROC; # * % PROCEDURE TO DETERMINE IF A CELL LIST * 35 HAS A SUBLIST * % RETURNS 1 IF SO, OTHERWISE 0 * * PROC HAS_SUBLIST EXPR VAL IS * % ~NOTE THAT VAL IN THIS PROCEDURE IS * % I S A DIFFERENT ATOM THAN THE VAL * % IN PUT_SEXPR WHICH IS AGAIN DIFFERENT * % FROM THE VAL IN SEXPR_OUT * {: LIST LOOP : * IF VAL % IS NOT NIL CELL * THEN IF VAL <3 TYPE = CELL_TYPE * THEN 1 * ELSE VAL a> -> VAL; * REPEAT LIST LOOP I. SAMPLE RUN OF TOVE 73 * ENDIF * ELSE 0 * ENDIF * :LIST LOOP:) * ENDPROC; * * 35 AMOUNT TO INDENT FOR EACH LEVEL * * PERM INDENT_INCREMENT; * 4 -> INDENT~INCREMENT; * ~ * % RECURSIVE PROCEDURE TO PRINT THE VALUE * * PROC EXPR VAL SEXPR_OUT EXPR INDENT IS + EXPANDING CELL AREA * IF VAL TYPE = CELL TYPE * THEN « (« PRINT; * TRAN COMPLEX; * HAS_SUBLIST VAL -> COMPLEX; * (:OUT_CELLS: * IF VAL % IS NOT THE NIL CELL * THEN VAL O SEXPR OUT (INDENT + INDENT_INCREMENT) ; * % PRINT THE VALUE OF THE CELL * VAL a> -> VAL; 35 STEP TO NEXT CELL * IF VAL % IS NOT THE NIL CELL * THEN IF COMPLEX * THEN NEWLINE; * PUT_BLANKS INDENT; * ELSE " " PRINT * ENDIF; * REPEAT OUT CELLS * ELSE % DO NOTHING * ENDIF * ELSE 35 DO NOTHING * ENDIF * :OUT CELLS:) ; * ")« PRINT; * ELSE VAL PRINT 35 PRINT THE SIMPLE VALUE * ENDIF * ENDPROC; * 35 DO THE PRINTING * * NEWLINE; * VAL SEXPR OUT 4; * NEWLINE * * ENDPROC * * ATOM 83 PUT_SEXPR * ~ * % CONTROL PROCEDURE FOR LISP-STYLE COND * * NPROC COND LIST CONDPARTS EXPR TEST THEN EXPR BODY I. SAMPLE RUN OF TOVE 74 * ENDLIST , ELSE EXPR DEFAULT ENDCOND * IS * (: COND_BLOCK : ! * CONDPARTS EACHPART * IF TEST * THEN BODY EXIT COND BLOCK * ELSE 0 * ENDIF * ENDEACHPART ; * DEFAULT * : COND_BLOCK :) * ENDNPROC * * ATOM 98 COND * * % CONTROL PROCEDURE FOR IF * % WITH POSSIBLY OMITTED ELSE PART * * NPROC WHEN EXPR TEST * THEN EXPR THEN PART + EXPANDING PARSE TABLE~ * OPTIONAL ELSE EXPR ELSE__P ART ENDOPT ION AL * ENDWHEN * IS *, IF TEST * THEN THEN_PART * ELSE ELSE~PART * ENDIF * ENDNPROC * * ATOM 104 WHEN * WHEN 1 THEN "TRUE" ENDWHEN * STR "TRUE" * * % SAMPLE CALLS OF FACTORIAL * !1 * * INT 1 * !2 * * INT 2 * !4 * * INT 24 * !7 * * INT 5040 * !8 + RESULT OF * IS GREATER THAN 32767 + RES * I. SAMPLE RUN OF TOVE 75 * INT 32767 * % NOTE DEFAULT VALUE ASSUMED AFTER OVERFLOW * * % SAMPLE CALLS OF HANOI * HANOI (2 , "S" f " I " , "D") * HOVE 1 FROM S TO I * MOVE 2 FROM S TO D * MOVE 1 FROM I TO D * * INT 0 * HANOI ( 3 , "SOURCE" , "INTERMEDIATE" , "DESTINATION" ) * MOVE 1 FROM SOURCE TO DESTINATION * MOVE 2 FROM SOURCE TO INTERMEDIATE * MOVE 1 FROM DESTINATION TO INTERMEDIATE * MOVE 3 FROM SOURCE TO DESTINATION * MOVE 1 FROM INTERMEDIATE TO SOURCE * MOVE 2 FROM INTERMEDIATE TO DESTINATION * MOVE 1 FROM SOURCE TO DESTINATION * * INT 0 * * % SAMPLE CALLS OF THE S-EXPRESSION PRINTER * ( . ( A B C ) . PDT_SEXPR) * ~ * ( A B C ) * * STR "" * ( . ( A ( B ) C ) . PUT_SEXPR) * * (A * (B) * C) * STR "" * ( . HANOI . INTERNAL 3> PUT_SEXPR ) * % TO PRINT THE BODY OF HANOI * * (IF * (<= N 0) * (RETURN 0) * ( ( * (HANOI * (- N 1) * S * D * I) * (PRINT * (II * (I I * (II * (I I * (I I * MOVE * (TOSTRING N)) I. SAMPLE RUN OF TOVE 76 * FROM ) * S) * TO ) * D)) * NEWLINE * (HANOI * (- N 1) * I * S * D))) * * STR " " * * % SAMPLE CALLS OF COND * COND * 1 THEN "ONE" PRINT, ' * 0 PRINT THEN "TWO" PRINT * ELSE "ELSE" PRINT * EN DCOND * ONE * STR "ONE" * COND i * " » THEN "ONE" PRINT, * 1 PRINT THEN "TWO" PRINT * ELSE "ELSE" PRINT * ENDCOND * 1TWO * STR "TWO" * COND * " " THEN "ONE" PRINT, * 0 PRINT THEN "TWO" PRINT, % NOTE EXTRA COMMA * ELSE "ELSE" PRINT * ENDCOND * OELSE * STR "ELSE" * % CALLS TO ILLUSTRATE THE INDEPENDENCE OF ATOMS... * PERM DEFAULT % SAME NAME AS LAST PARAMETER OF COND * * ATOM 111 DEFAULT * (1-> DEFAULT) % SET THE ATOM * * INT 1 * COND * DEFAULT THEN "ONE" * ELSE 0 * ENDCOND * * STR "ONE" * STOP * STACK HIGH WATER MARK - ALLOCATED: 12288 USED: 11402 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0051771/manifest

Comment

Related Items