UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Defining semantics with attribute grammars Rushworth, Thomas Bryan 1978

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.

Item Metadata

Download

Media
831-UBC_1978_A6_7 R88.pdf [ 2.28MB ]
Metadata
JSON: 831-1.0051781.json
JSON-LD: 831-1.0051781-ld.json
RDF/XML (Pretty): 831-1.0051781-rdf.xml
RDF/JSON: 831-1.0051781-rdf.json
Turtle: 831-1.0051781-turtle.txt
N-Triples: 831-1.0051781-rdf-ntriples.txt
Original Record: 831-1.0051781-source.json
Full Text
831-1.0051781-fulltext.txt
Citation
831-1.0051781.ris

Full Text

DEFINING SEMANTICS WITH ATTRIBUTE GRAMMARS by THOMAS BRYAN RUSH WORTH B.Sc. University of B r i t i s h Columbia (1974> A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE i n THE FACULTY OF GRADUATE STUDIES Department of Computer Science We accept t h i s thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMEIA May, 1978 (c) Thomas Bryan Rushworth, 1978 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my writ ten pe mn i ss ion. Department of Computer Science The University of British Columbia 2075 Wesbrook Place Vancouver, Canada V6T 1W5 Date August 2, 1978 ABSTRACT This thesis examines the semantic d e f i n i t i o n of a programming language by a form of attribute grammar for ease of understanding. The attr i b u t e s are expressed in a simple macro language and when evaluated produce code for an abstract machine. Part of an actual d e f i n i t i o n i s looked at and found to be too obscure to be useful. The reasons for the obscurity are i d e n t i f i e d and suggestions are made for eliminating them. i i i Table of Contents I. INTRODUCTION 1 II. APPROACH 4 II I . PARTICULARS 10 1. ALGOL 68 V Program Trees 10 2. TO SI 13 3. The Algol Engine 17 IV. TOSI SEMANTICS FOR ALGOL 68 24 1. The PSO Table 26 2. The Algol Engine Code 31 V. CONCLUSIONS 37 BIBLIOGRAPHY 45 Appendix A 47 AKNOWLEDGEMENTS I would l i k e to thank my advisors, John Peck and Harvey Abramson for t h e i r extraordinary patience i n waiting for t h i s thesis as well as for t h e i r suggestions and encouragement. I would also l i k e to thank Ted Venema for his help with the implementation and f o r many hours of discussion. 1 I. INTRODUCTION Formal methods of syntax d e f i n i t i o n are the b a s i s of the majority of compilers w r i t t e n today. However, formal methods of d e f i n i n g language semantics are used p r i m a r i l y as toys ( i f they are used at a l l ) . Two reasons f o r t h i s a r e t h a t formal semantics are not as well understood as formal syntax (and t h e r e f o r e more d i f f i c u l t to use) and there i s more than one use f o r a semantic d e f i n i t i o n . A s y n t a c t i c d e f i n i t i o n i s used to accept or r e j e c t a given s t r i n g as l e g a l i n the language being compiled and to give some a n a l y s i s of an accepted s t r i n g . A semantic d e f i n i t i o n might be used f o r producing proofs of program c o r r e c t n e s s , f o r producing programs a u t o m a t i c a l l y , f o r a l l o w i n g e x t e n s i v e o p t i m i z a t i o n or t o make the f i n a l t r a n s l a t i o n t o machine code (or i n t e r p r e t a t i o n ) easy. As well as many d i f f e r e n t uses f o r a semantic d e f i n i t i o n t h e r e are many d i f f e r e n t approaches. There a r e d e s c r i p t i v e ones (such as Hoare's axiomatic approach [Hoare] or S c o t t ' s d e n o t a t i o n a l one [ S c o t t ] ) and t h e r e are computational ones (such as VDL [VDL] o r a t t r i b u t e grammars [Knuth]). The d e s c r i p t i v e approaches, having t h e i r b a s i s i n p r e d i c a t e l o g i c or mathematics, are more a b s t r a c t and most u s e f u l i n a u t o m a t i c a l l y producing programs and c o r r e c t n e s s proofs. The computational 2 ones are more suitable for implementation as interpreters or t r a n s l a t o r s ( i . e . the semantic part of a compiler) since they give some idea of the computational process to be followed. Regardless of the approach or use of the semantic d e f i n i t i o n being considered, in the end there i s a person who has to write and/or understand i t . Thus tha primary consideration in evaluating a semantic d e f i n i t i o n should be how easily i t i s understood. A secondary consideration i s how useful i t i s for i t s intended application. This th e s i s looks at a particular approach to and use for a formal semantic d e f i n i t i o n to see what parts of i t are hard to comprehend and why. The approach examined i n this thesis i s a computational one similar to an a t t r i b u t e grammar as described by Knuth [Knuth] or Abramson [Abramson]. In both of those approaches, several at t r i b u t e s are attached to each node of a context-free parse tree representing the source program. The attributes in t e r a c t with attributes attached to adjacent nodes and their combined meaning gives the semantics of the source program. In Knuth's approach the attributes are simple values, calculated according to some formula from the att r i b u t e s (values) of adjacent nodes. Abramson replaces these values with parts of small programs or functions (written in some simple language whose semantics are 3 understood) . The functions associated with one node refer to functions associated with adjacent nodes by means of "t r a v e r s a l primitives". (A traversal primitive, given i and j , transfers control to the i t h program or function associated with the j t h adjacent node.) The approach examined here uses ths same sort of a t t r i b u t e s and tr a v e r s a l primitives as Abramson but an optimized and p a r t i a l l y processed parse tree c a l l e d a "program tree" i s used to give structure instead of a context-free parse tree. The intended use of the semantic d e f i n i t i o n was code generation and part of a code generator was a c t u a l l y implemented using t h i s approach in order to evaluate i t s usefulness. The rest of this thesis i s divided i n t o four sections: a discussion of the design decisions for the system implemented, a description of the i n d i v i d u a l parts of the system, an explanation of part of the actual macro defined semantics and f i n a l l y an evaluation of the c l a r i t y and p r a c t i c a l i t y of t h i s method of semantic d e f i n i t i o n . 4 I I . APPROACH There are t h r e e major choices to be made i n g i v i n g a semantic d e f i n i t i o n f o r a language. One i s to choose the language f o r which semantics are d e s i r e d (or i n other words, t o choose a source language). Another c h o i c e i s t h a t of the b a s i s or t a r g e t of the d e f i n i t i o n - the p r e v i o u s l y understood concepts or a c t i o n s that the source language i s being d e f i n e d i n terms of (e.g. machine i n s t r u c t i o n s or l o g i c a l formulas). The l a s t i s how to express the semantics themselves and r e l a t e them t o the source language. The reasons f o r choosing or desig n i n g each of the above f o r the implemented system are d i s c u s s e d i n t h i s s e c t i o n while d e t a i l e d d e s c r i p t i o n s of the v a r i o u s p a r t s are l e f t u n t i l the next s e c t i o n . Programs are w r i t t e n i n a simple macro language c a l l e d TOSI [TOSI] (based on TRAC [ TRAC ]) and are a s s o c i a t e d with the nodes of program t r e e s f o r ALGOL 6 8 1 source programs. When evaluated, the TOSI programs produce code f o r an a b s t r a c t machine c a l l e d the A l g o l Engine. 1 Much of the terminology used i n t h i s t h e s i s i s n e c e s s a r i l y r e l a t e d to ALGOL 68 and so the reader i s assumed to be f a m i l i a r with the A.LGOL 68 terminology used i n the Revised Report [RR]. 5 The method of producing code by evaluating simple programs attached to the nodes of a tree i s a s l i g h t departure from attribute grammars as defined by Knuth [Knuth], but i t has been shown to be successful for small examples by Abramson [Abramson]. (Attribute grammars as o r i g i n a l l y defined by Knuth would be more suitable for interpreters than f o r compilers since they return values instead of performing actions.) The further modification of attribute grammars, based on using program trees instead of parse . trees to give structure, was intended to minimize the s i z e of the d e f i n i t i o n required for a r e a l language. An example of t h i s in ALGOL 68 i s the extra node that would be involved in parsing begin-serial clause end as a closed clause. A l l that i s i n t e r e s t i n g from the code generation point of view i s that the closed clause contains a s e r i a l clause and so the a t t r i b u t e s for a closed clause node would contribute nothing. ALGOL 68 was used as the source language because there was an existing compiler available, complete except f o r parts of the code generation [Earner] [Thomson] [A68V] and because the semantics have been well s p e c i f i e d in the Revised Report [RR]. The existing compiler did much of the semantics for some of the features of ALGOL 68 l i k e mode processing, coercion and operator precedence as well as checking syntax (including i d e n t i f i c a t i o n of indicators) . 6 The primary reason for using an abstract machine (the Algol Engine) as the target for code generation was so those parts of a program that do not belong i n a formal semantic d e f i n i t i o n (e.g. register allocation) could be handled independently. Other reasons for using an abstract machine are potential p o r t a b i l i t y and reduction of the siz e of the semantic d e f i n i t i o n being given. If the source language compiler i s written i n a portable language then the entire compiler can be moved by moving the abstract machine instead of rewriting the code generation pass of the compiler. Unfortunately the ALGOL 68 V compiler was written in PL360 which i s not portable. Eliminating the concern over registers and such d e t a i l s reduces the size of a semantic d e f i n i t i o n (unless the abstract machine has re g i s t e r s of i t s own) but i t can be further reduced by an application of "divide and conquer". If the abstract machine i s s u f f i c i e n t l y high l e v e l , then i t s semantics account for a portion of the semantics of the source language. Of course, i f i t i s too high l e v e l then defining the semantics of the abstract machine i t s e l f i s as d i f f i c u l t as the o r i g i n a l problem and nothing i s gained. (In fact things just get more confused.) An example i s the procedure c a l l . ALGOL 68 procedure c a l l involves parameters, co n t r o l flow and r e s u l t s . 7 The A l g o l Engine i n s t r u c t i o n f o r an ALGOL 68 procedure c a l l c o u ld have ranged from high l e v e l (a s i n g l e i n s t r u c t i o n f o r pa s s i n g the parameters, branching t o the procedure body and s e t t i n g up the return) to low l e v e l (with separate i n s t r u c t i o n s to move the parameters, s t o r e the re t u r n address, a d j u s t the stack, branch to the body and so on). What the A l g o l Engine does provide i s a set of i n s t r u c t i o n s t h a t d i v i d e s the procedure c a l l semantics i n t o two pa r t s , the order of the o p e r a t i o n s and the o p e r a t i o n s themselves. Thus the A l g o l Engine i n s t r u c t i o n s DEPROCEDURE and RETURN handle branching and r e t u r n addresses but the e v a l u a t i o n o f the TOSI programs p o s i t i o n them i n the A l g o l Engine i n s t r u c t i o n stream. TOSI was designed and used to express the semantics as the best compromise i n a co m p l e x i t y / e x p r e s s i v e n e s s t r a d e - o f f . For ease of understanding the language f o r the semantics had to be simple, i n the sense t h a t both the number of concepts i n v o l v e d was s m a l l and nothing was "hidden". On the other hand the language had t o be e x p r e s s i v e enough so th a t programs w r i t t e n i n i t would not obscure t h e i r f u n c t i o n i n an excess of code. D e f i n i n g an e n t i r e l y new language t o f i t these requirements was s p e c i f i c a l l y avoided because language design was not one of the goals . A l s o , u s i n g an e x i s t i n g language made i t p o s s i b l e to e l i m i n a t e the task of l e a r n i n g a new language from the job of understanding the semantic d e f i n i t i o n . 8 The languages considered were versions 2 of GPM [GPM], LISP [LISP] and TRAC [TRAC]. Appendix A gives a d e f i n i t i o n of Ackerman's function in each of these languages. GPM was rejected immediately on the grounds that i t was not expressive enough and the GPM d e f i n i t i o n stack was a hidden complication. An example of the problems ar i s i n g from the d e f i n i t i o n stack i s that GPM has no conditional but an IF..THEN..ELSE..FI based on string eguality can be simulated using the d e f i n i t i o n stack. GPM was used by Abramson [Abramson] and worked well i n the sense that i t produced the correct output from a minimum of semantic d e f i n i t i o n . It was f e l t however, that a larger source language could be dealt with only by using a more expressive d e f i n i t i o n language. A version of LISP was rejected i n favour of TRAC despite the fact that LISP was e a s i l y expressive enough and i n most cases was also more readable. The reason f o r t h i s was that the conceptual complexity of a value returned by a LISP function i s not strongly related to the complexity of the written LISP code fo r that function. (In other words, LISP i s too expressive.) In TRAC on the other hand, for a complicated value to be returned the function d e f i n i t i o n has to be complicated (since 2 Each language would have had to be modified to include the tree traversal primitives and to remove unnecessary features. 9 the mechanism of evaluation i s simple substring replacement). This l e f t TOSI (the modified version of TRAC) as the final choice. 10 I I I . PARTICULARS III.1. ALGOL 68 V Program Trees The part of the A.LGOL 68 V compiler [Ranter] [Thomson] considered consists of four passes: a l e x i c a l scan, an ambiguity scan, a parse and a mode processing pass. The result of the parse i s a program tree made up of 119 types of nodes. F i f t y - f i v e of these nodes have to do with formats and were ignored. Of the remaining sixty-four, several are eliminated and some are modified during pass four. Upon completion of pass four the nodes are divided into two types, each requiring d i f f e r e n t traversal primitives. In addition to producing a program tree the compiler produces a symbol table, a representation table and a mode table. The symbol table (also referred to as the tag table) i s t i e d to the program tree by means of "access numbers" (indexes into the table) stored in appropriate spots in the program tree. The representation table contains information used in p r i n t i n g out error messages. (This table i s n ' t used by the present semantics and so won't be mentioned again.) The mode table, although i t i s a table, i s treated as i f i t were part of the program tree. I t too i s t i e d to the main part of the tree (and to i t s e l f ) by means of numbers. I t i s t h i s that divides the 11 nodes in the program tree into two types as some nodes are actually mode table entries referred to by number instead of pointer. Some examples of program tree nodes and the e s s e n t i a l 3 parts of the information they contain are given here. For a complete description see Thomson [Thomson]. . Mode Nodes: • Node type 2 (primitive mode) Contains: - a number indicating which primitive mode (e.g. i n t , char) i t i s . • Node type 3 (reference to mode) Contains: - the index (in the mode table) of the mode being referred to. • Node type 6 (union mode) Contains: - the number of submodes comprising the union. - the indices of a l l the submodes^ 3 The ALGOL 68 V compiler philosophy i s to hang on to a l l information. (In fact enough information i s retained to completely reconstruct the source program except for spacing.) This means that the nodes contain a tremendous amount of information of which only a small part i s required by the TOSI semantics. Standard Nodes: • Node type 14 (routine text) Contains: - a pointer to the body of the routine. a pointer to a declaration prologue (containing, among other things, the formal parameter l i s t ) . a l i s t of coercions to apply to the res u l t . • Node type 15 (assignation) Contains: - a pointer to the l e f t hand side. - a pointer the ri g h t hand side. - a l i s t of coercions to apply to the re s u l t . • Node type 16 (procedure c a l l ) Contains: - a pointer to the actual parameters. - a pointer to the procedure. • Node type 24 (tag) Contains: - a pointer into the name table. a l i s t of coercions to be applied to the tag. • Node type 35 (loop) Contains: - pointers to the f o r , from, by, to, while and do parts. 13 • Node type 52 (unit s e r i e s ) Contains: - the number of u n i t s . - p o i n t e r s to a l l the u n i t s . I I I . 2 . TOSI The semantics of TOSI macro expansion are simple. A given s t r i n g i s scanned from l e f t to r i g h t f o r f u n c t i o n s of the form: (name, arg1,... .argn) The l e f t m o s t i n n e r f u n c t i o n i s r e p l a c e d by i t s value V and scanning s t a r t s again at the l e f t end of the returned s t r i n g V. Thus [ l e t t i n g ± denote the p o s i t i o n of the scanner] ± (foo(bar, x) , y) becomes [assuming the value of (bar,x) i s zot(comma) x] (f o o i z o t (comma) x, y) then [assuming the value of (comma) i s ,] ( f o o z o t f , x, y) and f i n a l l y Rvalue of fooz o t a p p l i e d t o x and y This scheme i s complicated by two t h i n g s : n e u t r a l e v a l u a t i o n and delayed e v a l u a t i o n . N e u t r a l e v a l u a t i o n i s denoted by a colon (:) preceding the f u n c t i o n and i n d i c a t e s t h a t scanning i s to resume at the r i g h t end of the r e t u r n e d value r a t h e r than the l e f t . Thus i ( f o o : ( b a r , x ) , y ) 14 would become (f oozot (comma) x£, y) and then fvalue of f oozot (comma) x applied to y In other words a function with the name 1foozot(comma)x* would be c a l l e d with the argument y instead of a function with the name 'foozot' and arguments x, y as before. Delayed evaluation of a portion of a string i s denoted by enclosing i t i n <>'s. If these are encountered, the outermost pair i s stripped o f f , but the str i n g between them i s not scanned f o r functions. Thus t(foo<(bar,x) >,y) becomes (f oo (bar,x) ±,y) and a function with the name 'foo(bar,x)' i s called with y as i t s argument. The philosophy behind the implementation of TOSI was to provide as few primitive functions as possible, and to build any required functions from them. Since TOSI was intended to be used f o r more than one compiler, the tree traversal and I/O primitives had to be changeable. This led to a primitive DEFEXT (DEFine an EXTension), used for defining new primitives. For ALGOL 68 the primitives defined using DEFEXT were: NEWNODE, OLDNODE, NEW MODE and OLDMODE - used f o r tree traversal (two types are needed because there are two classes of nodes) , they return nothing. (NEWNODE,i) and (NEWMODE,i) change the current node to be the i t h subnode of the current node. - (OLDNODE) and (OLDMODE) step back up the tree. PRINT (PRINT,string) returns nothing but causes •string' to appear as a l i n e of output. GETHALF, GETBYTE, POTHALF - these either get a or store one in i t . and PUTBYTE value from the current node GETCO ERCION - returns the name of the coercion represented by the current node i f the current node i s part of a l i s t of coercions. NOTNIL (NOTNIL,i,y,n) returns the y argument i f the i t h branch of the current node e x i s t s and the n argument otherwise. other p r i m i t i v e s p rovided by TOSI are: (DEFINE, n am e, arg1,. .. argn,body) d e f i n e s a f u n c t i o n with the gi v e n name whose value when c a l l e d with the a c t u a l parameters x1,...xn i s body with a l l occurrences o f s u b s t r i n g s equal t o a r g i r e p l a c e d by x i . [ e . g . (DEFINE,bar, |z| , <zot (comma) | z| >) would d e f i n e the f u n c t i o n bar used e a r l i e r . ] (SELECT,i) - r e t u r n s the i t h s t r i n g (or f u n c t i o n or program) a s s o c i a t e d with the c u r r e n t node. (STRING,name, value) - does t h e same as DEFINE with no arguments. (DELETE, namel,... nam en) - undefines the given f u n c t i o n s . SUM, DIFF, PROD, QUOT and REM a l l take two numbers and r e t u r n ths sum, d i f f e r e n c e , product, q u o t i e n t or remainder of them r e s p e c t i v e l y . (Missing numbers d e f a u l t t o 1.) 17 F i n a l l y there are primitives of the form: (test,v1,v2,yes,no) where test i s one of EQ, NE, LE, LT, GE or GT for comparing two strings v1 and v2 or i t i s one of the above preceded by a # (e.g. #EQ) for comparing two numbers. The t h i r d argument i s returned i f the tes t i s true, the fourth otherwise. III.3. The Algol Engine The Algol Engine i s an abstract machine designed to s p l i t the semantic d e f i n i t i o n of ALGOL 68 as nearly i n half as possible while keeping the semantics specified in TOSI machine independent. This means that the Algol Engine i s a high l e v e l abstract machine with a large i n s t r u c t i o n set and lots of pieces of "hardware" (e.g. re g i s t e r s , stacks and I/O mechanisms). A complete description of t h i s machine does not exis t and would be beyond the scope of this thesis i f i t did. Instead, a brief outline of i t s architecture i s given and some of the instructions used in l a t e r examples are explained. The structure of the memory i s the most, unusual feature of the Algol Engine. It i s divided into the l o c a l stack (addressed by pairs of index registers) , the work stack (operated as a real 1 8 LIFO stack), heap storage (addressed only by i n d i r e c t i o n through one of the stacks), code storage and free storage. The storage is managed using two tables, a PSO (program stack organization) table and a mode table. Both the l o c a l and work stacks are broken into frames and in the l o c a l stack each frame i s further divided into environs*. The l o c a l stack i s addressad by means of two index r e g i s t e r s , a stack frame register and an environ re g i s t e r . A stack frame r e g i s t e r points to one of the l o c a l stack frames and has i t s own environ registers which point to environs inside that l o c a l stack frame 5. Thus a data address i n an instruction accessing the l o c a l stack consists of a stack frame register, an environ register and a displacement within the environ. The work stack i s a true stack, only the top few items are accessed by any i n s t r u c t i o n ( i.e. no data address i s needed i n work stack manipulating i n s t r u c t i o n s ) . The code storage i s broken into sections with d i f f e r e n t i n s t r u c t i o n s for jumps i n t e r n a l to a code section and jumps between code sections. 4 The l o c a l stack i s divided t h i s way to separate recursion from normal nesting of ranges. A new l o c a l stack frame i s opened when a procedure i s entered (since i t could po t e n t i a l l y c a l l i t s e l f ) but when a new range i s entered only an environ i s used. 5 No l i m i t was placed on the number of stack or environ r e g i s t e r s since they had to be simulated using memory anyway. In the 370 implementation the environ registers for any given stack frame register were kept i n the memory used as the l o c a l stack frame addressed by that r e g i s t e r . The maximum number of environ r e g i s t e r s in use at any one time for a stack frame register was part of the information collected for the PSO table. 19 The following p a r t i a l l i s t of Algol Engine ins t r u c t i o n s i s divided into classes by the purpose of the ins t r u c t i o n and i s intended to give the flavour of the A.lgol Engine. Storage i n s t r u c t i o n s • OPEN-SF sf# - finds the stack frame r e g i s t e r associated with the PSO table entry sf#, saves the existing contents ( i f any) of i t and i t s environ r e g i s t e r s and causes i t to point to a newly allocated l o c a l stack frame described by the PSO table entry sf#. (The registers are saved i n the l o c a l stack memory i n the following way: If the stack frame register i s n i l then so are i t s environ r e g i s t e r s and nothing needs to be saved. If the stack frame re g i s t e r i s n ' t n i l i t points to a stack frame. Whatever environ registers are i n use are saved in that stack frame and the contents of the stack frame re g i s t e r i s saved i n the new stack frame being opened before the r e g i s t e r i s changed to point to i t . ) • CLOSE-SF -restores the previous contents of the top stack frame register (the one pointing to the top l o c a l stack frame) and releases i t s stack frame ( i . e . returns i t to free storage) . • OPEN-ENV envt - f i n d s the environ r e g i s t e r associated with the PSO table entry env# and causes the corresponding environ register belonging to the top l o c a l stack frame to point to the appropriate environ i n i t . • CLOSE-ENV - clears the environ register for the top environ i n the top loc a l stack frame. • MARK-WS - puts a mark at the top of the current work stack frame for use with the YIELD ins t r u c t i o n . Control flow i n s t r u c t i o n s • LABEL label - marks a lab e l l o c a l to the current program section* (this i s r e a l l y an ins t r u c t i o n to the Algol Engine assembler.) • GO label - unconditional jump to the given l a b e l . 21 • EXTERNAL-LABEL label - defines a global l a b e l . • GO-FAR lab e l - unconditional jump to the given external label. • GO-FALSE lab e l - pops a boolean value off the work stack and jumps to the l o c a l l a b e l i f i t i s fal s e . • GO-FOR sfr,envr,disp,label looks at the control variable of a f o r loop (whose address on the l o c a l stack i s given by sfr,env,disp) and the top two items on the work stack (the increment and the bound) to decide i f another i t e r a t i o n should be made and i f so jumps to the l o c a l l a b e l . • RETURN - branch back from a procedure (see the DEPROCEDURE ins t r u c t i o n ) . Coercions • DEPROCEDURE mode* saves the current i n s t r u c t i o n address i n the top lo c a l stack frame, pops the procedure address from the work stack and branches t o the procedure. • DEREFERENCE modet - pops a r e f e r e n c e to mode* o f f the work stack and r e p l a c e s i t with whatever was being r e f e r e d t o . • UNITE model#fmode2# - r e p l a c e s the top item on the work stack (of mode model*) with a union (of mode mode2#) c o n t a i n i n g i t . • VOID mode* - pops the top item o f f the work stack (of mode mode*) lue moving • PUSH s f r , e n v r , d i s p pushes the item at the l o c a l stack address sf r , e n v r , d i sp onto the work s t a c k . • PUSH-C mode*,constant d e n o t a t i o n - pushes the given constant (of mode mode*) onto the work stack. • STORE sf r ,envr, d i s p - c o p i e s the top item on the work stack i n t o the l o c a l stack l o c a t i o n given by s f r , e n v r , d i s p . • YIELD mode* moves the top item on the work stack to the mark (made by a MARK-WS) i n the work stack frame below. (This i s used for returning values from procedures.) • INCREMENT sfr,envr,disp - adds the top item on the work stack (an int) to the control variable at the l o c a l stack location sf r,envr,disp. Any instruction used i n the next section but not explained here should be self-explanatory. 24 IV. TOSI SEMANTICS FOR ALGOL 68 The TOSI functions attached to the nodes of a program tree are b u i l t up of lower l e v e l functions which are not attached to any particular node. Thus the TOSI semantics are given i n two pieces, the f i r s t piece being a large number of DEFINES of functions to get off the ground and the second piece being a l i s t of associations where an association i s a l i s t of programs to be associated with some node. P i c t o r i a l l y the general form of the TOSI semantic d e f i n i t i o n i s : ( • . • ( . . . ( . . . NODETYPE n1 STRING s1 = I " ( . . ) • • ( . . ) " ; I : I Associations NODETYPE nm STRING sk = I " ( . . ) . . ( . . ) " ; I The syntax of ALGOL 68 s p e c i f i e s a number of rel a t i o n s between modes which are used to determine the l e g a l i t y of a particular program. The syntax also s p e c i f i e s the coercion sequences to be performed during the execution of a program. Pass 4 of the ALGOL 68 V compiler handles these aspects of the syntax, adding the coercion sequences to the program tree and changing any modes i n the tree into indexes into a table of a l l ) I ) | Basic d e f i n i t i o n s ) I 25 the d i s t i n c t modes. Since t h i s mode table contains a l l the information required i n the Algol Engine mode table (along with other information not required by the Algol Engine) the TOSI semantics for producing the Algol Engine mode table simply print i t out in a suitable format. The PSO table i s a different situation. Some processing i s done i n pass 4 towards an intended runtime organization that had no environs i n the l o c a l stack, only stack frames. In particular a l l the tags declared i n a procedure or a program are collected by pass 4 into a declaration prologue and attached to the appropriate node. The TOSI semantics must untangle t h i s and build a PSO table from scratch. Also, the table must exist before the code generation tour of the tree i s made and so the TOSI semantics define a major tour of the program tree for the PSO table (any strings numbered 0-1) and a major tour of the program tree for the code generation (any strings numbered 10-11) . A detailed example of one piece from each of the major tours i s given but the rest of the TOSI semantics developed are not included i n thi s t h e s i s since they are not complete and are not easily readable. 26 IV.1 . The PSO Table The PSO table i s a description of the l o c a l stack. It consists of a l i s t of stack frames, a l i s t of enclosed environs for each stack frame and a l i s t of constituent tags for each environ. This l i s t of l i s t s i s b u i l t during the f i r s t major pass and accessed during the second. I t i s here that TOSI*s lack of data structures r e a l l y hurt (as can be seen from the convoluted TOSI code for building the table) . The TOSI representation of the table i s a c o l l e c t i o n of strings where a l l the strings related to entry ' i ' in the table contain ' i ' in t h e i r names. A. stack frame nnn i s represented by the TOSI strings: PSO-nnn-TYPE = SF SF-DEPTH-nnn = maximum number of environs in use at any one time. SF-SFR-nnn = the associated stack frame r e g i s t e r . SF-#ENV-nnn = m, the number of environs. SF-nnn-1 = PSO number of 1st environ. * • SF-nnn-m = PSO number of mth environ. 27 An environ nnn i s represented by the TOSI strings: PSO-nnn-TYPE = ENV ENV-SF#-nnn = PSO number of enclosing stack frame. ENV-ENVR-nnn = associated environ register. ENV-#TAG-nnn = m, the number of tags. ENV-nnn-1 = PSO number of 1st tag. * • ENV-nnn-m = PSO number of mth tag. A tag nnn whose access number assigned by pass 4 i s xxx i s represented by the TOSI strings: PSO-nnn-TYPE = TAG TAG-ACC#-nnn = xxx TAG-xxx-SFR = part of l o c a l stack address. TAG-xxx-ENVR = part of l o c a l stack address. TAG-xxx-DISP = part of l o c a l stack address. TAG-xxx-MODE = number of tag's mode i n the mode table. TAG-ENV-xxx = PSO number of enclosing environ. TAG-PSO#-xxx = nnn In addition there are TOSI strings to i d e n t i f y a l l the stack frames: PSO-#SF = m, the number of stack frames. PSO-SF-0 = PSO number of 1st stack frame, PSO-SF-m = PSO number of the m+1st stack fram e. The TOSI programs for building the PSO tables are not associated with a program tree node since the same functions must be performed at several different nodes. Instead, they are written 28 as separate procedures: % percent signs delimit comments % (STRING,PSO#f- 1) % current PSO table entry counter % % strings for generating the names used in building and accessing the PSO table % % strings for counting stack frames % (STRING, SFR-LEVEL, 0) % CSF-PSO* returns the name of the st r i n g which contains the PSO# of the current stack frame % (STRING, CSF-PSO*, <SF- (SFR-LEVEL) -PS0#>) % strings for counting environs % (STRING,CSF-ENVR-LEVEL , <SF- (SFR-LEVEL) -ENVR-LEVEL>) % CSF-PSO* returns the name of the string which contains the PS0# of the current environ in the current stack frame % (STRING,CENV-PSO*, <ENV- ((CSF-ENVR-LEVEL) ) - (SFR-LEVEL) -PS0#>) % strings used in the second tour only % (STRING,SF#, 0) (STRING, #ENV#, <ENV#- (SFR-LEVEL) >) % start with an empty PSO table % (STRING, PSO-#SF, 0) % the table building functions % (STRING,START-SF, < (1+ ,PSO#) % take care of main l i s t of SFs % (1+,PS0-#SF) (STRING,PSO-SF- (PS0-#SF) , (PS0#) ) % add thi s stack frame to the table % (STRING,PSO- (PS0#) -TYPE , S F) (1+,SFR-LEVEL) (STRING, (CSF-PSO#) , (PS0#) ) (STRING ,SF-DEPTH- (PS0#) ,0) (STRING, (CSF-ENVR-LEVEL) ,0) (STRING,SF-SFR- (PS0#) , (SFR-LEVEL) ) (STRING,SF-#ENV- (PSO#) , 0) >) (STRING,STOP-SF, < (DELETE, (CSF-ENVR-LEVEL) , (CSF-PSO*) ) (1 -, SFR-LEVEL) >) 29 (STRING,START-ENV, < (1+,PS0#) % take care of enclosing SF % (1+,SF-#ENV-( (CSF-PSO#) )) (STRING, SF- ( (CSF-PSO*) ) - (SF-#ENV- ( (CSF-PSO#) ) ) , (PS0#) ) (1+, (CSF-ENVR-LEVEL)) (#GT, ( (CSF-ENVR-LEVEL) ) , (SF-DEPTH- ( (CSF-PSO#) ) ) , < (1 +, SF-DEPTH- ( (CSF-PSO*) ) ) >) % add t h i s environ to the PSO table % (STRING,PSO- (PSO#)-type ,ENV) ( STRING, EN V-SF#- (PS 0#) , ((CSF-PSO*) ) ) (STRING,ENV-ENV R-(PSO#) , ((CSF-ENVR-LEVEL) ) ) (STRING,ENV-#TAG- (PSO#) ,0) (STRING, (CENV-PSO*) , (PSO*)) (STRING,STOP-ENV , < (DELETE, (CENV-PSO*) ) (1-, (CSF-ENVR-LEVEL)) >) (DEFINE, ADD-TAG, | ACC#| , | MODE#| ,< (1 + ,PSO#) % take care of enclosing ENV % (1 + ,ENV-#TAG- ( (CENV-PS0#) ) ) (STRING ,ENV-( (CENV-PSO*)) - (EN V - #TAG- ( (CENV-PSO#) )) , (PSO*)) % add t h i s TAG to the table % (STRING,PSO-(PSO#) -TYPE,TAG) (STRING,TAG-ACC*- (PSO#) , | ACC* |) (STRING,TAG-| ACC#|-SFR, (SF-SFR- ((CSF-PSO*)) )) (STRING,TAG-|ACC*|-ENVR,(ENV-ENVR-((CENV-PSO#)))) (STRING,TAG-|ACC*|-DISP,(ENV-#TAG-((CENV-PSO#)))) (STRING,TAG-|ACC*|-MODE,| MODE#| ) (STRING,TAG-ENV-|ACC*|, ((CENV-PSO#)) ) (STRING,TAG-PSO#-|ACC#| , (PSO#) ) >) % some of the access functions for the code gen. tour % (STRING,OPEN-SF, < (1+,SFR-LEVEL) (1+,SF#) (STRING, (CSF-PSO*) , (PSO-SF- (SF#) ) ) (STRING, (CSF-ENVR-LEVEL) ,0) (STRING, (#ENV#) ,0) (PRINT, OPEN-SF_ _( (CSF-PSO*) ) ) >) 30 (STRING,CLOSE-SF, < (PRINT,CLOSE-SF) (DELETE, (#ENV#) , (CS F-ENVR-LEV EL) , (CSF-PSO*) ) (1-, SFR-LEVEL) >) (STRING,OPEN-ENV, < (1 + , (CSF-ENVR-LEVEL)) (1+ , (#ENV#) ) (STRING, (CENV-PSO*) , (SF- ( (CSF-PSO*) ) - ( (#ENV#) ) ) ) (PRINT, OPEN-ENV ( (CENV- PS 0#) ) ) >) (STRING,CLOSE-ENV, < (PRINT,CLOSE-ENV) (DELETE, (CENV-PSO#) ) (1-, (CSF-ENVR-LEVEL)) >) (DEFINE,L-S-A, |ACC#| , < (TAG- | ACC# | -SFR) < , > (TAG-| ACC# | - ENVR) <,> (TAG- | ACC# | - DISP) >) The TOSI code associated with an i f statment i l l u s t r a t e s the use of the PSO table building functions. (DEFINE, DOWN, | BRANCH! , |STRING| , < (NOTNIL, |BRANCH | ,< (NEWNODE, | BRANCH| ) (SELECT,|STRING|) (OLDNODE) >) >) NODETYPE 43 STRING 10 = % conditional clause % "(DOWN-LVL) % changes the value of LVL# for simulating l o c a l variables % (STRING, IF: (LVL#) , (GEN-LABEL) ) (OPEN-ENV) % i f part % % note that CP#' i s the number of a mini-tour f o r s e r i a l clauses that doesn't open and close an ENV % (DOWN,1, : (EQ, (SUBNODE,1) ,: (SERCL) , : (CP#' ) , : (CP#) )) (GO-FALSE,ELSE-PART- : (IF : (LVL#) ) ) % then part % (DOWN,2, : (CP#) ) 31 (NOTNIL,3,< (GO,OUT-: (IF: (LVLt) ) ) >) % else part % (LABEL,ELSE-PART-: (IF: (LVL#) )) (DOWN,3, : (CP#) ) (NOTNIL, 3,<( LABEL, OUT-: (IF: (LVL#) ) ) >) (CLOSE-ENV) (DELETE, IF: (LVL#) ) (UP-LVL) NODETYPE 43 STRING 0 = " (START-ENV) (DOWN, 1, : (EQ, (SUBNODE, 1) ,: (SERCL) , : (MP#') , : (MP#) )) (DOWN,2, : (MP#) ) (DOWN,3, : (MP#) ) (STOP-ENV) String 0 i s the PSO table pass. It starts an environ and then does a special mini tour of the i f part (subnode 1) so that no environ i s started there. The important thing to notice i s that the code generation pass (string 10) must follow the order of sta r t i n g and stopping environments exactly i n order to be able to access the correct environments. I f the code generation pass does not open (close) environments wherever the PSO table pass started (stopped) them then the counters used to access the environments are incorrect. IV.2. The Algol Engine Code The semantics for a loop are f a i r l y simple given the Algol Engine GO-FOR and INCREMENT inst r u c t i o n s . The following TOSI code gives the semantics for a loop. No code i s generated for a control variable unless at least one of the for or to parts of the loop i s present. 32 % The code generation loop semantics To NODETYPE 35 STRING 10 = % loop % %1% » (DOWN-LVL) % set SKIP to *F* i f a control variable i s u t i l i z e d , *T* otherwise % %2% (STRING , SKIP: (LVL#) ,: (NOTNI1, 1 ,*F*, : (NOTNIL , 4 ,* F*,*T*) ) ) % control variable environ % %3% (EQ, : (SKIP: (LVL#) ) ,*F*, <(OPEN-ENV) >) % f o r part (find location of control variable) % %4% (NOTNIL, 1,< (NEWNODE,1) (STRING,CVLOC: (LVL#) , (L-S-A, : (GET-HA.LF, 13))) (OLDNODE) >, < (EQ, : (SKIP: (LVL#) ) ,*F*, < (1- ,CP-CV#) (STRING, CVLOC: (LVL#) , (L-S-A, (CP-CV#) ) ) >) >) % from part % %5% (EQ, : (SKIP: (LVL#) ) , *F*, < (NOTNIL ,2,< (DOWN, 2, : (CP#) ) > < (POSH-C, 1 , 1) >) % default from % (STORE, : (CVLOC: (LVL#) ) ) (VOID ,1) >) % by part % %6% (EQ, : (SKIP: (LVL#) ) ,*F*,< (NOTNIL, 3, < (DOWN,3, : (CP#) ) > < (PUSH-C, 1,1) >) % default by % >) % to part % (EQ ,: (SKIP: (LVL#) ) , *F*, < (NOTNIL ,H ,< (DOWN, 4, : (CP#) ) > < (PUSH-C, 1,: (MAXINT)) >) % default to % >) % finished i n i t i a l i z a t i o n , s t a r t loop % %7% (STRING , #L#: (LVL#) , (GEN-LABEL) ) %8^0 (EQ, : (SKIP: (LVL #) ) , *F*, < (GO,FOR-LOOP-START- : (#L# : (LVL#) ) ) >) %9% (LABEL, FOR-LOOP-: (# L# : (LVL#) ) ) % while part % %'\0% (NOTNIL,5,< (OPEN-ENV) (DOWN, 5 , : (CP# •) ) (GO-FALSE, FOR-LOOP-END-: (#L#: (LVL#) ) ) (STRING, WHILE-PART : (LVL#) , < (CLOS E-EN V) >) >, < (STRING, WHILE-PART: (LVL#) ,) >) % do part % (DOWN,6, : (CP#) ) % end of loop % 5812% (WHILE-PART: (LVL#)) % defined above % (DELETE, WHILE-PART: (LVL#) ) %13% (EQ , : (SKIP: (LVL)) ,*F*,< (INCREMENT, : (CVLOC: (LVL#) ) ) (LABEL, FOR- LOOP-START- : (#L#: (LVL#) ) ) (GO-FOR, : (CVLOC: (LVL#) ) ,FOR-LOOP-: (#L#: (LVL#) ) ) >) %14% (LABEL, FOR-LOOP-END-: (# L# : (LVL#) ) ) %*\5% (EQ, : (SKIP: (LVL#) ) , *F*, < (CLOSE-ENV) (DELETE,CVLOC:(LVL#))>) % Note that c l o s i n g the c v . env. removes the by and to v a l u e s from the l o c a l stack % %16% (DELETE,SKIP: (LVL#) ,#L#: (LVL#)) (UP-LVL)"; % the PSO t a b l e loop semantics % NODETYPE 35 STRING 0 = "(NOTNIL,1,<(START-ENV)(DOWN,1,:(MP#»))>, <(NOTNIL,4 ,< (START-ENV) (1-,MP-CV#) (ADD-TAG, : (MP-CV#) , 1) >) >) (DOWN,2, : (MP#) ) (DOWN,3, : (MP#) ) (DOWN,4, : (MP#) ) (NOTNIL,5,<(START-ENV) (DOWN, 5, : (MP#') ) >) (DOWN, 6, : (MP#) ) (NOTNIL, 5,< (STOP-ENV) >) (NOTNIL, 1,< (STOP-ENV) >,< (NOTNIL, 4,<(STOP-ENV) >)>)"; To see how t h i s TOSI code works, c o n s i d e r the loop: f o r i to 10 do - . . . od which r e s u l t s i n a t r e e something l i k e : 34 | for from by to while do , , , «~>|loop| • | / / / / ! / / ! • I/////I • ! > _j v mode value r |cdeno| 1 | 10 | L. V tag i 1 i i |tagdec| > to tag table entry for i The f i r s t pass (using string 0, the second one given) i s the PSO table building pass and doesn't do much. Subnode 1 i s not n i l so an environ i s started and the tag i i s toured (" (DOWN,1,MP#)" resulting in A.DD-TA.G being c a l l e d eventually because of the semantics associated with tagdec nodes) . The other non n i l nodes are toured but nothing else a f f e c t i n g the loop i s done except for stopping the environ started e a r l i e r . The second pass i s f o r the code generation and i s explained in pieces. % 1 % - (DOWN-LVL) changes the value of a global string LVL#. The value of t h i s s t r i n g i s then appended to the names of a l l the strings which are meant to be l o c a l to this node of the program tree. Appending the value of LVL# makes the names unigue so they can be treated as l o c a l variables even though they aren't. %2% - the " l o c a l " string SKIP i s set to T i f no control variable 35 i s to be u t i l i z e d by the loop and to F otherwise. %3% - i f a control variable is used, an environment i s opened for i t . %4% - the Algol Engine location for the control variable (assigned by the PSO table pass) i s recovered. %5% - i f a control variable i s being used the code for obtaining and storing i t s i n i t i a l value i s generated. %6% - again, i f a control variable i s being used, code for the bj and to parts of the loop i s generated. %1% - a unigue number is generated to make the Algol Engine l a b e l names emitted unigue. %8% - i f a control variable i s being used, i t must be checked before the loop i s entered (unlike the FORTRAN DO) so a jump to the code for checking i t i s emitted. %9% - a label for actual start of the loop i s generated. %10% - i f the while part i s present, an environ i s opened for i t , the code for i t i s generated (including the loop exit) and a str i n g i s defined which w i l l generate the CLOSE-ENV when call e d . I f the while part i s not present, the string defined does nothing since no environment was opened. - the tour f o r the body of the loop i s made. %12% - the str i n g defined in %10% i s ca l l e d to close the while part environment. %13% - i f a control variable i s used the code f o r incrementing 36 and t e s t i n g i t i s generated. %14% - the l a b e l f o r the w h i l e - e x i t i s generated (whether or not i t i s r e q u i r e d ) . %15% - i f a c o n t r o l v a r i a b l e i s used i t s environment i s c l o s e d . %16% - " l o c a l " v a r i a b l e s are d e l e t e d and LVL# i s updated. The code f i n a l l y generated looks l i k e : OPEN-ENV env# %3% PUSH-C 1,1 %5% STORE s f r , e n v r , d i s p %5% VOID 1 %5% PUSH-C 1,1 %6% PUSH-C 1,10 %6% GO FOR-LOOP-START-nnn %8% LABEL FOR-LOOP-nnn %9% : code from t o u r i n g the do pa r t %11% INCREMENT s f r , e n v r , d i s p %13% LABEL FOR-LOOP-START-nnn %13% GO-FOR sfr,envr,disp,FOR-LOOP-nnn LABEL FOR-LOOP-END-nnn %14% CLOSE-ENV %*\S% 37 V. CONCLUSIONS The TOSI semantics are easy to understand i f they are small and self-contained. As soon as the functions from one node interac t with the functions from another or become large (i.e. involve more than about two decisions) they become more d i f f i c u l t to understand. For example, the TOSI semantics f o r the conditional clause are much easier to understand than the TOSI semantics for the loop. In f a c t , the loop semantics are quite obscure, requiring an explanation i n English (in the comments and the example). This r e l a t i v e obscurity comes from several sources including TOSI programming s t y l e , the inherent complexity of ALGOL 68, the ALGOL 68 V compiler, the TOSI language, the programming aspect of the approach and the attribute grammar approach i t s e l f . Bad programming style i s guaranteed to make a program d i f f i c u l t to understand. In p a r t i c u l a r the loop semantics would have been much cleaner i f the control variable u t i l i z a t i o n decision was made once, dividing the TOSI semantics for the code generation pass into two s i m i l a r pieces, one for loops with control variables and one for loops without. No suggestions for good TOSI programming style w i l l be made here as there i s enough l i t e r a t u r e available on programming style i n general. 38 ALGOL 68 i s a s u f f i c i e n t l y complex language so that even with a reasonably high l e v e l abstract machine l i k e the Algol Engine the "gap" between them i s f a i r l y complex i n i t s own ri g h t . This implies a sort of "minimum" complexity f o r any semantic d e f i n i t i o n of ALGOL 68 i n terms of the Algol Engine. Of course the complexity can be reduced by either simplifying ALGOL 68 or using a higher l e v e l abstract machine but the f i r s t method misses the point (since the intention was to define ALGOL 68 as i t is) and the second just transforms the problem into one of defining the abstract machine. The ALGOL 68 V compiler was designed and written with a normal, run of the m i l l code generation pass in mind. This meant that some aspects of the compiler were unsuitable f o r the approach examined here. The compiler c o l l e c t s a l l declarations inside a program or procedure and puts them i n t o a declaration prologue, accessible from the head node of the tree f o r the program or procedure. The declaration prologue i s required to generate s t a t i c code for storage a l l o c a t i o n on entry to the procedure but i t i s not reguired for Algol Engine code and complicates any formal parameter l i s t s . The compiler also c o l l e c t s a large amount of information which i s useless for generating Algol Engine code. In par t i c u l a r i t finds the size in bytes of each mode (as they were o r i g i n a l l y going to be implemented on the IBM 360) . The Algol Engine, being machine 3 9 independent, cannot use that kind of information. These and other problems with the ALGOL 68 V compiler were annoying but not s u f f i c i e n t l y so to j u s t i f y the e f f o r t of writing another front end for compiling ALGOL 68. A l l the problems above are present i n any task the size of an ALGOL 68 compiler and are not r e a l l y relevant to the use of attribute grammars to define semantics. They are included here because they do contribute to making such d e f i n i t i o n s hard to understand. The l a s t three sources of confusion are relevant. TOSI, although adequate for the simpler parts of the semantic d e f i n i t i o n , lacks two things which are required to make the d e f i n i t i o n easy to understand: data structures (other than strings) and l o c a l variables. The TOSI semantics have to simulate both of them and t h i s c e r t a i n l y makes the semantics harder to read i f not actually making them more complex. Data structures ( l i s t s ) were required f o r the PSO table. They were simulated by defining a counter (for the length of the l i s t ) and a s t r i n g with the number ' i ' embedded i n i t s name for the i t h element of the l i s t . (e.g. PSO-#SF and PSO-SF-i.) Local variables were required for things l i k e labels and data addresses. They were simulated by appending a number (contained in the global variable LVL#) to t h e i r names. This number was changed as the tree was toured, making the f u l l names unigue. 40 (e.g. CVLOC: (LVL#) .) While some source languages may not reguire data structures i n order to define t h e i r semantics c l e a r l y , using t h i s approach only the most t r i v i a l of languages could be handled without l o c a l variables. The easiest way to obtain the missing features i s to switch to a language which already has them since the philosophy behind TOSI makes i t hard to add them c o n s i s t e n t l y 6 . The decision to reject LISP was based partly on the f a c t that LISP had l i s t s as data structures. Now i t can be seen that t h i s was i n c o r r e c t and as a result LISP should have been the f i r s t choice. LISP has l o c a l variables as well as data structures and so would have f i l l e d both requirements. The next source of obscurity in the defintion i s not so easy to cure. The problem i s that t h i s type of semantic d e f i n i t i o n i s r e a l l y a mapping or function from program trees to abstract machine code. Understanding how to compute the function i s not necessarily required to understand the mapping i t s e l f . In f a c t , i f the algorithm f o r computing the function i s overly complicated i t may i n t e r f e r e with understanding the function. For example, i t i s not necessary to understand how an 6 Actually l o c a l variables could be added simply by making the primitive STRING stack the old value when something i s redefined and restore i t when the new value i s deleted. Unfortunately there does not appear to be any way to add l i s t s without introducing new syntax or new primitives. 41 algorithm for checking the planarity of a graph works in order to understand that such an algorithm should say "yes" i f the graph i t i s used on i s planar and "no" otherwise. In the TOSI semantics case so much code i s spent d e t a i l i n g how to do "x" that i t i s often d i f f i c u l t to see what "x" i s . Unfortunately i t is understanding that "x" and not how to do i t that i s the major goal. More s p e c i f i c a l l y , the programming involved in di r e c t i n g the tour obscures the interesting part of the d e f i n i t i o n which i s the re s u l t of the tour: the code generated. Defining the code to be generated by means of any, programming language w i l l , at least to some extent, result i n t h i s sort of problem. There i s a p a r a l l e l s i t u a t i o n i n the syntactic d e f i n i t i o n case. Syntactic d e f i n i t i o n s which include instructions on how to parse (such as Floyd-Evans production language [FE]) are more d i f f i c u l t to comprehend than simple s p e c i f i c a t i o n s (such as ones in BNF) . The solution i s the same too, eliminate the programming and specify the mapping some other way. The d i f f i c u l t y i s that no equivalent to BNF i s available for defining mappings from trees to abstact machine code. The f i n a l source of d i f f i c u l t y i n obtaining a clear d e f i n i t i o n i s in the attribute grammar approach i t s e l f . Attribute grammars have an inherent tendency to s p l i t up parts 42 of the d e f i n i t i o n which, for the sake of c l a r i t y , should be together. This i s because the a t t r i b u t e grammar must at least approximate the syntax and the syntax may include things which are not relevant to the part of the semantic d e f i n i t i o n being s p l i t up. For example the d e f i n i t i o n of the PSO table depends only on the stack frames, nesting and declarations. £.11 the nodes in the Algol 6 8 v program trees which do not a f f e c t these cause confusion when the TOSI semantics for the PSO table must work through them. The conditional clause i s an example. The TOSI code must start and stop environs which may not be needed because there i s no simple way to find out i f they are. I t must also take s p e c i a l pains to prevent the subnodes from s t a r t i n g and stopping environs which are handled in the main node. A p a r t i a l solution i s to c o l l e c t the s p e c i f i c a t i o n s i n one place and r e f e r to them only from the i n d i v i d u a l nodes as was done, fo r example, with ADD-TAG, STAST-ENV and the others. Unfortunately the semantics are s t i l l in pieces (even though they may be physically near each other) when for maximum c l a r i t y they should be a single unit. Starting and stopping (or opening and closing) a stack frame should be one action associated with one node so that everything r e l a t i n g to stack frames can be grasped at once. Just rearranging the pieces i s not going to achieve t h i s . 43 A more complete solution i s to put the a t t r i b u t e s on a graph instead of on a tree. The graph would be composed of several nearly d i s t i n c t trees having only a few nodes i n common. For ALGOL 68 two trees could be used, one f o r the program (called the program tree) and one f o r the PSO related information (called the PSO tree) . The nodes in common might be only the tag dec nodes or they might include the nodes giving nesting and stack frame information (such as the routine text node) as well. In any case separate programs (or mappings or whatever i s being used) could be given for the nodes i n each tree, allowing things which should be kept together to be together. A case i n point i s the stack frame code f o r ALGOL 68 V. The stack frame code i s s p l i t into four parts i n the existing d e f i n i t i o n and associated with two types of nodes (program and routine text). If t h i s multiple tree approach had been used there could have been one node f o r a stack frame and one piece of code for adding i t to the PSO table. If, in addition, the nodes in the program tree pointed to the stack frame nodes in the PSO tree when necessary the Algol Engine instructions for managing the stack frames could be generated without searching the PSO table or keeping track of counters as i s done now. In the general case, the "parse graph" could be produced i n much the same way a parse tree i s now produced by running several d i f f e r e n t parses i n p a r a l l e l and joi n i n g the res u l t i n g trees i n the appropriate places as they are made. 44 The worst confusion source of the l a s t three i s the extra TOSI code required to simulate l o c a l variables and data structures. Fortunately i t is also the easiest to eliminate, simply change TOSI or switch to another language for writing the semantics. The f i n a l source of confusion described above, the fragmentation introduced by the a t t r i b u t e grammar i t s e l f , i s not a major source since i t does not become apparent u n t i l the grammar becomes large. The one source of obscurity that i s hard to avoid using the approach outlined i n t h i s t h e s i s i s the extra programming involved in directing the execution flow around the tree. This suggests that a good area for further research might be alternative methods for specifying the mapping from trees to code. BIBLIOGRAPHY [Abramson] (on pages 2, 5 and 8) H.D. Abramson "A Syntax Directed Macro Processor" BIT 14 (1974), 261-272 [A68V] (on page 5) D.R. Ramer, J.D. Thomson, V.S. Manis and M.S. Johnson "Introduction and User's Guide to ALGOL 68-V" Technical Manual TM-11, University of B r i t i s h Columbia (8 December 1976( [FE ] (on page "1) A.V. Aho and J.D. Ullman "The Theory of Parsing, Translation and Compiling" Vol 1 Prentice Hall (1972) Sec. 5.4.4 443-448 [GPM] (on page 8) C. Strachey "A General Purpose Macrogenerator" Computer J. 8 (1965-1 966) 225-241 [Hoare] (on page 1) C. A.R. Hoare "An Axiomatic Basis for Computer Programming" CACM 12 (1969) 576-583 [Knuth] (on pages 1, 2 and 5) D. E. Knuth "Semantics of Context Free Languages" Mathematical Systems Theory 2 (1968) 127-145 an d "Semantics of Context Free Languages: Correction" Mathematical Systems Theory 5 (1 971) 95-96 [LISP] (on page 8) J. McCarthy "LISP 1.5 Programmer's Manual" The MIT Press, Cambridge, Mass. (1962) [Ramer] (on pages 5 and 10) D.R. Ramer "Construction of LR(k) Parsers With Application to ALGOL 68" MSc Thesis, University of B r i t i s h Columbia (1973) 46 [RE] (on pages 4 and 5) A. van Wijngaarden, B.J. Mailloux, J.E.L. Peck, C. H.A. Koster, H. Si n t z o f f , CH. Lindsay, L.G.L.T. Meertens and R.G. Fisker "Revised Report on the Algorithmic Language ALGOL 68" Sigplan Notices Vol. 12 No. 5 (May 1 977) 1-70 [Scott] (on page 1) D. Scott "Outline of a Mathematical Theory of Computation" Proceedings of Fourth Annual Princeton Conference on Information Science and Systems, Princeton (1970) 169-176 [Thomson] (on pages 5, 10 and 11) J.D. Thomson "Mode Checking i n ALGOL 68" MSc Thesis, University of B r i t i s h Columbia (1973) [TOSI] (on page 4) H.D. Abramson, T.B. Rushworth and T. Venema "TOSI: A Tree Oriented String Interpreter" Software Practice and Experience 7 (1977) 663-670 [TRAC] (on pages 4 and 8) C.N. Mooers "TRAC - A Procedure Describing Language for the Reactive Typewriter" CACM 9 (1966) 215-219 [VDL] (on page 1) P. Wegner "The Vienna Definition Language" A.C.M. Comp. Surverys 4 (1972) 5-63 47 Appendix A A l l three languages (GPM, LISP and TRAC) have been s l i g h t l y modified in order to eliminate differences i n spacing and function names in the following examples. GPM: (DEF , ACK, <(-1, (DEF,-0 , < (>--2<, (DEF,>-.2<, <(ACK, » < - , - 1 # 1 ) « / (ACK, » ( - , - 2 , 1 ) « ) ) > ) / (DEF,>0<, < (ACK, » ( - , - 1 , D « / » 1 < < ) > ) )> ) , (DEF,0, (+,1,-2) ) ) > ) (DEF,ACKER MAN, «(>ACKERMAN<,>-1<, >-2<) = > (ACK, -1,-2) >) LISP: (DEF ACK (N M) (COND ( (#EQ NO) (+ M 1) ) ( (#EQ M 0) (ACK (- N 1) 1)) (T (ACK (- N 1) (ACK N ( - H 1) ) ) ) ) ) (DEF ACKERMAN (N H) (PRINT " (ACKERMAN" N M ") = " (ACK N M) ) ) TRAC: (DEF,ACK,|N|,|M|, <(#EQ,|N| ,0, < ( + r I M I r1)>, < (#EQ,| M| ,0, <(ACK, (-,|N| ,1),1) >, <(ACK, (-,|N| ,1), (ACK,|N|, ( - , |H | , 1) ) ) >) >) >) (DEF,ACKERMAN, |N| ,|M| , <(PRINT,<(ACKERMAN,|N| ,| M| ) = >(ACK, |N |, |M |))>) 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            data-media="{[{embed.selectedMedia}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0051781/manifest

Comment

Related Items