Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

A semantic representation for translation of high-level algorithmic languages Appelbe, William Frederick 1978

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
831-UBC_1978_A1 A66.pdf [ 7.53MB ]
Metadata
JSON: 831-1.0076824.json
JSON-LD: 831-1.0076824-ld.json
RDF/XML (Pretty): 831-1.0076824-rdf.xml
RDF/JSON: 831-1.0076824-rdf.json
Turtle: 831-1.0076824-turtle.txt
N-Triples: 831-1.0076824-rdf-ntriples.txt
Original Record: 831-1.0076824-source.json
Full Text
831-1.0076824-fulltext.txt
Citation
831-1.0076824.ris

Full Text

A SEHaNTIC REPRESENTATION FOE TRANSLATION OF HIGH-LEVEL ALGORITHMIC LANGUAGES by B ILL I AM FEE DEB ICK A PP EL BE B. Sc. (Hons.), Monash U n i v e r s i t y , 1973 M.Sc., U n i v e r s i t y o f B r i t i s h Columbia, 1975 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY i n THE FACULTY OF. GRADUATE STUD.IES ( I n t e r d i s c i p l i n a r y S t u d i e s ) He accept t h i s t h e s i s as conforming t o the r e q u i r e d standard. THE UNIVERSITY OF BRITISH COLUMBIA August, 1978 (c) W i l l i a m F r e d e r i c k Appelbe, 1978 In presenting th i s thes is in pa r t i a l fu l f i lment of the requirements for an advanced degree at the Un ivers i ty of B r i t i s h Columbia, I agree that the L ibrary sha l l make it f ree ly ava i lab le for reference and study. I fur ther agree that permission for extensive copying of th is thesis for scho lar ly purposes may be granted by the Head of my Department or by his representat ives. It is understood that copying or pub l i ca t ion of this thes is for f inanc ia l gain sha l l not be allowed without my written permission. Depa rtment The Univers i ty of B r i t i s h Columbia 2075 Wesbrook Place Vancouver, Canada V6T 1W5 0*te |C)7fl f W g f i i Abstract Although new high-level programming languages continue to be . proposed, major software development e f f o r t s are s t i l l devoted to developing tran s l a t o r s for the current languages. As yet there i s no clear methodology for designing and implementing tr a n s l a t o r s ; instead there are a large number of tools and techniques applicable to d i f f e r e n t aspects of translator development., This thesis proposes a new technique f o r structuring the development of translators for high-level algorithmic languages, based upon an intermediate graphical program representation c a l l e d GRAIL. Using t h i s approach, the semantic phase of the t r a n s l a t o r converts source language programs into a corresponding GRAIL representation, then subsequently i n t o code for the target machine.,: The primary advantage of t h i s approach i s to simplify the task of developing translators for high-level algorithmic languages., GRAIL i s designed to enable adaptable and e f f i c i e n t translators to be developed for a wide range of source languages. GBAIL was incorporated into a translator writing system, which was used to implement a tra n s l a t o r for the programming language ASPLE. The ASPLE translator was highly adaptable, as both the syntax and semantics of ASPLE could be e a s i l y modified. The tra n s l a t o r writing system demonstrated the f e a s i b i l i t y of using GRAIL to implement adaptable tr a n s l a t o r s , although the software used had limited c a p a b i l i t i e s . i i i Table of Contents II, Design and Implementation of Translators . 4 Translator Development .. 4 Structurng a Translator 8 Translator Implementation 11 II I . Programming Language Semantics and Translators .14 Intermediate Representations f o r Translation . , . i . . . . . . . . . , 1 5 General Intermediate Representations ...21 Defini t i o n of Programming Languages ...... 24 IV. GRAIL . .... 27 GRAIL Data Structures 34 GRAIL Control Structures ... 39 GRAIL Data Declarations 45 V. Implementation of a source to GRAIL Translator ..v... «50 ' LANCE . . .f. 52 QUEST Description 57 Syntax Defi n i t i o n Using QUEST 60 Semantic Definition Using QUEST 63 VI. Code Generation from GRAIL 76 STACODE 77 ST AC ODE Instructions 79 VII. Summary and Conclusions 87 Implementation Results 88 Evaluation of GRAIL ....91 Original Contributions and Puture Research 93 Appendix A. GRAIL Representations 103 GRAIL Primitive NODES 103 ALGOLS Data Structures 104 ALGOLS Control Structures ... ............................ 106 Appendix B. LANCE Translator Writing System ................109 Structure and Generation of a USL Translator 111 I n t r i n s i c MODULES ..................... .................. 112 Generated MODULES 113 Appendix C. STACKER: the STACODE V i r t u a l Machine ........... 124 Appendix D. QUEST Syntax Chart 125 Appendix E. The AS PL E Compiler .............................127 QUEST Source L i s t i n g 127 ASPLE Parser F i l e 135 ASPLE Semantics F i l e 137 Appendix F. ASPLE Program and T r a n s l a t i o n . 140 ASPLE Source 140 GRAIL Representation 141 STACODE Output Program 143 V L i s t of Diagrams 2-1 A Translator .................... .... 8 2-2 Phases of a Translator .......... 9 2-3 Intermediate Representations in the Semantic Phase ..... 10 2- 4 The LANCE Translator Writing System 12 3- 1 A JANUS Procedure 22 4- 1 GRAIL Structure 28 4-2 Simplified GRAIL Structure 30 4-3 Data REFERENCE in GRAIL 32 4-4 Record Representation i n GRAIL 36 4-5 Compound Data Type Representation 37 4--6 GRAIL Representation of a Compound Statement 40 4-7 Transfer of Control i n GRAIL • "1 4-8 GEAIL Representation of Iteration 42 4-9 GRAIL Representation of Expressions .................... 43 4-10 GRAIL Representation of a Case Statement ...44 4-11 GRAIL Representation of FORTRAN Functions 47 4-12 Representation of a PASCAL READ .. . 48 7-1 ASPLE Translator Generated by the THS 87 7-2 Optimizing ASPLE Translator . . . . . . . . . . . i . . . . . . . . . . . . . . . . 8 9 7-3 Adaptable ASPLE Translator 89 B-1 Using LANCE to Generate a Translator. ................. 109 B-2 Structure of a OS1 Translator 110 C-1 STACODE Environment 124 v i Ackriowle dgeraents /* Aliguid per Oranes. — motto of C a l i s t r a l i a */ A doctoral d i s s e r t a t i o n represents a l o f t y pinnacle above the plain of human knowledge, created i n the hope that others can scale higher without becoming l o s t i n ethereal clouds,, Invariably the pinnacle owes i t s existance not only to the ingenuity and persistence of i t s creator, but also to a host of external influences., Recognizing a l l these influences i s impossible, so at the outset I humbly apologize for any errors or omissions in t h i s l i s t of personal acknowledgements. The f e r t i l e environment for t h i s research was provided by the Department of Computer Science at the Oniversity of B r i t i s h Columbia, and the primary source of funding was the Canadian Commonwealth Scholarship Committee. , The software resources f o r the thesis were provided by the TRUST Translator writing System, and the dissertation i t s e l f was prepared using the Texture document processor. Many fellow graduate students helped lighten the load by supplying both wit and wisdom; Tom Rushworth and Ted Venema, Greg Silbur, Hark Scott Johnson, Graeme Hirst Without the warmth of such friendship the long wet winters would have been unendurable. Special thanks must also go to my guidance committee, whose perseverence and patience in reading and rereading the disse r t a t i o n improved i t s g u a l i t y immensely., I e s p e c i a l l y acknowledge my- faculty advisor, Harvey Abramson, whose quiet counsel often succeeded where my obstinacy f a i l e d . F i n a l l y , I must acknowledge Douq Teepie who helped me withstand the oral defence, and es p e c i a l l y Hark Scott Johnson who. inspired the vi s i o n of a golden future through his own perseverance and compassion. 1 Chapter I . / Introduction /* "Programming languages provide'the.'ability to deal with problem solutions at a l e v e l of abstraction that i s i n seme way close to the l e v e l at which the solution was o r i g i n a l l y formulated. As a r e s u l t , areas of research not d i r e c t l y related to programming language design often involve the development of new languages i n which solutions to problems i n that area can most naturally be expressed, With such an approach, a large part of the t o t a l project e f f o r t may, however, be involved in developing the language i n which the desired system i s to be realized. Thus, i f new programming languages are to play an increasingly important r o l e i n software development, the cost (in terms of time and e f f o r t ) reguired to design a programming language and implement a correct t r a n s l a t o r for i t must be reduced." (Whit 76) */ A t r a n s l a t o r i s a software system that w i l l accept as input a source program i n some language and produce as output a semantically equivalent program i n another, target, language. If the source program i s expressed i i i a high-level language (e.g., AIGOL, PL/I) and the output i s expressed i n an object language for a computer |e.g.> relocatable machine codej, the translator i s referred to as a compiler. /* "By now i t i s almost a c l i c h e to say that there i s a software c r i s i s . Nearly everyone recognizes that software costs more than hardware, and that the imbalance i s projected to increase.... Another component of the software c r i s i s i s less commonly recognized, ... namely the extreme d i f f i c u l t y encounteredin attempting to modify an exi s t i n g piece of software.,.. Assuming we a l l agree that there i s a sofware c r i s i s , then, what i s the cause of the problem and what are we to do about i t ? One can f i n d many answers to both guestiofiS* but to this author the answer to the f i r s t , at l e a s t , i s c l e a r : complexity.'! (Wulf 77) */ 2 Translators for high-level programming languages are among the most commonly used types of software. Translators themselves are often large, complex programs, and considerable e f f o r t i s devoted both to developing translators f o r new programming languages and to modifying e x i s t i n g t r a n s l a t o r s . Unless the range of programming languages and machines can be standardized, then considerable e f f o r t w i l l continue to be expended upon developing t r a n s l a t o r s ; however there i s strong resistance to standardization. /* "The twin ideals of a common language for programmers and the immediate interchangability of programs among machines have largely f a i l e d to materialize. The main reason for thi s i s that programmers, l i k e a l l s c i e n t i s t s beforethem, have never been wholly s a t i s f i e d with th e i r heritage of l i n g u i s t i c constructs. He hold that the demand fo r a fixed standard programfflihg language i s the an t i t h e s i s of a desire for progress i n computer science." (McKe 76) */ Since new translators w i l l continue to be needed, there i s considerable motivation for developing technigues f o r reducing the complexity of developing translators, or "structuring compiler development" (Born 76]., Such tools and technigues include standard algorithms f o r scanning and parsing f Aho 72 ], modular compiler designs [ E l s e 70]f«irt 71], technigues for portable compilers [ wait 70 ] f Newe 72] JE Brow 74 ] f/ Snyd 75] f Lane 72, 76], and trans l a t o r writing systems [ Feld 68] f Book 70] f Capo 72] [®asi 72] f Boch 75 ]J*8aih 77 ]. However, i n spit e of recent progress i n these areas, translator development continues to be a complex task. The major complexities encountered in the design and implementation of tr a n s l a t o r s are usually i n the "semantics phase" of the translator [Joh 78]: i n generating the 3 object program from an i n t e r n a l source program representation., This thesis presents a new approach to structuring the semantic phase of the tran s l a t o r for high-level algorithmic programming languages, by incorporating an intermediate semantic representation, GHAIL, int o the design and implementation of translators. The objective of t h i s thesis i s to show how GRAIL can be used to reduce the complexity of developing translators for a wide range of source languages and target machines. 4 Chapter II. Design and Implementation of Translators /* "Every compiler i s developed i n a particular environment, in response to c e r t a i n needs, and that environment w i l l shape not only the form of the completed compiler, but also the process by which i t i s developed." (Horn 76) */ This thesis advocates a new approach to the process of developing translators f o r high-level programming languages, but before t h i s approach i s introduced i n d e t a i l , i t i s important to review several issues related to the process of translator development. These issues include the problems associated with translator development, the tools and techniques currently available to assist i n the process, and the rela t i o n s h i p between GRAIL and other approaches to translator development. Translator Development Typ i c a l l y the development of a translator for a modern high-level programming language i s a major software project [ Horn 76 ]. As with any other major software project the techniques adopted f o r translator development w i l l be dictated by the goals and objectives of the project. The following l i s t of objectives i s t y p i c a l of many translator development projects: 5 (1) Correctness. /*• "Any compiler, f o r example, i s going to have at least •pathological* programs which i t w i l l not compile corr e c t l y . What i s pathological, however, depends to some extent on your point of view. I f i t happens i n your program, you hardly c l a s s i f y i t as pathological, even though thousands of other users have never encountered the bug. irheproducer of the compiler, however, must make some evaluation of the errors on the basis of the number of users who encounter them and how much cost they incur. ...,* (Wein 71) */ Since complex t r a n s l a t o r s for high-level languages are seldom absolutely correct, perhaps " r e l i a b i l i t y " i s a more feasible goal. Usually the more e f f o r t that i s devoted to the project, the more r e l i a b l e i s the product. Often what distinguishes a "production" compiler from an "experimental" compiler i s only the confidence that the implementor has in i t s r e l i a b i l i t y and the e f f o r t devoted to testing and debugging. <2) A v a i l a b i l i t y . Even a highly r e l i a b l e compiler that cannot be run upon any l o c a l l y a v a i l a b l e computer system without d r a s t i c modifications i s of l i t t l e use. An important aspect of any software development project i s the e f f o r t which can be devoted to the project, and the timetable for project completion. A v a i l a b i l i t y i s a measure of the time and e f f o r t required to develop the compiler. In many programming applications the choice of a language i s dictated by the a v a i l a b i l i t y of a compiler for the language. Innovation i n using new prog ramming languages i s frequently discouraged by the d i f f i c u l t y of making 6 compilers available. (3) Adaptability. /* "During the l i f e t i m e of a compiler, requirements and s p e c i f i c a t i o n s may change many times (often, even before i t i s completed). Unless s p e c i a l care i s taken during i t s construction to ensure ad a p t a b i l i t y , responding to these changes may be both traumatic and expensive. " (Horn 76) */ An adaptable program has been defined as one that can readily be modified to meet a wide range of user and system reguirements [Pool 76]. Although some specials-purpose compilers are produced to compile single programs, most compilers must be planned to handle a large number of programs, and to evolve over a considerable l i f e t i m e to meet changing requirements. (4) Helpfulness. A " h e l p f u l " compiler i s one which minimizes the cost of program development i n a s p e c i f i c environment. ,.• Features of a helpful compiler include error detection, diagnosis, and correction, as well as aspects of program development such as documentation, updating source program versions, and maintenance of program l i b r a r i e s [Horn 76]. (5) E f f i c i e n c y . There are several d i f f e r e n t measures of e f f i c i e n c y applicable to compilers, such as the e f f i c i e n c y of developing the compiler, the e f f i c i e n c y of program development using the compiler, and the e f f i c i e n c y of target programs produced by the compiler. 7 The above goals cannot a l l be achieved s i m u l t a n e o u s l y , as t h e r e i s a g e n e r a l t r a d e - o f f between d i f f e r e n t goals. T y p i c a l t r a d e - o f f s i n c l u d e : (1) C o m p i l a t i o n e f f i c i e n c y vs. execution e f f i c i e n c y . A " s t r i p p e d down" compiler w i l l u s u a l l y be s m a l l e r and f a s t e r than a compiler which generates both e f f i c i e n t code and e l a b o r a t e e r r o r d i a g n o s t i c s . (2) A d a p t a b i l i t y vs. e f f i c e n c y . A h i g h l y adaptable compiler w i l l u s u a l l y be l e s s e f f i c i e n t than one which i s e x c l u s i v e l y designed f o r a s p e c i f i c environment. (3) A v a i l a b i l i t y can c o n f l i c t with each of the other g o a l s , as emphasis upon developing a c o m p i l e r g u i c k l y w i l l u s u a l l y reduce the q u a l i t y of the r e s u l t . No one approach to compiler development i s u n i v e r s a l l y a p p l i c a b l e , as any s p e c i f i c technique w i l l be b i a s e d i n favour of c e r t a i n g o a l s . For example, using a technique which i n c r e a s e s the e f f i c i e n c y of the t a r g e t programs, such as g l o b a l data flow a n a l y s i s £ Aho 77J, w i l l i n e v i t a b l y i n c r e a s e the e f f o r t r e q u i r e d t o develop the c o m p i l e r . /* "The r a r e o c c a s i o n s i n which a new technique a l l o w s an improvement without a corresponding c o s t r e p r e s e n t s i g n i f i c a n t advances i n the theory of compilers.?? (Horn 76) */ Some gen e r a l techniques such as " s t r u c t u r e d programming" and w r i t i n g the compiler i n a h i g h - l e v e l language f i t t h i s c r i t e r i o n , but i t i s too s t r o n g a c r i t e r i o n f o r a s p e c i f i c r e p r e s e n t a t i o n such as GHAIL. GRAIL was p r i m a r i l y designed f o r 8 the translator development objectives of adaptability and a v a i l a b i l i t y , without compromising the e f f i c i e n c y of the target programs. Structuring a Translator a user often views a translator as a "black box", which takes a source program as input, and generates an object program and l i s t i n g s as output ( f i g . 2-1). although early translators often had l i t t l e recognizable structure [ Baue 76], recent translators for high-level algorithmic languages have l a r g e l y adopted a standard form. The structure of a t r a n s l a t o r i s usually decomposed into a series of l o g i c a l phases ( f i g . 2-2) [ Aho 77 "j. Source Program TRANSLATOR Object Program Figure 2-1 A Translator 9 TRANSLATOR SCANNER /Lexical % ^•Analysis' PARSER /Syntax x ^Analysis' SEMANTIC ANALYZER t > 1 1 1 I 'source Token'Sequence Parse'Tree Object' Figure 2-2 Phases of a Translator The f i r s t phase, referred to as scanning (or l e x i c a l a n alysis), reduces the source program to a seguence of basic symbols or tokens (e.g., reserved words, <identifi€r>*s, <integer>*s). The second phase, referred to as parsing (or syntax a n a l y s i s ) , performs an analysis of the seguence of tokens to determine the syntactic structure of the program. This syntactic structure, which can be represented as a parse tree, i s then input to the t h i r d phase of semantic analysis. This phase must perform functions including tabulating semantic information (e.g., a l l o c a t i n g run-time addresses f o r i d e n t i f i e r s ) , and generating and optimizing an appropriate seguence of instructions in the object language of the particular machine involved. While s i g n i f i c a n t progress has been made i n the formalization of syntax and in the automatic construction of scanners and parsers, the s p e c i f i c a t i o n of semantics and the design of the semantic phase has received a much less formal and unified treatment. 10 One approach which has frequently been advocated i s to perform code generation in several stages. Instead of generating code f o r the target machine d i r e c t l y from the source program, the source program constructs are f i r s t translated into an intermediate form, w h i c h i s subsequent!y translated i n t o code for the target machine. More than one separate intermediate form can be used, and separate passes can be included to perform optimizations on each of the intermediate programs ( f i q . 2-3). SEMANTIC ANALYZER Final Intermediate Form T I Source to Intermediate Code "7* I Intermediate Code Optimization 1 1 1 Intermediate Code to Object Code • l I Parse Tree i I n i t i a l Intermediate L Form 1 Object Figure 2-3 Intermediate Representations i n the Semantic Phase Many diff e r e n t approaches to intermediate forms have been advocated, including abstract machine modelling fNe we 72], abstract syntax trees fMcKe 76 ] abstract run-time types and control structures [ S h i t 76]. Intermediate forms are often referred to as intermediate languages or codes; however throughout t h i s thesis they w i l l be gen e r i c a l l y described as intermediate representations,. .• A language is primarily a tool for communication, eith e r between users or the user and the computer, so emphasis must be placed both upon i t s r e a d a b i l i t y and i t s w r i t a b i l i t y . By contrast an intermediate representation used i n t e r n a l l y by a translator i s not constrained by "syntactic sugar*, or the need to be e a s i l y input to the computer. 11 Representations are frequently used i n applications other than translator implementation. For example, flowcharts are a commonly used representation for algorithms, though they are not usually regarded as a language. The meaning of the statements within each block of a flowchart i s net s p e c i f i e d by the i flowchart representation, and flowcharts are not designed to be automatically processed by a computer. GRAIL i s designed as an intermediate representation for translators, GRAIL i s based upon a graphical model, so the language implementor can regard the GRAIL program as a directed a c y c l i c graph. A detailed description of the GRAIL representation i s contained i n Chapter 4. The representation i s designed to enable highly adaptable translators to be developed for a wide range of source languages. , Translator Implementation One of the most promising approaches towards the s i m p l i f i c a t i o n of the implementation of t r a n s l a t o r s i s the use of translator writing systems (TSSs), often referred to as compiler-compilers, / H i s t o r i c a l l y , the existence of TffSs i s a r e s u l t of using syntax-tdirected translation technigues to structure the t r a n s l a t o r [ G r i f 76 ]. The ultimate goal of THS development i s to enable e f f i c i e n t t r a n s l a t o r s to be generated from a formal language d e f i n i t i o n and a machine description; however t h i s goal i s far from being r e a l i s e d at present. 12 Currently, TSSs enable both the scanning and parsing phases of the translator to be generated automatically, but r e l y heavily upon the language implementor to define the functions performed in the semantic phase of the translator. Translators using GRAIL were produced using a TBS f Vene 76], as the advantages of automatic scanner and parser generation outweighed the disadvantages encountered in developing general semantic functions for the THS to generate the GRAIL representation. The d e t a i l s of the design of the TWS, referred to as LANCE, and the generation of translators using t h i s system are contained in Appendix B. Using LANCE, a language implementor defines the syntax of the source language and the corresponding semantic functions i n a d e f i n i t i o n a l language, referred to as QUEST. This d e f i n i t i o n , or QUEST program, i s then input to LANCE which generates a trans l a t o r for the source language from the QUEST program ( f i g . 2-4) . IMPLEMENTOR USER OBJECT Program Scanner Parser Semantic , . . . „ Analyzer I L i s t i n g (figure 2-2) Figure 2-4 The LANCE Translator Writing System 13 Translators generated by LANCE are highly adaptable, as they use both a high-level intermediate representation, GBAIL, and a simple low-level intermediate representation, STACODE. Thus the language implementor can e a s i l y modify the translator to r e f l e c t changes in either the source language or the target machine. , 14 Chapter I I I . Programming Language Semantics and Translators /* " I t may seem misleading to talk so much about language d e f i n i t i o n methods when we wish to examine the state of the a r t i n compiler-compilers, but the one i s d i r e c t l y dependent on the other. Compiler implementors have always complained that the language definers ignore t h e i r problems, but both sides should l i s t e n more c a r e f u l l y to what the other has to say.'? (Grif 76) */ The previous chapter introduced G B A I L a s an intermediate semantic representation for translation, based upon the concept of l o g i c a l l y dividing the organization of a translator i n t o several modules or phases. This chapter compares GBAIL both with other intermediate representations currently used i n trans l a t o r s , and with other technigues f o r programming language d e f i n i t i o n . Any intermediate representation such as GBAIL i s closely related to the techniques of language d e f i n i t i o n . There are three direct applications of language d e f i n i t i o n techniques to the intermediate representation: (1) The d e f i n i t i o n of the intermediate representation i t s e l f . (2) The d e f i n i t i o n of the translation of the source language into the intermediate representation. (3) The d e f i n i t i o n of the tra n s l a t i o n of the intermediate representation into the object language. Although many of the techniques used for formal language d e f i n i t i o n are inappropriate for p r a c t i c a l t r a n s l a t o r s , such techniques provide much of the i n s p i r a t i o n f or progress in the 15 design and implementation of t r a n s l a t o r s . Intermediate Representations for Translation fiany t r a n s l a t o r s for high-level languages incorporate intermediate representations either i n t h e i r design or i n t h e i r i m p l e m e n t a t i o n . S i n c e the intermediate representation i s usually of no concern to the language user, the d e t a i l s of the intermediate representation may only be apparent i n the documentation for the tr a n s l a t o r implementation. In spite of t h e i r widespread use there has been l i t t l e attempt to analyze and compare intermediate representations, although some attempts have been made to measure the trade-off between p o r t a b i l i t y and e f f i c i e n c y incurred by adopting s p e c i f i c intermediate representations such as LOHL [Brow 72]. In order to c l a s s i f y the large number of different intermediate representations adopted in practice, three d i f f e r e n t measures are applied to categorize intermediate representations used i n translators: f1) Generality. Generality measures the degree to which the representation i s independent of a particular source language or target machine,, Hany d i f f e r e n t intermediate representations have been developed as convenient tools f o r a particular translator development project. Such representations are often t a i l o r e d for a particular source language implementation, such as the ALGOL68 C compiler which uses 16 2C0DE [Gard 77], the PASCAL P compiler fNori 74] and the BCPL compiler which uses OCODE fRich 71, 74]. Usually these s p e c i f i c intermediate representations have an assembly language format. Such representations can provide enhanced p o r t a b i l i t y for the source language compiler, as i t i s easier to rewrite the translation of the intermediate representation into object code than i t i s to rewrite the entire source language compiler. Other compilers use spe c i a l i z e d i n t e r n a l data structures to simplify the compiler design, such as the BCPL Applicative Expression Tree [Rich 69], which i s used to represent the parse tree generated by the syntax analyzer. By contrast with such sp e c i a l i z e d representations GRAIL i s not designed for a pa r t i c u l a r source language. Rather i t i s designed for the family of ihighrleyel algorithmic languages. Although spec i a l i z e d intermediate representations such as OCODI can be modified and used i n compilers for source languages si m i l a r to th e i r o r i g i n a l a p p l i c a t i o n , the conversion of such sp e c i a l i z e d representations i s often more d i f f i c u l t than designing a new representation. By contrast, GRAIL i s a general intermediate representation, which i s designed to be p r a c t i c a l for many d i f f e r e n t translators without modification. Level. By analogy with programming languages, representations may be c l a s s i f i e d according to the i r l e v e l , or richness of data and control structures, A high-level representation i s 17 i s close to a programming language, wheras a low-level representation i s close to an abstract machine. Although the notion of l e v e l i s rather subjective, the l e v e l of an algorithmic language may be associated with the degree of abstraction, or implementation independence, of algorithms expressed i n the language. The process of t r a n s l a t i o n from a high-level language to a machine language may be decomposed in t o a series of translations v i a intermediate representations, with each step converting one representation to another at a lower l e v e l . Thus one translator may incorporate several successive intermediate representations., Representations such as QCODE and ZCOBE are at a low l e v e l as they resemble a machine language format. GBAIL i t s e l f i s arhigh-level representation, and could be described as an abstract programming language. It i s close to programming languages i n many of i t s semantic concepts, but i t i s designed as an i n t e r n a l representation f o r : use within translators rather than a tool for programmers., Although programming languages must be both readable and writable, no such r e s t r i c t i o n s need be placed upon intermediate representations used within a tra n s l a t o r . A (3) Semantic Content. , The semantic content of the representation i s a measure of how much of the program semantics i s e x p l i c i t included i n the representation d e f i n i t i o n . A representation with low semantic content i s merely a technigue f o r organizing the structure of the program within the tr a n s l a t o r . One common 18 intermediate representation for expressions i s pos t f i x notation, i n which each operator i s preceded by i t s operands. The advantage of such a representation i s that i t eliminates the need for parentheses, and can be d i r e c t l y translated into stack operations., Because postfix notation does not specify either the types of the operators or operands i t has a low semantic content. S i m i l a r l y , specialized representations based upon parse trees, such as the BCPL Applicative Expression Tree [ Rich 69], have a low semantic content as they usually rely upon an external symbol table to maintain relevant semantic information. By contrast GRAIL i s designed to have a high semantic content, as the GRAIL representation of the source language program e x p l i c i t l y contains a l l the semantic information represented by o r i g i n a l source language program.. GRAIL i s a general high-level intermediate representation, incorporating a l l the semantic information necessary to generate e f f i c i e n t code for the target machine. The primary design goal of GRAIL i s to simplify the task of developing adaptable translators for high-level algorithmic languages. One major advantage of adopting a general intermediate representation i s standardizing tr a n s l a t o r development. I f a p r a c t i c a l M u n i v e r s a l " intermediate representation for translators could be developed, then the e f f o r t involved i n developing new translators could be greatly reduced. This approach, of developing a universal computer language, was proposed as early as 1958 [Mock 58] to solve the "NxM tr a n s l a t o r 19 problem"., A straight-forward implementation of N languages on M machines requires one translator for each language/machine pair, i . e . , NxM tran s l a t o r s . , This number could be reduced to N+M i f a universal computer oriented language, or ONCOL, was available. One translator would be required to translate each of the N • languages into ONCOL, and one; translator would be reguired to translate ONCOL into code f o r each target computerv Steel [Stee 61] i s the: only author who gives any concrete proposals for implementing ONCOL, but the project appears to have been abandoned before a complete s p e c i f i c a t i o n of the language could be developed f S i b l 61 ] [Kost 73] fcole 74]. There; are many possible candidates f o r a universal intermediate representation, since on the o r e t i c a l grounds, any computation which can be expressed i n one of the f a m i l i a r languages can also be expressed i n any of the others, or programmed using Turing machines [Hulf 77]. This argument : ignores the issue of eff i c i e n c y . If an algorithm or source program i s to be translated i n t o an e f f i c i e n t object program, then i t i s essential that a translator for the source language minimizes the loss of semantic information which occurs i n the t r a n s l a t i o n process. Information i n a source language program may either be syntactic or semantic. For example, the sequence of characters used by the programmer to representan i d e n t i f i e r i s syntactic, but the scope and type of the i d e n t i f i e r i s semantic. If a universal intermediate representation i s to be e f f e c t i v e , then i t must be capable of representing languages with widely d i f f e r i n g data types, such as APL arrays and LISP l i s t s , without 20 reducing the p o s s i b i l i t i e s f o r e f f i c i e n t object code being generated for a wide range of d i f f e r e n t machine architectures.. Such a universal representation would need to be too complex to be feasible. Instead of attempting to develop a universal intermediate representation, a less ambitious but more successful approach i s to develop intermediate representations which are suitable for a r e s t r i c t e d class of source languages or target machines. GRAIL i t s e l f i s designed as a general intermediate representation for the class of high-level algorithmic languages. The class of high-level algorithmic languages compatible with GRAIL includes ALGOL60, PASCAL, FORT RANI, BASIC, PL/1, BCPL, C and ALGOL68., GRAIL i s not compatible with interpreted languages such as APL, SNOBOL and LISP. The primary c h a r a c t e r i s t i c s of the cl a s s of languages compatible with GRAIL are: (1) Data and control structures must be separate. Thus the source language cannot allow the user to evaluate, or execute, data. (2) Both the scope and type of variables should be s t a t i c a l l y determined by the syntax of the source language program,, (3) GRAIL does not provide a representation for p a r a l l e l programming constructs, or parametric types, though there i s no l i m i t a t i o n on GRAIL which would prevent these applications. 21 €ener,al; intermediate Representations for Algorithmic Languages One of the more common intermediate representations used i n translators are parse trees. Parse trees are universal i n the sense that every program which i s defined by a context-free grammar can be represented by a parse tree; however a parse tree represents only the syntactic structure, and c a r r i e s no e x p l i c i t semantic information. Because of th e i r widespread a p p l i c a t i o n , some attempts have been made to extend the concept of a parse tree to include semantic information (HcKe 76], In t h i s approach a parse tree i s transformed into an a seguence of abstract syntax trees for control structures and expressions. The primary difference between t h i s approach and GRAIL i s the lack of any standardization i n the tree structures used. The form of the abstract syntax trees must be defined f o r a parti c u l a r source language, and the technique of using a seguence of simple tree transformations may eliminate semantic information which could be used to generate more e f f i c i e n t code. The other approaches to general intermediate representations have usually been at a lower l e v e l than GRAIL, Perhaps the most widely used low-*level representation have been based upon three-address codes £Aho 77], i n which a source language program i s represented by a seguence of instructions of the ,:,f orm: •. A := B pj> c where A, B and C are i d e n t i f i e r s , constants or translator generated temporaries and op. i s an operator, such as +, * or goto for transfer of c o n t r o l . This representation i s useful for 22 machine,independent optimization, but i t has a low semantic content as neither the operations nor the data types are speci f i e d by the representation,, Another general intermediate representation whose objectives are related to those of GRAIL i s JANUS [Cole 74 ]. JANUS i s a symbolic representation designed to enable code generation to be separated from semantic analysis, A program representation i n JANUS consists of a seguence of symbolic i n s t r u c t i o n s for an abstract machine, whose organization i s designed to be suitable for a wide range of target machines. The JANUS representation f o r a procedure to calculate the square root of a p o s i t i v e real, number [Pool 76] i s given i n figure 3-1. BEGIN REAL PROC SQRT(1): •. Sqrt returns r e a l , 1 parameter SPACE REAL PARAM G92 0 . declare the formal parameter SPACE REAL LOCAL G93{) . declare a l o c a l variable LINO REAL PROC SQRT () . perform linkage duties LOAD REAL PARAM G9 2() . access the parameter value CMPNF REAL CONST G22 A 0E0. / JLT,I INSTR CODE G99<) . abort on negative argument LOC REAL VOID G98 . STORE REAL LOCAL G93() . save the sqrt estimate LOAD REAL PARAM G92 () . load the parameter value DIV REAL LOCAL G93() . divide by estimate END SQRT. Fiqure 3-1 A JANUS Procedure Unlike three-address codes, JANUS defines both the set of operators and operands used i n the representation., Thus, l i k e GRAIL, JANUS has a hiqh semantic content. The primary difference, between GRAIL and JANUS i s t h e i r l e v e l . .. JANUS i s a low-level representation, emphasizing machine independence, whereas GRAIL i s a hiqh-level representation 23 emphas iz ing s ou r ce l a n g u a g e i n d e p e n d e n c e . A t r a n s l a t o r c o u l d be des igned t o i n c o r p o r a t e both GBAIL and JANOS. I n i t i a l l y the source language would be t r a n s l a t e d i n t o a GBAIL r e p r e s e n t a t i o n , then GBAIL would be t r a n s l a t e d i n t o JA8US code, and f i n a l l y the JANOS code would be t r a n s l a t e d i n t o c o d e - f o r the t a r g e t machine. Such a t r a n s l a t o r would be highlyi adaptab le and capab le of g e n e r a t i n g r ea sonab l y e f f i c i e n t code f o r a wide range o f source languages and t a r g e t machines. 24 Defin i t i o n of Programming Languages /* "The programming language Tower of Babel i s well-known. Less discussed i s the Tower of MetaBabel, symbolic of the many ways programming languages are described and defined. The methods used range a l l the way from natural language to the ultramathematical. The former are subject to a l l the vagaries and inconsistencies that re s u l t from the use of normal prose; the l a t t e r freg.uen.tly. have t h e i r meaning hidden under abstruse notation." (Hare 76) */ Considerable e f f o r t has been devoted to developing technigues for defining programming languages; however there have been few attempts to compare and c l a s s i f y these [Hoar 73] [Marc 76]. Often the techniques adopted for language d e f i n i t i o n have not been related to the techniques adopted f o r language implementation, or defining a t r a n s l a t o r for the lanquage. Although GRAIL i s designed primarily for developing adaptable translators for high-level languages i t can also be used as a t o o l for language d e f i n i t i o n . , A h i g h - l e v e l l a n g u a g e construct can be defined by means of i t s GRAIL representation. y Conversely, GRAIL can be informally defined by analogy with ex i s t i n g high-level language constructs. This informal approach to d e f i n i t i o n i s used i n the description of GRAIL i n chapter 4, however there are many advantages to supplementing t h i s approach with formal d e f i n i t i o n s . The f i r s t major use of formal d e f i n i t i o n techniques was the syntax s p e c i f i c a t i o n of the ALGOL60 report [Saur 63]. Although the syntax of ALGOL60 was formally s p e c i f i e d , the semantics was not. The semantics was decribed partly i n terms of the actions to be associated with various program fragments and partly i n 25 terms of text manipulation operations. A formal d e f i n i t i o n of the semantics of ALGOL60 was given l a t e r by Landin [Land 65 1, based upon an interpreter f o r an abstract representation of ALGOL60 consisting of Imperative Applicative Expressions. The concept of basing a formal d e f i n i t i o n upon an interpreter f o r an abstract program representation reached i t s zenith with the Vienna Def i n i t i o n Language, VDL, developed for the programming language PL/I [ O l i o 74], The approach taken i n the formal d e f i n i t i o n of ALG0L68 was to develop a more powerful method of syntax s p e c i f i c a t i o n , »-grammars, and to express more of they semantics using these [Clea 77]. Recently a semiformal d e f i n i t i o n method developed from VDL has been used to define the PL/I standard [Abr 76]; however the PL/I standard has raised considerable controversy and has not been publicly accepted. The .., primary objection to the PL/I standard, as with other formal d e f i n i t i o n techniques, i s that the d e f i n i t i o n i s simply too hard to understand [Ledq 75] [Abr 76]. Various formal techniques have also been developed for implementinq tr a n s l a t o r s . The f i r s t formal approach to defininq translators was syntax-directed translation [Lewi 68, 74] [ Abra 73, 74]; however the formal model has generally been lim i t e d to simple applications such as source to source t r a n s l a t i o n [Lee 71]. A more recent , development . has been the application of att r i b u t e qrammars to defininq and implementinq translators [Knut 71]. Attribute qrammars were o r i g i n a l l y developed for the 26 d e f i n i t i o n of programming language semantics f'Knut 71] f Wiln 71], and then adopted i n the design of recent tra n s l a t o r writing systems [Boch 75]. attempts have been made to extend the formal d e f i n i t i o n of at t r i b u t e grammars to an implementation model f o r a translator writing system capable of generating p r a c t i c a l translators, but i t i s d i f f i c u l t to reconcile the reguirements of formal rigour with the language implementor*s desire for s i m p l i c i t y [ Raih 77]., The translator writing system used to implement GRAIL was based upon attributed t r a n s l a t i o n , but the TWS was not formally modelled. The p r i n c i p a l j u s t i f i c a t i o n for t h i s informal approach i s that the TWS was not a research objective of the thesis; rather i t was developed as an implementation t o o l to demonstrate the f e a s i b i l i t y of incorporating GRAIL int o adaptable translators. Before a formal d e f i n i t i o n technique can be incorporated into a translator i t must be simple enough to be automatically manipulated e f f i c i e n t l y . Although VDL uses a graphical program representation, as does GRAIL, the VDL representation for even simple programs may be extremely complex, as VDL uses a set of primitives and tree transformations developed for formal d e f i n i t i o n and manipulation. Although GRAIL i s not formally defined, the representation i s s u f f i c i e n t l y simple to be incorporated into a translator, h ; There i s an obvious need f o r formal d e f i n i t i o n techniques, but the current approaches are i n e f f e c t i v e for many p r a c t i c a l applications. 27 Chapter IV. GRAIL T h i s chap te r w i l l show how the ma jo r i t y o f data and c o n t r o l s t r u c t u r e s encountered i n h i g h - l e v e l a l g o r i t h m i c languages can be r ep re sen ted u s i ng GRAIL. Subsequent c h a p t e r s w i l l o u t l i n e the implementat ion of GRAIL, us inq a t r a n s l a t o r w r i t i n g system to generate- ' t r a n s l a t o r s f o r h i gh - leve l - . . a l go r i thmic languages which i n c o r p o r a t e an i n t e r m e d i a t e GRAIL r e p r e s e n t a t i o n . ; A GRAIL r e p r e s e n t a t i o n f o r a source language program c o n s i s t s of a r oo ted d i r e c t e d graph, c o n t a i n i n g no c y c l e s . Except f o r the r o o t node, eve ry node i n a GRAIL program r e p r e s e n t a t i o n must have a un ique parent . T h i s r e s t r i c t i o n upon program r e p r e s e n t a t i o n s e n f o r c e s a h i e r a r c h y , which enab les GRAIL r e p r e s e n t a t i o n s t o be t r a n s l a t e d by a s imple t r a v e r s a l o f the d i r e c t e d graph from the roo t node of the program. Subgraphs w i t h i n a GRAIL r e p r e s e n t a t i o n a r e r e f e r r e d to as s t r u c t u r e s . Non - te rmina l nodes i n the graph, r e f e r r e d t o as NOp.Es are GRAIL p r i m i t i v e i n s t r u c t i o n s . There a r e 13 p r i m i t i v e i n s t r u c t i o n s , or NODE t y p e s , i n GRAIL. Each o f these p r i m i t i v e s and t h e i r a p p l i c a t i o n s w i l l be d e s c r i b e d subsequent ly i n t h i s c h a p t e r . Te rm ina l s i n the graph, r e f e r r e d to as; i l S l i s , a re s t r i n g s d e f i n e d by the language implementor or s u p p l i e d by the user's: ' s ou rce program. ; Fo r example, the language implementbr must d e f i n e l a b e l s f o r the p r i m i t i v e data t ype s such as • i n t e g e r * or " r e a l * , and the user source program s u p p l i e s o ther l a b e l s such as i d e n t i f i e r names and. cons tan t s . Each GRAIL p r i m i t i v e , wi th the e x c e p t i o n o f L IST, has a f i x e d number of descendeht nodes, . r e f e r r e d t o as a t t r l b j o t g s . 28 The p r i m i t i v e NODE i s r e f e r r e d to as the parent of each o f i t s a t t r i b u t e nodes. In t h e subsequent d e s c r i p t i o n o f GBAIL a node i n a GRAIL s t r u c t u r e i s r e p r e s e n t e d by an i d e n t i f i e r d e l i m i t e d by ang le b r a c k e t s ( e , g . , t <node-label>).... NODE t y p e s a r e c a p i t a l i z e d , and t h e a t t r i b u t e s of a NODE appear as a pa ren the s i zed l i s t f o l l o w i n g the type. A GRAIL s t r u c t u r e can a l s o be v i s u a l i s e d us ing a diagram. For example the GRAIL s t r u c t u r e r e p r e s e n t i n g a v a r i a b l e d e c l a r a t i o n INTEGER x, y would b e ; QDEFINE F i g u r e 4-1 GRAIL S t r u c t u r e In t h i s s t r u c t u r e the DEFINE node i s the r o o t , w i th t h r e e a t t r i b u t e s » i n t * , LIST and * ? * . * i n t * and •?» a r e l a b e l s , and L IST i t s e l f has two a t t r i b u t e s : •x* and * y * . i •?* i s used t o r e p r e s e n t an unde f i ned o r empty a t t r i bu te . ; : In t h i s s t r u c t u r e i t co r re sponds to an unde f ined i n i t i a l v a l u e f o r the v a r i a b l e s . LIST i s the s i m p l e s t G1AIL p r i m i t i v e . I t p rov i de s a means o f d e f i n i n g a compound a t t r i b u t e c o n s i s t i n g of a sequence of 29 s u b a t t r i b u t e s . The number o f a t t r i b u t e s of a LIST node may e i t h e r be f i x e d by t h e sou rce language or determined from the source language program. The a t t r i b u t e s o f a LIST a re u s u a l l y o f the same NODE t y p e , and they are i n the same sequence as the c o r r e s p o n d i n g s t r u c t u r e s i n the source language program. LISTs are used t o d e f i n e both s e q u e n t i a l data and c o n t r o l r e p r e s e n t a t i o n s , such as a l i s t o f s tatements o r a l i s t of i tems i n a data r e c o r d . S i n c e l i s t s can appear as a t t r i b u t e s o f both data and c o n t r o l r e p r e s e n t a t i o n s t h e LIST p r i m i t i v e has no i n t r i n s i c semantic i n t e r p r e t a t i o n . The semant ic i n t e r p r e t a t i o n of the LIST depends upon t h e con tex t i n which i t appears , rf The n o t a t i o n f o r a LIST i s o f t e n a b b r e v i a t e d from LIST <<a t t r i bu te - 1 > , <attribute-2>, , , . , <attr ibute-n>> to « a t t r i b u t e - 1 > , <attr ibute-2>, <a t t r ibute -n>) or [<attribute>} , i f the number of a t t r i b u t e s i s v a r i a b l e . Thus one a t t r i b u t e o f a procedure i n GRAIL i s a LIST of f o rma l parameters : [ <<param-decl>j < p a r a m - t y p e » } each fo rma l parameter c o n s i s t s o f a two element L IST, or p a i r , d e f i n i n g the f o rma l parameter and i t s t y p e . To s i m p l i f y d iagrams o f complex GRAIL s t r u c t u r e s a l i s t node i s r ep re sen ted as an unmarked branch p o i n t . F o r example the p rev i ou s diagram of a v a r i a b l e d e c l a r a t i o n c o u l d be s i m p l i f i e d t o : 30 x y Figure a-2 Simplified 6BAIL Structure In any intermediate representation f o r a high-level language program there has to be a f a c i l i t y f o r associating program data names with t h e i r corresponding declarations and attributes i n the representation., In some representations, such as three-address codes [Aho 77 ], t h i s i s provided by an external symbol table.. As each dataiaaie or i i e n t i l i e r i s encoaatered i n the source program i t i s entered into the symbol table, and l a t e r occurrences of the i d e n t i f i e r i n the program w i l l reference t h i s table entry. The symbol table i s usually designed s p e c i f i c a l l y f o r the source language, as block structured languages which permit nested data environments require more elaborate symbol tables than languages such as FORTBAN. The GRAIL representation for a source language program does not assume the existence of an e x p l i c i t symbol table for associating variables with t h e i r corresponding; declarations. GRAIL uses REFERENCE nodes to l i n k an i d e n t i f i e r tc i t s 31 d e c l a r a t i o n i n t h e GBAIL program r e p r e s e n t a t i o n . Every data i d e n t i f i e r i n the source language program must have a co r re spond ing GBAIL r e p r e s e n t a t i o n f o r the data d e c l a r a t i o n . Thus i m p l i c i t d e c l a r a t i o n s of v a r i a b l e types such as FORTRAN p rov ide s must be r e p r e s e n t e d by an e x p l i c i t d e c l a r a t i o n i n the GBAIL program r e p r e s e n t a t i o n . Every v a r i a b l e and con s t an t which appears i n the source language program, must be r e p r e s e n t e d i n GBAIL by a REFERENCE t o a unigue NODE r e p r e s e n t i n g the data d e c l a r a t i o n . For example, a FOBTBAN statement sequence such a s : INTEGER X, y x =•: y + • 5 would be r e p r e s e n t e d by a DEFINE node f o r the e x p l i c i t d e c l a r a t i o n of x and y, toge ther with a DEFINE node f o r the i m p l i c i t d e c l a r a t i o n o f the cons tant 5. Each o p e r a t i o n , such as ass ignment or a d d i t i o n , i s r ep re sen ted by a d i s t i n c t OPCODE node (see be low). The two OPCODES i n the GRAIL s t r u c t u r e r e p r e s e n t i n g the ass ignment statement would then r e f e r e n c e the GRAIL d e f i n i t i o n s of x , y and 5 as f o l l o w s : 32 Figure 4-3 Data Reference i n GRAIL REFERENCE nodes have only one a t t r i b u t e , the NODE which they reference. In the diagram a REFERENCE node i s represented by an arc, or broken l i n e , from the variable to i t s corresponding declaration. The p r i n c i p a l advantage of defining the data types and i d e n t i f i e r scopes e n t i r e l y within the program representation i s that symbol table manipulation and i d e n t i f i e r scoping can a l l be defined i n terms of primitive operations upon the graphical program representation. The majority of translators enforce scoping by entering a d d i t i o n a l information into the symbol table entries for i d e n t i f i e r s encountered i n parsing the source language program. Thus the scope of i d e n t i f i e r declarations i s usually i n f l e x i b l y incorporated into the design of the translator. By contrast, a translator incorporating a GRAIL representation can perform general symbol table manipulation by means of tree searching operations upon the program representation. For example, in BCPL, i d e n t i f i e r s declared as 33 global in the source language program can be attached to the outermost BLOCK of the GRAIL program representation, and l o c a l i d e n t i f i e r s can be attached to the most recent BLOCK. I d e n t i f i e r s encountered i n parsing a source language expression must then REFERENCE either a l o c a l or global declaration i n the GRAIL representation, and t h i s declaration can be located by matching the i d e n t i f i e r name against the l o c a l and global graphical representations for the declaration LISTs. Since GRAIL i s used for t r a n s l a t i n g high-level algorithmic languages, d i f f e r e n t categories of primitives are used f o r defining the data and control structure representations of the source language. GRAIL cannot be used to represent languages such as LISP which permit the user to evaluate data structures, as i t distinguishes the representation adopted f o r data structures from those adopted for control structures. 34 GB1IL Data Structures Every algorithmic language provides a basic set of simple data types, such as integers, r e a l s or booleans, that can be regarded as atomic. Such simple data types cannot be e f f e c t i v e l y represented using other simpler types, and operations upon simple types are regarded as functions provided by the target machine. The simple types are clos e l y associated with the data types and Instructions of the tarqet machine. For example, the majority of machines provide a basic i n s t r u c t i o n set including operations upon fixed length integers such as •add*, 'complement1, ^multiply* and 'divide*. Usually the machine representation of such simple data types does not influence the code that the translator generates, as an operator i n the source language such as **• w i l l be d i r e c t l y translated into an addition i n s t r u c t i o n , disregarding the p o s s i b i l t y of that an exceptional condition such as an overflow may occur at run time. Often the architecture of the machine influences the choice of simple data types; f o r example the machine i n s t r u c t i o n set may provide operations upon str i n g s of characters such as 'move* and 'mask*. In t h i s case st r i n g s could be regarded as a simple data type, instead of being represented as an arra y of characters., Some machines also provide in s t r u c t i o n s for manipulating extended precision fixed and f l o a t i n g point numbers, corresponding to declarations of long i n t and lonj^-real-variables in ALG0L68 i The source language also influences the choice of simple 35 data types; for example a suitable set of simple data types for PASCAL would be 'integer', ' r e a l ' , 'boolean' and 'char*, whereas a typeless language such as BCPL would have only one simple data type ' c e l l * . Thus for each source language and target machine there i s a suitable set of simple data types, and associated operations, that can be regarded as i n d i v i s i b l e by the tra n s l a t o r . The majority of high-level languages also provide a variety of constructs for defining and manipulating compound data types. Operations upon compound data types such as inverting a matrix or sorting a l i s t are usually performed by source language procedures written by the user, which manipulate the elements of the compound data type, rather than operators supplied by the source language. The source language provides the c a p a b i l i t y to access the elements of a compound data type, such as indexing an array, so the translator must provide both a machine representation for compound data types, and the a b i l i t y to address-elements within the compound type. GRAIL provides two primitives f o r representing compound data structures, ARRAY and CLASS. These two primitives correspond to the two methods of address c a l c u l a t i o n for elements of the compound data structure: run--time and compile-time address c a l c u l a t i o n respectively. < The forms of the array and c l a s s representations i n GRAIL are: ARRAY «a-element>,<a-label>,<a-dimensions>;<a~bounds>) CLASS(<c-t ype>,<c-la bel> *<c-form >) The attributes of an ARRAY type are the GRAIL data type of 36 the ARRAY elements, <a-element>; the ;label for the type declaration used by the source program, <a-label>; the number of dimensions, <a-dimensions>; and the l i s t of lower and upper bounds for each dimension, <a-bounds>. The attributes of a CLASS type are the label for the source language data type, <c-type>, such as "record" or "subrange"; the l a b e l for the type declaration used by the source program, <c-label>; and the GRAIL representation for members of the CLASS, <c-form>. An example of the use of the CLASS primitive i s the GRAIL representation for a PASCAL record. A PASCAL declaration such as: TYPE complex = RECORD x, y: RIAL END; i s represented in figure 1-4'.by a •record 1 CLASS, rather than ARRAY, as the address of x and y can be assigned s t a t i c a l l y by the translator., 37 CLASS Figure 4-4 Record Representation i n GRAIL There have been several approaches to formalizing the d e f i n i t i o n of data structures, such as Hoare•s composition rules [ Hulf 77 ], but these approaches have not been implemented i n programming languages. Another c l a s s i f i c a t i o n of data types i s based upon data types encountered in high-level programming languages: arrays, structures, references, subranges and enumeration [Cole 74], The c l a s s i f i c a t i o n of data types into arrays and classes i s based upon the two technigues used by a translator: run-time and cOmpile-tiffle addressing. The GRAIL primitives CLASS and ARRAY can be regarded as the standard representation f o r heterogenous and homogenous data structures i n the source language respectively. Both CLASS and ARRAY can be used recursively to represent compound data types i n high-level languages. For example i n PASCAL the declarations: TYPE asize = 1.. 10 ; 38 matrix = ARRAY [ a s i z e ] OF INTEGER ; would be translated into the GRAIL representation: CLASS ARRAY subrange asize / \ int matrix 1 asize 10 Figure 4-5 Compound Data Type Representation Neither the CLASS or ARRAY primitives define variables of t h e i r type; t h i s function i s provided by the DEFINE primitive in GRAIL. The CLASS and ARRAY primitives must be used whenever the source language program i m p l i c i t l y or e x p l i c i t l y defines a compound data type. 39 GfiAIL Control Structures GBAIL provides a small set,of p r i & i t i v e control structures that the language implementor can use to define representations for the majority of control structures encountered i n high-level algorithmic languages. Like GBAIL data structures, the GBAIL control structures can be composed to define recursive representations for complex control structures encountered i n source language programs. The simplest GBAIL control structure i s the BLOCK, which consists of a sequence of GBAIL declarations, followed by a sequence of GBAIL statements. GRAIL i s analogous to most high-l e v e l languages, as structures e i t h e r define the flow of control, or the corresponding data environment.,, Primitives which define the environment are referred to as declarations, and primitives which define the flow of control are referred to as statements. The GBAIL primitive statements are BLOCK, ENTER, EXIT, OPCODE, INVOKE and RETURN, and the GRAIL primitive declarations are BLOCK, PROCEDURE and DEFINE. Thus the syntax of a block i s : BLOCK < {<declaration>},{<statement>}) The usual i n t e r p r e t a t i o n o f a BLOCK representation i s that the; GBAIL statements are translated to a sequence of executable instructions i n the target language, and the declarations are translated to define the storage a l l o c a t i o n for the target machine. Statements which appear within the BLOCK may reference any declarations i n an enclosing BLOCK. The BLOCK representation i n GBAIL provides a general scope 40 for data declarations s i m i l a r to that defined i n ALGOL 68, since BLOCKS can be nested to an arbit r a r y depth, and a BLOCK may be used to define a new data environment at any l e v e l . Compound statements in high-level; languages, such as DO .«, ^  • END i n FL/I, can be represented by a BLOCK with a void l i s t of declarations. Thus a PL/I compound statement such as: DO x = 1 ; y=x END; i s represented by the GRAIL structure: BLOCK x 1 := y x Figure 4-6 GRAIL Representation of a Compound Statement GRAIL statements are executed i n seguence unless control i s transferred by means of the EXIT or ENTER primitives. They each have two attributes, a REFERENCE to the structure to which control i s to be transferred and an optional return value expression: EXIT (REFERENCE (<control-node>) , <return-expression>) ENTER (REFERENCE (<control-node>)h, <return-expression>) An ENTER primitive passes control to execute the <control-node> next, whereas an EXIT passes control to the node i n the program representation following the <control-node>. The 41 <return--expressioii> i s used i n expression oriented languages such as ALGOL68 and BCPL to represent the value returned by a control structure such as- -if,..then... else. & .,fx•- i n ALG0L68. Direct transfer of control, such as a GOTO in FORTRAN, i s represented by an enter. For example the program fragment: 10 x=1 GO TO 10 would be represented by: OPCODE ENTER Figure 4-7 Transfer of Control i n GRAIL EXIT and ENTER are used for i t e r a t i o n , as an EXIT to a block which represents a loop construct i s used for loop termination, whereas an ENTER i s used to represent continuation of the next i t e r a t i o n . For example the simple BCPL loop: $( c1; c2;... ; BREAK; ... $) REPEAT could be represented by the GRAIL structure: 42 BLOCK Figure 4-8 GBAIL Re p r e s e n t a t i o n of I t e r a t i o n Most a l g o r i t h m i c languages p r o v i d e a s e t o f o p e r a t o r s and f u n c t i o n s c o r r e s o n d i n g t o the simple data types of the source language., Although the m a j o r i t y of languages i n c l u d e the simple a r i t h m e t i c o p e r a t o r s such a s * + *, * - * , • * * a n d V , t h e r e i s a wide v a r i a t i o n amoung the other o p e r a t o r s d e f i n e d by the source language. Languages such as FORTRAN only provide a l i m i t e d s e t of a r i t h m e t i c , r e l a t i o n a l and l o g i c a l o p e r a t o r s , whereas C f J o h 78 ] provides an e x t e n s i v e s e t of o p e r a t o r s such as ••=• and '=•• t o f a c i l i t a t e e f f i c i e n t implementation of system software. Other o p e r a t o r s are u s u a l l y provided f o r a c c e s s i n g elements of an a r r a y , and f o r r e f e r e n c i n g and d e r e f e r e n c i n g . A l l the o p e r a t o r s s u p p l i e d by the source language ate represented by the OPCODE p r i m i t i v e i n GRAIL: OPCODE (<opcode-name>,{<operand>}) The <opcode-name> i s s u p p l i e d f o r the source language, and must s p e c i f y both the types o f i t s operands and the type c f the r e s u l t . In many languages one symbol such as i s used t o 43 denote s e v e r a l d i f f e r e n t o p e r a t i o n s depending upon the type of i t s operands. In GRAIL each o f these o p e r a t i o n s must have a separate <opcode-name> such as • • i n t ' and • • r e a l ' . The .GfiAIL r e p r e s e n t a t i o n f o r a FORTRAN expre s s i o n with i m p l i c i t type both the e x p l i c i t and i m p l i c i t o p e r a t o r s must be d e f i n e d f o r the GRAIL r e p r e s e n t a t i o n by the language implementor. The m a j o r i t y of h i g h - l e v e l languages possess c o n d i t i o n a l i • • c o n t r o l s t r u c t u r e s , such as ' i f * and 'case 1 c o n s t r u c t s , . These are represented i n GRAIL by the s e l e c t p r i m i t i v e : SELECT <<s-type>,<s-expression>,{(<s-value>,<s-control>)}) <s-type> s p e c i f i e s the data type of the branch l a b e l s or <s-value>s. An • i f * ... •then 1 ... c o n s t r u c t would be represented by a s e l e c t p r i m i t i v e with a 'boolean* <s-type> and • t r u e 1 , ' f a l s e ' <s-value>s. A 'case' c o n s t r u c t would be represented by a s e l e c t p r i m i t i v e with an ' i n t e g e r ! <s-type> and i n t e g e r e x p r e s s i o n s f o r <s-value>s. The semantic i n t e r p r e t a t i o n of a SELECT statement i s c o n v e r s i o n , X=N*Y*2, i s given i n f i g u r e 4-9, The precedence of F i g u r e 4-9 GRAIL Re p r e s e n t a t i o n of E x p r e s s i o n s 44 sim i l a r to an extended 'case! statement in PASCAL. The <s-expression> i s evaluated, then compared with each of the <s-value>s in seguence., I f a match occurs the corresponding <s-control> i s executed, and then the SELECT i s exited. Thus the GRAIL representation for a BCPL conditional statement such as: SWITCHON ch INTO $< CASE " *s" : CASE "*t«: neg : = FALSE CASE "-«•: neg := TRUE $) has the following GRAIL representation; Figure 4-10 GRAIL.Representation of a Case statement 45 GBAIL Data Declarations Every variable and constant appearing In a source language program must have a corresponding d e f i n i t i o n i n the GBAIL program representation. Such declarations are represented by the DEFINE pr i m i t i v e : DEFINE (<d-type>, {<d- name>} , f<d-value>}) The <d-type> s p e c i f i e s the data type of each of the program variables <d-name>, and the LIST {<d-value>} s p e c i f i e s t h e i r i n i t i a l values. The data type may be a simple data type such as •integer* or ' r e a l * , or a BEJEBENCE to a compound data type represented by a CLASS or ABBAY. One of the most d i f f i c u l t v h i g h r l e v e l language constructs to represent i s the procedure. Although the majority of high-level languages provide procedures there i s a wide v a r i a t i o n i n th e i r c h a r a c t e r i s t i c s , such as the mechanisms for passing parameters, returning values and establishing the data environment of the procedure. There are three p r i n c i p a l methods used f o r passing parameters: c a l l by reference, c a l l by value, and c a l l by name. Because of the semantic, complexity of implementing; c a l l by name as an i n t r i n s i c i n GBAIL, only the former two c a l l i n g environments are provided. C a l l by name can be represented by using a parameterless procedure, or "thunk" [Prat 751, which i s evaluated whenever the corresponding parameter i s used. Other parameter passing techniques such as c a l l by r e s u l t can be represented using addi t i o n a l structures for evaluating parameters upon procedure entry and exit. 46 Some author s have attempted to d e f i n e a l o w - l e v e l machine independent procedure c a l l mechanism f o r t r a n s l a t i o n ( R a i n 74], but i t i s d i f f i c u l t to p rov i de an e f f e c t i v e lows- leve l r e p r e s e n t a t i o n as hardware f a c i l i t i e s f o r s u b r o u t i n e l i n k a g e and s t a t u s sav ing vary w ide ly . The procedure r e p r e s e n t a t i o n p rov ided by GRAIL i s : model led upon PASCAL, as a PROCEDURE p r i m i t i v e i n GRAIL c o n s i s t s of a procedure name, t y p e , f o rma l parameter d e c l a r a t i o n s f o l l o w e d by the body of the p r o c e d u r e ; PROCEDURE(<proc-name>, <proc- type>, ((< pa r a m- dec 1>, < pa ra a-1 y pe>) | , B LOCK): Each fo rma l parameter c o n s i s t s of a p a i r , to d e f i n e the f o r m a l parameter, <param-decl>, and a l s o i t s c a l l i n g env i ronment , <param-type>, a s ' r e f e r e n c e * , •va lue* o r ' r e s u l t * . There i s no r e s t r i c t i o n i n t h e d e f i n i t i o n on t h e range o f GRAIL data types which a r e p e r m i s s i b l e a s f o rma l parameters.. For example complex da ta s t r u c t u r e s such as CLASSes and ARRAYS c o u l d be passed by v a l u e . The <proc-type> r e p r e s e n t s both t h e type of the value r e t u r n e d and the environment of the p rocedure . Thus the r e p r e s e n t a t i o n f o r an i n t e g e r f u n c t i o n i n FORTRAN would have a <proc-type> { ' i n t e g e r ' , ' s t a t i c ' ) t o i n d i c a t e t h a t the procedure was not r e c u r s i v e . The seguence of d e c l a r a t i o n s and s tatements i n the f u n c t i o n would be r e p r e s e n t e d by the BLOCK a s s o c i a t e d w i th t h e procedure. Thus the r e p r e s e n t a t i o n f o r the FOHTRAN program: 47 INTEGER FUNCTION lmax ( l i s t , n) DIMENSION l i s t (n) lmax=1 BET U EN 1 n t ? 1 i A „ Figure 4-11 GEAIL Representation of FOR TEA N Functions The GRAIL primitive f o r representing a procedure c a l l i s INVOKE: INVOKE (REFERENCE (<procedure>),{<actual-param>)} The actual parameters to a source language procedure c a l l may either be i d e n t i f i e r s or expressions. These are represented i n GRAIL by REFERENCES and control structures respectively. i The GRAIL primitive f o r returning from a procedure has the same form as an EXIT; however i t s semantics are more complex as the o r i g i n a l environment prior to the INVOKE of the procedure must be restored: RETURN (REFERENCE (<proC€dure-node>)><return-expression>) Because high-level language transput f a c i l i t i e s are highly dependent upon both the source language and the interface provided to the operating system environment, GRAIL does not 48 provide an independent set of primitives for transput. There are two possible approaches to representing high-level transput f a c i l i t i e s i n GRAIL. Either the language implefflentor can provide a l i b r a r y of external GRAIL procedures for applications such as formatted transput, or, i f the source language transput f a c i l i t i e s are simple and stream oriented, they can be provided by opcodes defined for the source language. Thus a PASCAL unformatted read statement READ(i, j , k) ; i could be represented by an 'input 1 OPCODE as follows: i : j k Figure 4-12 Representation of a PASCAL READ The preceding sections have described each of the GRAIL primitives and th e i r applications. Since none of the GRAIL primitives can be e f f e c t i v e l y defined as a composite of the remaining primitives, the set of primitives i s minimal. The set of primitives i s also complete, as any of the data and control structures encountered i n common high-level algorithmic languages can be represented using GRAIL. This assertion i s extremely d i f f i c u l t to v e r i f y , because of the lack of either a «9 formal d e f i n i t i o n f o r GBAIL or a formal scheme for c l a s s i f y i n g programming languages. ' The majority of authors have c l a s s i f i e d high-level languages on the basis of t h e i r area of application i [ S a w 76]; however t h i s approach i s inapplicable to the design of translators. Although both APL and PL/I are c l a s s i f i e d as general purpose programming languages, t h e i r t r a n s l a t o r s have l i t t l e i n common. In spite of the d i f f i c u l t y of verifying the completeness of GBAIL, the set of p r i m i t i v e s can rea d i l y be extended to include new programming language constructs. New programming languages have been proposed which incorporate data and c o n t r o l structures for p a r a l l e l prcgrammming and modular program design [wulf 77"}. P a r a l l e l programming constructs can be represented by primitives such as FORK, JOIN and QOIT, which could be incorporated into the present set of GBAIL primitives. Although the design of GRAIL was not based upon e x t e n s i b i l i t y , t h i s objective i s a natural consequence of the primary design goal of adaptability. i An index to the GBAIL pri m i t i v e NODE types, together with t h e i r application i n defining a representation for ALGOL Si, are given i n Appendix A, 50 Chapter V. Implementation of a Source to GRAIL Translator GRAIL provides the language implementor with an intermediate semantic representation to simplify the design and implementation of a modular translator f o r the User's Source language, referred to as the USL. However, without any assistance to simplify the process of writing a t r a n s l a t o r , an implementor already f a m i l i a r with the USL and target machine i s unlikely to be convinced to incorporate an unfamiliar intermediate representation such as GRAIL into the design of the translator. Although there have been a number of tools developed for generating translator components such as parsers, the implementor i s usually required to write the ent i r e semantic and cede generation phases of the compiler by hand, based upon the desire to get an e f f i c i e n t translator operational as soon as possible as a 'one shot 1 project. Thus, the semantic phase of the t r a n s l a t o r i s often loosely structured and d i f f i c u l t for even the implementor to debug or modify. To overcome the implementor*s natural prejudice for ad hoc semantics, i t i s necessary to provide sophisticated tools to simplify the process of implementing t r a n s l a t o r s which incorporate f l e x i b l e semantic representations such as GRAIL. Given the vast amount of i n d i v i d u a l e f f o r t devoted to writing and rewriting translators, i t i s not d i f f i c u l t to j u s t i f y a comparable e f f o r t to simplify t h i s process. 51 CjUEST has been designed as a powerful and f l e x i b l e language for specifying the translation of high-level algorithmic languages into GRAIL. The advantage of defining the t r a n s l a t i o n process using QUEST i s that a major component cf the USL trans l a t o r , which converts a USL program into an intermediate GRAIL representation, can be automatically generated using the M l f E Translator Writing system, LANCE provides the language implementor with a set of too l s to reduce the e f f o r t reguired to develop a translator for the USL, Although there i s an i n i t i a l overhead for the implementor to become fa m i l i a r with the system, t h i s i s compensated by the reduced e f f o r t reguired to modify or rewrite the t r a n s l a t o r for either a new: USL or target machine. 'A program written i n QUEST defines both the syntax and the associated GRAIL semantic representation of the USL. The syntax of the USL i s defined by means of extended BNF productions, and the semantics of the USL i s defined by QUEST semantic procedures associated with each of these syntax productions. The following sections outline how the implementor can use LANCE to generate a modular USL t r a n s l a t o r which incorporates GRAIL as an intermediate semantic representation. As a t u t o r i a l example of such an implementation using LANCE, the following two chapters both use the production of a translator from ASPLE to IBM 370 Assembler G. ASPLE i s an ALGOL-like language of intermediate complexity, developed at the University of C a l i f o r n i a , Los Angeles, as a convenient language to i l l u s t r a t e ] 52 comparative techniques of formal semantics [Clea 77]. ASPLE was used i n preference to a more widespread and sophisticated language such as PL/1 primarily because an attempt to demonstrate the application of LANCE for such a language would not be productive; the complexity and length of the grammar would obscure the o v e r a l l application of LANCE to the design of the translator. The semantics of ASPLE have been precisely and compactly defined, and ASPLE contains most of the major features present i n high-level algorithmic languages, including declarations of recursive modes, conditional and i t e r a t i o n statements for flow of c o n t r o l , and assignment and transput statements. A complete, d e f i n i t i o n of the syntax and semantics of ASPLE i s given by [Clea 77]. Appendix E contains the complete d e f i n i t i o n of the t r a n s l a t i o n of ASPLE to the intermediate GRAIL representation (using QUEST) together with the translator modules for ASPLE generated by the QUEST compiler (referred to as INQUEST) supplied by LANCE. Appendix F contains a source ASPLE program, and the GRAIL representation generated by the translator., LANCE Although LANCE i s designed to be as f l e x i b l e as possible, i t enforces a basic protocol for the design of a USL translator upon the implementor. A USL translator constructed using LANCE consists of a central Inter-Module Controller, referred to as an IMC, which co-ordinates a set of procedures, referred to as 53 I2P.2!:!S# each of which i s invoked by the IMC for a different function i n the translator. The IMC, which i s written in ALGOLW, i s a fi x e d component of any DSL trans l a t o r generated using LANCE. There are two classes of MODULES: those which are independent of the OSL, supplied as i n t r i n s i c MODULES by LANCE (e.g., the LISTER), and those which must be generated for each USL implementation, (e.g., the USL PARSER). A diagram of the MODULE organization of a USL tra n s l a t o r generated using LANCE i s contained in Appendix B, together with a brief description of the functions performed by each MODULE. The implementation of LANCE r e l i e s upon another translator writing system, TRUST, the translator writing u t i l i t y system, developed at the University of B r i t i s h Columbia. TRUST provides f a c i l i t i e s to generate l e x i c a l scanners, lookup routines, parsers and semantic MODULES which are used as component MODULES by LANCE,, An outl i n e of the MODULE generators supplied by TRUST i i s contained i n Appendix B, together with a description of the IMC communication protocol between MODULES. A more complete description i s contained in the "TRUST user's Guide" (vene 76]. The implementor uses LA8CE to generate a tran s l a t o r for some USL by the following process: ( 1 ) I n i t i a l l y , the language implementor must decide upon the GRAIL semantic representation for each of the data and control structures in the USL, following the d e f i n i t i o n of the ASPLE translator i n Appendix E and the examples from 54 appendix A. In many cases the representation i s not unique, but r e f l e c t s the target machine or implementation strategy. For example, the implementor may choose to represent strings i n the DSL either as ARRAYS of the primitive data type 'character' or as a primitive data type •string* with a single 'integer* attribute 'length*. (2) Having decided upon the GRAIL representation f o r each USL construct, the implementor must define t h i s t r a n s l a t i o n process by means of a QUEST program. The implementor can develop the QUEST program from a BNF d e f i n i t i o n of the USL syntax. Each construct i n the USL i s usually defined by a sequence of recursive BNF productions, such as a statement l i s t i n ASPLE: <stm-train> (<stm-list>,<dcl-list>) : <statement> % % ; <statement> 'semicolon* <stm-train> In ASPLE the syntax of a <stm-train>, or statement l i s t , i s recursively defined by the two productions above as a sequence of one or more <statement>s, separated by semicolons. <statement> i s a non-terminal i n the ASPLE syntax so i t appears on the l e f t hand side of a l a t e r seguence of productions. To build the GRAIL structure for each construct i n the USL, the language implementor must attach a semantic procedure written i n QUEST to the corresponding productions in the BNF d e f i n i t i o n of the syntax of the construct. For example, a program i n ASPLE i s represented as a GRAIL BLOCK: 55 <program> (<program-list>) : •begin* <dcl-train> 'semicolon* <stm-train> 'end* % structure <dcl-list>,<stm-list> ; <program-code> := 'block' (<dcl-list>:•list*,<stm-list>:»list') ; traverse <dcl-train> (<dcl-list>) ; traverse <stm-train> <<stm-list>,<dcl-list>) % . In the above example, the syntax of an ASPLE block i s defined by the production as a <dcl-train>, or declaration l i s t , followed by an <stm-train>, or statement l i s t . The QUEST semantic procedure associated with the production i s delimited by *%*s. The d e t a i l s of the functions performed by the semantic procedure are described in la t e r sections. The semantic procedure builds a BLOCK with void declaration and statement l i s t s , then descends the parse tree to generate the GBAIL structures for the declarations and statements. In many cases the BNF syntax may need to be modified or expanded to simplify building the GRAIL structure. For example, a l i s t of expressions i n the BNF syntax d e f i n i t i o n of ALGOL appears on the l e f t hand side of productions defining both procedure c a l l s and records. Although both expression l i s t s could be defined by the same syntax production, the associated semantic procedures w i l l d i f f e r . J (3) The QUEST program defining the translation of the USL into GRAIL i s compiled by INQUEST, a MODULE generator supplied by the LANCE THS. ' When the QUEST program i s compiled by INQUEST, a series of output f i l e s are generated which are used by the implementor to generate the PARSER, SCANNER, LOOKUP and SEMANTIC MODULES for the USL translator, (a) The USL Parser F i l e . This f i l e contains the BNF productions from the USL syntax s p e c i f i c a t i o n i n the QUEST program. The Parser 56 f i l e must be compiled by the TRUST Parser Generator to produce the PARSER MODULE for the USL. The TRUST Parser Generator also produces a l i s t of the terminal symbols, and their i n t e r n a l numeric codes i n the USL translator. The implementor must write a scanner for the USL which generates the corresponding numeric codes f o r each input lexeme i n a USL program. The scanner can be produced using the TRUST Scanner Generator, which provides the language implementor with the means of defining l e x i c a l symbols as regular expressions over an alphabet of characters f Vene 75]. (b) The USL Lookup F i l e . This f i l e contains a l i s t of each of the strings i n the BNF syntax for the USL. The f i l e i s used to construct a lookup table for reserved words i n the USL transl a t o r , by associating the numeric parser code with each of the lookup terminal symbols i n the USL syntax. The f i l e i s then input by the implementor to the Lookup Generator to produce the LOOKUP MODULE f o r the USL tra n s l a t o r . (c) The USL Semantic F i l e . This f i l e contains the semantic procedures associated with each of the productions i n the USL Parser F i l e . This f i l e must be compiled by the TRUST Semantics Generator, to produce the SEMANTIC MODULE for the USL translator. i 57 (4) The USL translator i s composed of the IMC, concatenated with the f i l e of l i b r a r y MODULES, together with the MODULES generated for the USL by the preceding process. ft USL program which i s input to the translator w i l l be converted into the corresponding i n t e r n a l GRAIL semantic representation provided that no f a t a l syntax errors are detected by the parser. The GRAIL semantic representation i s then translated into output for the target machine by the code generation phase, described i n the subseguent chapter. QUEST Description This section outlines QUEST from the viewpoint of the language implementor f a m i l i a r with the USL and GRAIL, and the structure of the modular USL tran s l a t o r generated by LANCE (Appendix B). QUEST i t s e l f i s a programming language s p e c i f i c a l l y designed for defining the syntax and semantics of the USL. QUEST programs are input to INQUEST, the QUEST compiler supplied by LANCE, to bui l d the MODULES for the USL translator., Although the implementor must learn *yet another language* before using QUEST to implement a USL tr a n s l a t o r , there i s strong motivation for developing such a new language to define the USL semantics: (1) QUEST i s a simple language which has l i t t l e i n common with the majority of high-level algorithmic languages f a m i l i a r 58 to the USL implementor. The majority of data and control structures available in high-level languages are unnecessary i n QUEST, and i f available i n QUEST would encourage the implementor to define more complex GRAIL semantic representations. QUEST constrains the language implementor to adopt a simple GRAIL semantic representation for the USL. (2) Because QUEST i s d i r e c t l y defined i n terms of str i n g s in the low-level macroprocessing language TOSI fAbra 76], i t i s possible to define a QUEST translator which i s independent of the p a r t i c u l a r implementation. I f a high-level language was used in place of QUEST, then the GRAIL semantic representation of the USL would be dependent upon the implementation of the high-level language. Since GRAIL i s designed to simplify the semantic d e f i n i t i o n of translators for high-level languages, i t i s d i f f i c u l t to j u s t i f y adopting a high-level language of comparable complexity to define the tr a n s l a t i o n process. .„ The complete BNF syntax for QUEST i s contained i n Appendix D. QUEST programs incorporate both the syntactic and semantic d e f i n i t i o n of the USL, Although the syntax and semantic d e f i n i t i o n are c l o s e l y linked, i t i s possible to separate consideration of these two aspects of language d e f i n i t i o n in QUEST. I n i t i a l l y , the implementor using QUEST defines the BNF syntax of the language, then the semantic procedures can be written to generate the GRAIL structures corresponding to the 59 DSL constructs. This process of separating the syntactic and semantic d e f i n i t i o n allows the language designer to consider the external representation of the language constructs independently of their semantics., In the process of extending the syntactic d e f i n i t i o n to include the semantics, i t may be convenient or necessary to expand the set of syntax productions. Freguently a syntactic construct such as a l i s t of i d e n t i f i e r s may have several d i f f e r e n t semantic interpretations depending upon context. Because QUEST does not provide a large set of control structures, i t i s d i f f i c u l t to b u i l d complex semantic procedures for concise sets of syntax productions. Thus the language designer should not necessarily attempt to minimize the number of productions i n the syntactic d e f i n i t i o n of the USL, as t h i s may increase the complexity of the associated semantic procedures unnecessarily. An opposite problem occurs when a number of constructs i n the source language with different syntax share a common semantic interpretation, f o r example while do and d,o„ u n t i l . • • In t h i s case the common code i n the semantic procedures associated with the separate syntax productions can be u n i f i e d i n a global semantic procedure. For example, in the d e f i n i t i o n of the ASPLE tra n s l a t o r , there are several occasions i n which the compiler must search f o r an i d e n t i f i e r i n the declaration l i s t to check for duplicate i d e n t i f i e r s i n generating code for declarations 60 and in establishing references to i d e n t i f i e r s and t h e i r modes i n generating code f o r statements. ; Thus, the QUEST d e f i n i t i o n of the ASPLE translator defines the global procedure search-id: <search-id> (<id-mode>, <id-ref >, <dcl-list> ,<id-name>) : % case <dcl-list> • s u b l i s t ' (•define* (<id-mode>:, •sublist* (<id-ref>:<id-name>))) : e x i t e l s e : <id-mode> := *undef* ; <id-ref> := 'undef* ; e x i t endcase S . The semantic functions performed by t h i s procedure are explained i n d e t a i l in subsequent sections. Syntax Definition Using QUEST The syntax of QUEST i s represented below i n a variation of BNF. QUEST keywords are boldfaced, non-terminals are ca p i t a l i z e d and explanatory comments are enclosed i n quotation marks ("). The notation for QUEST BNF productions i s described i n Appendix D. Since QUEST i s used to define the syntax of a USL, there must be a mechanism for d i f f e r e n t i a t i n g the syntax of a USL from that of QUEST. Thus, non-terminals, or i d e n t i f i e r s , of the USL consist of any s t r i n g of characters delimited by angle brackets (< >). Terminal symbols and reserved words, referred to as 61 st r i n g s , are delimited by apostrophes (*), and t h e i r external representation i s output to the lookup f i l e . The QUEST BNF d e f i n i t i o n of USL syntax productions i s as fellows: PRODUCTION-SPECIFICATION ::= id ("semantic parameters") : {PRODN-RHS ;} PRODN-RHS . PEODN-RHS ::= r ELEMENT T U r U %"semantic procedure"5£ «- ( (ELEMENT} ) | | i d -» L * J ELEMENT ::= r s t r i n g T I i d I i d . i d J The format of productions i n QUEST follows the usual conventions for the BNF d e f i n i t i o n of a context-free grammar. Each non-terminal i d e n t i f i e r i n the USL grammar i s defined by i t s appearance as the l e f t hand side of a QUEST production s p e c i f i c a t i o n . The rig h t hand side of the production s p e c i f i c a t i o n consists of alternative USL production right hand sides, each of which consists of a l i s t , possibly void, of production elements. Production elements may either be i d e n t i f i e r s , strings or l i s t elements. Strings must be terminal symbols or reserved words i n the USL syntax, and cannot be referenced by the semantic procedures. I d e n t i f i e r s are either USL non-terminals, or terminal symbols which are referenced by the semantic procedure such as a USL * i d e n t i f i e r * whose external 62 representation i s used by the semantic procedure. L i s t elements consist of a r e p e t i t i o n of a sequence of i d e n t i f i e r s or st r i n g s that occur i n the USL program syntax. The r e p e t i t i o n factor associated with the l i s t may be *** (0 or more) or *•* (1 or more), and the sequence i s i d e n t i f i e d in the semantic procedure by means of the l i s t i d e n t i f i e r following the r e p e t i t i o n symbol. L i s t elements provide a compact alternative representation to the recursive BNF d e f i n i t i o n of repetition. For example, the QUEST syntax production for an ASPLE declaration t r a i n could be incorporated Into the syntax production f o r an ASPLE program: <program> : •begin* (<declaration> *;*)*<decls> <stm-train> *end* The l i s t i d e n t i f i e r <decls> i s a 'dummy* non-terminal symbol in the syntax; It i s used to i d e n t i f y the l i s t of <declaration>s in the semantic procedure following the production and i s not defined by other BNF productions as are <declaration> and <stm-train>. L i s t elements are expanded into a further set of productions in the output f i l e f o r the Parser Generator, and the l i s t i d e n t i f i e r forms a new non-terminal symbol. Multiple occurrences of the same non-terminal in a USL production can be semantically distinguished by compound i d e n t i f i e r s , f o r example, the QUEST syntax production f o r an ASPLE conditional statement: • i f * <exp> *then* <stm-1:rain>. <t> •else* <stm-train>.<f> * f i * <stm-train>.<t> and <stm-train>,<f> are compound i d e n t i f i e r s i n the above syntax production. Although they both have the same syntax d e f i n i t i o n as a <stm-train>, they must be distinguished i n the semantic procedure which follows. Any 63 reference to a <stm-train> i n the semantic procedure would be ambiguous, and so each reference must be q u a l i f i e d as either <stm-train>.<t> or <stm-train>.<f>. An i d e n t i f i e r of the form i d 1 . i d 2 i s output to the syntax module as i d * , but must be referenced in the semantic procedure as i d * . i d 2 . A example of the use of a QUEST syntax production s p e c i f i c a t i o n i s the b r i e f syntax d e f i n i t i o n of a record class declaration in ALGOLW: <record-dec> : •record' <identifiet> ' (• <var-dec>. £1> {»;» <var-dec>. <Cn>)*<dec-list> " ) • - . The <record-dec> syntax d e f i n i t i o n defines an ALGOLS record composed of one or more variable declarations, <var-dec>, separated by ';• and enclosed i n matching parentheses. The f i r s t variable declaration i s associated by the semantic procedure with <var-dec>.<1> and variable declarations within the l i s t are associated with <var-dec>.<n>, Semantic Definition Using QUEST The QUEST semantic procedures associated with the syntax productions for the USL perform two basic functions: building GRAIL structures and c o n t r o l l i n g the t r a v e r s a l of the USL program parse tree. Thus, QUEST control structures consist of a simple set of statements for building and pattern matching GRAIL structures, together with the a b i l i t y to invoke the semantic procedures associated with the nodes of a source program parse 64 i tree. GRAIL structures are the only data type provided in QUEST and a l l variables within a semantic procedure consist of references to terminal symbols ( i . e . , LABELS) or non-terminal NODEs. GRAIL structures within a semantic procedure may be part of the USL program representation, or they may be temporary structures b u i l t during the process of pattern matching, referred to as templates. The QUEST pattern matching f a c i l i t y i s used for functions such as searching a GRAIL structure and selecting attributes of NODEs. Thus, th i s general pattern matching f a c i l i t y provides the functions usually associated with symbol table manipulation i n a compiler. For example, in the QUEST d e f i n i t i o n of the ASPLE translator, the global procedure procedure <search-id> uses pattern matching to locate an i d e n t i f i e r in the declaration l i s t . A QUEST program can be decomposed i n t o two components: i ( 1 ) The prelude defines the global environment f o r the tr a n s l a t i o n process. The prelude includes the following: (a) the d e f i n i t i o n and i n i t i a l i z a t i o n of any global structure variables. , Global variables may be used during compilation for maintaining l i s t s of i d e n t i f i e r s or forward references, to enable the tra n s l a t i o n of the USL into GRAIL to be performed i n a single pass. In the t r a n s l a t i o n of ASPLE to GRAIL, the program-code i s defined as a global variable: 65 structure <prcgram-code> := ' l i s t * ; The <program-code> i s the the ASPLE translator. It i s with no attributes, and may be procedures which follow. only global variable defined by i n i t i a l l y defined as a GRAIL LIST referenced by any of the semantic (b) the d e f i n i t i o n of any global QUEST procedures. These are used when a number of productions in the semantic d e f i n i t i o n perform a common operation upon the semantic tree, such as searching for the d e f i n i t i o n of an i d e n t i f i e r i n the ASPLE global procedure <search-id>. Global procedures are also used to define external functions. I f the body of the global procedure consists of an external declaration, then when the global procedure i s ca l l e d the semantic procedure i s exited and the string associated with the c a l l i s passed to the IMC. The language implementor j • can use external procedures as a convenient means to extend the functions provided by QUEST. (2) The charter defines both the syntax productions of the USL and the associated! semantic procedures. v. The format of the syntax productions was defined i n Section V.3, and the description of the associated QUEST procedures follows. A l l QUEST procedures have an associated l i s t of formal parameters. The actual parameters to a QUEST procedure c a l l i 66 must be GRAIL structures, which are passed by reference when the procedure i s invoked. A l l structure variables used within QUEST procedures must be defined and are of three types: (1) l o c a l structure variables. These are declared i n the structure ;statement preceding the executable statements of the procedure, and only defined for the scope of the procedure. Thus, i f a semantic procedure c a l l s another by means of a traverse or c a l l statement, the l o c a l structure variables are not available to the c a l l e d procedure unless they are passed as parameters to the c a l l . (2) global structure variables. These are declared and i n i t i a l i z e d by structure statements i n the QUEST prelude, such as the program-code variable defined i n the ASPLE compiler. Global structure variables may be referenced within any of the semantic procedures. (3) structure parameters. These are the formal parameters to the QUEST procedure, which are bound by reference to the actual parameters of the procedure when i t i s entered by a traverse or c a l l statement. The ASPLE productions which define modes provide a simple example of the application of the use of structure variables i n QUEST, A <mode> i n ASPLE may be a simple mode: •bool 1 or »int«, or a *ref» to <mode>. Thus, the va l i d modes in ASPLE are «bool», ' i n t ' , ' r e f *bool«, *ref* * i n t * , 'ref* ' r e f »bool», etc. 67 <mode> (<mode-code>) : »bool« % <mode-code> := 'bool* % ; »int» % <mode-code> := * i n t ' % ; • r e f <mode> % structure <deref-code> ; traverse <mode> (<deref-code>) ; <mode-code> := *list» (»ref *, <deref-code>) % . <mode-code> i s a structure parameter to each of the procedures associated with the three productions for <mode>. <mode-code> i s returned as the GRAIL representation of the ASPLE mode. The f i r s t two semantic procedures assign a s t r i n g to t h i s parameter. The t h i r d procedure defines a l o c a l variable <deref-code> for the GRAIL representation of the mode which i s referenced. The procedure associated with the dereferenced mode i s then c a l l e d by traverse, and the representation of the dereferenced mode i s returned i n <deref-code>. <mode-code> i s then assigned the GRAIL representation of an ASPLE *ref*, namely • l i s t * < fref»,...). The concept of a structure i n QDEST corresponds c l o s e l y with the concept of expressions i n an algorithmic language. Constants i n an algorithmic language correspond to terminal GRAIL structures, or LABELS. LABELS are defined in a QUEST program by strings, i . e . , a seguence of characters delimited by single apostrophes ('!•).... Structure variables return the GRAIL structure which they reference. Compound structures, i . e . , GRAIL NODEs, are defined by the NODE type followed by a parenthesized l i s t of i t s a t t r i b u t e s . , For example, i n the QUEST procedure above, <mode-code> i s assigned the structure ' l i s t ' (*ref •,<deref-code>) . This i s a GRAIL LIST, whose attributes are the LABEL 'ref* and the structure referenced by the l o c a l variable <deref-code>. 68 In addition to these methods of defining and building structures, there are two special f a c i l i t i e s f o r structure d e f i n i t i o n i n QUEST. lookup provides access to the source program representation for a terminal element i n the syntax, which i s attached to the parse tree leaf. For example, the USL implementor may wish to r e t a i n the names of source program variables i n the G8AIL semantic representation. I f <variable> i s a terminal i n the syntax production, then lookup(<variable>) w i l l return the LABEL from the parse tree leaf for the terminal. In some cases the implementor may wish to ascend the GRAIL structure. For example, the implementor may translate the USL program statement GOTO i d into a GRAIL representation as ENTER(REFERENCE(<label-structure>)), where <label-structure> i s the GRAIL structure for the USL statement labelled by i d . The QUEST procedure associated with the syntactic d e f i n i t i o n of the GOTO must search the GRAIL structure to resolve the reference, I f <var> i s a structure variable, then up (<var>) w i l l return the parent of the structure which i s referenced by <var>. 69 The detailed syntax of structure expressions i s as follows: STRUCTURE : := r s t r i n g j "creates a GRAIL LABEL, i . e . , a GRAIL NODE | with a NODE type s t r i n g , and no at t r i b u t e s " I i d J "reference to a QUEST structure v a r i a b l e " 1 up ( i d ) | "reference to the parent of the structure i variable " I lookup ( i d ) I " i d must appear as an element i n the | production right hand side. A GRAIL LABEL i s | created with the s t r i n g from the parse tree I node for the element. ** «• GRAIL-NODE ( {L-STRUCTURE ,} L-STRUCTURE ) "A new NODE i s created with a NODE type GRAIL-NODE and a l i s t of a t t r i b u t e s formed from the l i s t of l a b e l l e d structures, L-STRUCTUREs, enclosed in parentheses. Thus, the new NODE becomes the parent of each of the l a b e l l e d structures, conseguently any previous binding of the l a b e l l e d structures as attributes of another GRAIL NODE are undone." L-STBUCTURE ::= pid :^ ?STRUCTURE^ "A l a b e l l e d structure consists of a GRAIL STRUCTURE and a structure variable, i d , to which the structure i s assigned. ,. I f the STRUCTURE i s omitted i t defaults to a new GRAIL NODE, with an * undefined' LABEL and no attributes. Thus, ' l i s t ' O defines a GRAIL LIST NODE with an attr i b u t e l i s t consisting of an 'undefined' structure. The structure variable id i s assigned to the STRUCTURE unless the L-STRUCTURE i s a component of a template structure i n a case statement. Attribute assignment i s only performed within a template i f the pattern match succeeds f o r the entire template.'? 70 The set of GRAIL primitive NODE types are l i s t e d below: GBAIL-NODE ::= r ' l i s t * I 'c lass* I 'array* I *block* j *enter* I * e x i t « j 'opcode* I 'de f ine ' I ' se lect* I 'procedure * j * invoke * j * re turn * I ' re ference ' I ' s u b l i s t ' I ' se t ' j "The l a t t e r two NODE types are reserved for | templates, as they have s p e c i a l e f f e c t s i n | pattern matching., In general, two structures I match i f either has an undefined type, | otherwise both NODE types must be i d e n t i c a l | and the attribute l i s t s must match. I f a 1 NODE type i s a reference, then i t i s | 'dereferenced' to the attribute structure | which i t references before matching. I f the | template NODE type i s a s u b l i s t , then the | match succeeds i f the structure NODE type i s I a l i s t , and each attribute of the s u b l i s t ! matches some attribute of the l i s t . I f the | template NODE type i s a s e t , then the match | succeeds i f the structure matches some | attribute of the set . •* «- s t r i n g "The language implementor can define new GRAIL primitives by building structures with NODE J types that are strings. " 71 A simple example of GRAIL structures and pattern matching i n QUEST i s the d e f i n i t i o n of the ASPLE global procedure <search-id>: <search-id> (<id-mode>,<id-ref>,<dcl-list>,<id-name>) : % case <dcl-list> *s u b l i s t * (•define* (<id-mode>:, •sub l i s t * (<id-ref >: <id-name>)}) : e x i t e l se : <id-mode> := *undef ; <id-ref> : = 'undef* ; e x i t endcase % . <search-id> attempts to f i n d <id-name> i n the declaration l i s t , <dcl-list>. The search i s performed using the pattern matching f a c i l i t y provided by the case statement. ., Pattern matching in QUEST i s a 'top-down* process. A pair of GRAIL NODEs are matched by f i r s t matching the NODE types, then matching each of the corresponding att r i b u t e s of the NODEs. In the example above the case defines <dcl-list> as the structure which i s to be matched.v The f i r s t template i n the case l i s t (•sublist*... ) t r i e s to f i n d <id-name> within some DEFINE node in the declaration l i s t . I f the <id-name> i s located then the <id-mode> and <id-ref> parameters are assigned to the substructures i n the <dcl-list>. If the f i r s t template f a i l s to match, then the next template in the case l i s t , e l s e , i s matched against <dcl-list>. Since e l se succeeds, <id-mode> and <id-ref> w i l l be assigned to the LABEL *undef*. The GRAIL representation of the ASPLE declaration l i s t i s : •list*('define*(<id-mode>,* l i s t * (<id-name>),<id-value>)) Thus an ASPLE declaration * i n t * x, y i s represented i n GRAIL by: •define'(*int','list'(*x»,'y»),* #undef#«) The template which locates an <id-name> uses a 'subset* node to match the <id-name> within the <id-list>, and again to match the •define* within the <dcl-list>. Each QUEST procedure associated with a syntax production consists of a declaration of a l l the l o c a l structure variables used within the procedure, followed by a l i s t of the executable statements forming the body of the procedure. 72 QUEST-PROCEDOBE ::= (•Structure {id , } i d ;«| [STATEMENT ;} STATEMENT If the procedure associated with a production i s void, then the QUEST procedure associated with the production defaults to a descent of the syntax tree, i m p l i c i t l y passing a l l parameters to the production. Implicit procedures are useful when the syntax d e f i n i t i o n of the USL introduces non-terminals which have no GRAIL semantic representation, such as <factor>s or <primary>s in ASPLE expressions: <exp> {<exp-code>,<exp-mode>,<dcl-list>) : <factor> % % ; In the syntax d e f i n i t i o n of ASPLE an expression <exp> may be <exp> •+» <factor> or <factor>. Si m i l a r l y a factor <factor> may be <factor> •»*• <primary> or <primary>. These unary productions are introduced to resolve operator precedence, and have no associated semantics. Thus, the QUEST semantic procedures for these productions i m p l i c i t l y descend to the next l e v e l i n the t r e e . ; 73 The basic set of executable statements are l i s t e d below: STATEMENT : := r i d := STBOCTUBE | "The structure variable i d i s assigned to 1 reference the structure." | append STRUCTUBE* to STRUCTURE* I delete STBOCTOBE* from STR UCTURE2 1 "In both these statements STBUCTUBE2 must be | a GBAIL LIST NODE. The LIST i s modified by | adding or deleting STBUCTUBE* from the I a t t r i b u t e s of the LIST." I traverse i d { {L-STBUCTUBE ,} L-STBUCTUBE ) I "This statement controls t r a v e r s a l of the | syntax tree by c a l l i n g the semantic procedure | defined for the production associated with | the l e f t hand side non-terminal i d . The | parameters to the c a l l consist of a l i s t of | l a b e l l e d structures. The structure | expressions are evaluated, then passed by | reference to the semantic procedure c a l l . " | c a l l i d { [L-STRUCTUB E ,} L-STBOCTUBE ) I "This statement i s used analogously to I t raverse to c a l l a global procedure. •? | f or i d do {STATEMENT ,} STATEMENT od \ " i d must be a l i s t i d e n t i f i e r i n the l e f t I hand side of the production. The seguence of I for statements i s executed f o r each | occurrence of the l i s t in the source program j parse tree. I f <element> i s a non-terminal | i n the l i s t , then <element>.<id> w i l l | reference the sub-branch of the syntax tree I indexed by the current value of i d . Thus | t h i s ; statement can be used to generate the I GBAIL representation for each USL statement I within a l i s t . " I e r r o r { str i n g ) | "This statement i s used within a QUEST | procedure to output error messages tc the | f i l e containing the USL source program | l i s t i n g . Many semantic errors are not \ detected u n t i l the tran s l a t i o n of the USL I into GRAIL by the QUEST procedure. 1! I d i sp lay ( id ) | "This statement i s used to display the GRAIL \ structure referenced by the structure | variable i d , as a debugging a i d . " «- case STRUCTURE© {STRUCTURE* : CASE-LIST} {•else : C A S E - L I S T ^ endcase 74 CASE-LIST ::= r e x i t -i £ {STATEMENT ;} f [•STATEMENT^ | J «- reaatch J "The case: statement attempts to match the case structure, STRUCTURE©, against each case l i s t template STRUCTURE* i n succession. Template structures are only b u i l t for pattern matching, and labelled structures within a template are not assigned unless the match succeeds. When a match succeeds, the corresponding CASE-LIST of statements i s executed. Each statement i n the c a s e - l i s t i s executed, and i f the c a s e - l i s t i s terminated by reaatch, the entire case statement i s re-executed, commencing with the evaluation of the STRUCTURE^. If the c a s e - l i s t i s terminated by exi t , the case statement i s exited. The els e option w i l l unconditionally match the STRUCTUREO." 75 An example of a QUEST procedure which uses most of the statements above i s the translation of the ASPIE 'input 1 statement: 'input* <id> % structure : <input-code>,<id-narae>,<id-ref>,<id-mode> ; <id-name> := lookup (<id>) ; /* Search for the id i n the declaration l i s t . V c a l l <search-id> (<id-mode>,<id-ref>,<dcl-list>,<id-name>) ; <input-code> := 'reference' (<id-ref>) ; case <id-mode> /* Is the i d e n t i f i e r undefined? */ •undef : e r r o r ('undeclared i d e n t i f i e r * ) ; e x i t /* Otherwise i s the i d mode primitive? */ •set 1 ('int*,'bool') : append 'opcode' {'input*,'reference* {<id-mode>),<input-code>) to <stm-list> ; e x i t /* Dereference and rematch., * / • l i s t ' (»ref',<id-mode>:) : <input-code> := •opcode* (»dref ,'reference * (<id-mode>) , <input-code>) ; renatch endcase % ; The procedure has two structure parameters associated with the l e f t hand side of the production: <stm-list> and <dcl-list>. There are four l o c a l sructure variables i n the s tructure declaration at the head of the procedure: <input-coda>, <id-name>, <id-ref> and <id-mode>. The <id-name> i s assigned a LABEL from the parse tree s t r i n g associated with the terminal <id> using the lookup. The global procedure <search-id> i s then c a l l e d to locate the <id-name> i n the l i s t of declarations <dcl-list> passed as a parameter to the procedure. I f the <id-name> i s undefined, an error i s generated, otherwise i f the <id-mode> i s »int' or 'bool', a GBAIL OPCODE representation for 'input* i s generated and added to the l i s t of statements. I f the <id-mode> i s an ASPLE •ref*, then a GBAIL OPCODE for dereferencing i s assigned to the <input-code> and the case statement i s re-executed with the dereferenced <id-mode>. 76 Chapter VI. Code Generation from GBAIL There are two separate phases i n the tran s l a t i o n of the GBAIL program representation into code for the user target machine. I n i t i a l l y , the GBAIL representation i s translated into the low-level intermediate language, STACODE. Then t h i s i s translated into assembly language for the target machine. Since the tr a n s l a t i o n of the GBAIL representation into STACODE i s not dependent upon the p a r t i c u l a r USL or target machine, i t i s provided by a l i b r a r y module of LANCE. The module performs code generation by a sin g l e recursive descent of the GBAIL representation. The module determines the NODE type of the current NODE i n the repesentation, then c a l l s the corresponding code generation routine. The code generation routines control further descent of the GRAIL representation and produce STACODE instructions for the NODE. For example, the code generation routine f o r a GBAIL BLOCK i n i t i a l l y outputs a LABEL to allow for ENTEB branches to the block. The routine then selects the declaration l i s t a ttribute. If there are any declarations, the block l e v e l i s incremented and the code generation module i s cal l e d recursively for each declaration. The routine then selects the statement l i s t a t t r i b u t e and c a l l s the code generation module for each statement i n the l i s t . F i n a l l y , the BLOCK routine outputs a LABEL to allow f o r EXIT branches, then returns to the c a l l i n g routine. 77 Because STACODE code generation i s performed by a single recursive descent, l i t t l e code optimization i s attempted. Optimization of code generation for expressions such as removal of common subexpressions and evaluation of constant expressions could either be performed by a separate t r a v e r s a l of the GBAIL program representation or by increasing the so p h i s t i c a t i o n of the STACODE code generation module. The f i n a l phase in the tr a n s l a t i o n process i s the tra n s l a t i o n of the STACODE output into code for the target machine. This phase i s highly dependent upon the architecture and in s t r u c t i o n set of the target machine, so the code generator implemented for the IBM 370 needs to be extensively rewritten for any other machine. Nevertheless, because STACODE i s machine oriented, l i t t l e e f f o r t i s reguired to translate the STACODE instruction set i f a reasonably sophisticated macroassembler i s available for the target machine. The STACODE translator which was implemented i n IBM Assembler G reguired at most a dozen conditional macroinstructions for each i n s t r u c t i o n i n the STACODE inst r u c t i o n set. STACODE STACODE i s the assembly language for a simple stack oriented i d e a l machine, referred to as STACKER. The architecture of STACKER i s designed to be as simple as possible, while providing the co n t r o l environment for e f f i c i e n t l y 78 executing programs written i n block structured high-level languages. There are four primitive data types provided by STACKER: INT, REAL, BOOL and ADR, each of which occupies a single unit of storage referred to as a CEJ,L.* Compound data types i n GRAIL, such as ARRAYS and CLASSes, must be represented as aggregates of CELLs. A l l STACODE data operations are performed upon operands loaded onto a run-time execution stack. When a data operation i s executed, the operands are popped o f f the execution stack, and the r e s u l t of the operation i s pushed onto the stack. The execution environment f o r STACODE i s a pushdown bounded stack of c e l l s , which i s i n i t i a l l y empty before the STACODE program i s executed. A c e l l i n the execution stack may contain either a value of type INT, REAL or BOOL or an i n d i r e c t address, ADR. Direct data addresses i n the stack are defined by a pair of integers, [L:Q], referred to as the l e v e l and offset respectively. The l e v e l corresponds to the s t a t i c l e x i c a l l e v e l , or nesting, of the GRAIL block containing the data declaration and the o f f s e t corresponds to the l o c a t i o n of the declaration within the block, for example, the t h i r d l o c a l variable defined i n a GRAIL block nested i n the outer program block would be d i r e c t l y addressed as [2:3]. The control environment for dir e c t addressing i s provided by the STACKER display stack. During execution of a STACODE program, STACKER maintains the current execution stack base 79 address of each program l e v e l i n the display stack c e l l indexed by that l e v e l . The display stack i s not accessed d i r e c t l y by STACODE instructions, but i s updated by STACKEB upon entry and exi t to blocks and procedures. Aside from the execution stack, there i s a global pool of c e l l s , referred to as the heap, which i s provided for the all o c a t i o n of compound data types such as arrays and strings. Indirect stack addresses i n the display stack may either be 'descriptor's for accessing the heap, or references to within the display stack i t s e l f . Garbage c o l l e c t i o n within the heap i s provided automatically by STACKER. i STACODE Instructions The STACODE in s t r u c t i o n set consists of the minimum set of primitive instructions required to execute GBAIL programs. In many cases seguences of STACODE instructions could be more e f f i c i e n t l y implemented as a s i n g l e compound i n s t r u c t i o n . For example: LOAD fL*:OM, STOBE [ L 2 : 0 2 ] could be abbreviated as: MOVE £L*:0»], [ L 2 : 0 2 ]. These optimizations are highly dependent upon the particular target implementation of STACODE, and can usually best be provided by t h e i t r a n s l a t o r for STACODE written for a part i c u l a r machine. 80 The source format of STACODE programs i s designed to be compatible as input to a simple l i n e oriented macroprocessor. Each STACODE in s t r u c t i o n occupies a single l i n e , and consists of three f i e l d s : the l a b e l , opcode and operands, separated by blanks and optionally followed by comments. The l a b e l f i e l d i s optional, but must commence i n the f i r s t column i f i t i s present, A *** in t h i s column i s used to f l a g the instruction as a comment. The opcode cannot be omitted, but the l i s t of operands, separated by commas, may be void. Notation: The semantics and syntax of the STACODE i n s t r u c t i o n set i s defined using the following conventions: TOSB Top of Stack Address Register BODE Base of Display Address Register SO Scratch Address Register (R) Value of reg i s t e r R ((R)±N) Value of CELL whose address i s <S)+N DATATYPE A type name operand, either INT, REAL or BOOL TYPE A type name operand, either a DATATYPE or ADR, an i n d i r e c t address (1) Load and store instructions., LOAD TYPE,L,0 This i n s t r u c t i o n loads the value from the stack address { L : 0 ] : (TOSR) := (T0SR) + 1, increment the TOSR 81 (SO) : = ((80DR)+L), fetch the base address ((TOSR)) : = ((S0)+O), add the of f s e t and load the stack LOADI TYPE This i n s t r u c t i o n loads the stack i n d i r e c t l y , overwriting the address i n the top of stack by the value fetched from that location: {(TOSR)) := ((TOSR)) CONST DATATYPE,CON STVALUE This i n s t r u c t i o n loads the constant value CONSTVALUE onto the stack. . I f DATATYPE i s BOOL then CONSTVALUE can be either 'true* of ' f a l s e ' and i f DATATYPE i s INT then CONSTVALOE i s a signed d i g i t s t r i n g . I f CONSTVALUE i s undefined then the stack i s loaded with an 'undefined' value. The semantics are: (TOSR) := (TOSR) +1 {(TOSR)) := CONSTVALUE CONST ADR,L,0 This in s t r u c t i o n loads the current i n d i r e c t address of [L:C] in the execution stack: (TOSR) := (T0SR)+1 (SO) := ((BODR)+L), fetch the base address ((TOSR)) := (SO) *0, add the o f f s e t and load the address GET AD ; This in s t r u c t i o n i s used to obtain space i n the heap. The number of consecutive c e l l s reguested must be 82 loaded onto the execution stack, and the operator replaces t h i s by the address of the f i r s t c e l l allocated in the heap. The space i n the heap i s deallocated automatically by STACKER. DOPL This i n s t r u c t i o n duplicates the top of stack: (TOSR) := (T0SR)*1, increment the TOSR ((TOSR)) := ((T0SR)-1) , add the offset and lead STORE TYPE,L,0 This i n s t r u c t i o n stores the value from the stack: (SO) := ((BODR)+L), fetch the base address ((S0)+O) := ((TOSR)), store into the stack (TOSR) := (T0SR)-1, decrement the TOSR STOBI TYPE This in s t r u c t i o n stores i n d i r e c t l y , the value and addresss are taken from the top of the execution stack: ((TOSR)-1) := ((TOSR)), fetch the value HOVEI This i n s t r u c t i o n moves data i n d i r e c t l y and i s used to a l l o c a t e and copy storage within the heap. The address of the f i r s t c e l l i n the source and destination, and the number of c e l l s to be copied are taken from the execution stack: (((TOSR)-1)+n) := ( ((TOSR)) *n) , for n := 1 to ((TOSR)-2) (TOSR) := (TOSR)-3 83 Label declarations. Entry points f o r control transfer i n s t r u c t i o n s i n STACODE are declared by means of the LABEL i n s t r u c t i o n , which generates no executable operation. IDLABEL LABEL Control Transfer instructions. Branch i n s t r u c t i o n s i n STACODE are used to simulate the ENTER and EXIT ins t r u c t i o n s i n GRAIL. Branch instructions do not reset th'e display stack as the addressing environment i s unchanged. BRANCH UNCOND, IDLABEL This i n t r u c t i o n generates an unconditional branch to the LABEL IDLABEL. BRANCH BOOLCONST,IDLABEL This i n s t r u c t i o n performs a conditional branch on the value on the stack, which must be of type BOOL. If the top of the stack matches the BOOLCONST {i.e., •true* or 'false') then the branch i s performed; otherwise the next STACODE inst r u c t i o n i s executed. In either case the BOOL value i s popped from the stack. Procedure c a l l s i n GRAIL are simulated by a seguence of STACODE i n s t r u c t i o n s : MKSTK L This i n s t r u c t i o n i n i t i a l i z e s the execution stack prior 84 to entry of the procedure at le v e l L. The TOSR i s incremented to allocate a c e l l for the return address, then L i s pushed onto the stack, followed by the address of the base of the current l e v e l to enable the display stack to be restored upon RETURN from the procedure. Following MKSTK, the address or value of each of the parameters to the procedure i s loaded onto the stack, then an ENTRY ins t r u c t i o n i s executed. i ENTRY IDLABEL This in s t r u c t i o n saves the current STACODE address in the c e l l reserved by MKSTK, resets the display stack to l e v e l L, and branches to IDLABEL. i The GRAIL procedure RETURN is simulated by the STACODE instr u c t i o n EXIT. EXIT TYPE This i n s t r u c t i o n resets the display stack to i t s state prior to the l a s t MKSTK instru c t i o n . Typed procedures in GRAIL reserve the f i r s t c e l l at the current l e v e l for the procedure value. I f the TYPE i s not omitted, then t h i s c e l l i s retained on top of the execution stack as the return value. (4) Data Operators^ A l l data operations in STACODE must be performed by loading the operands onto the stack. The data operator pops the operands from the stack and pushes the r e s u l t onto the stack. Thus, a binary operator such as AND i s defined as 85 follows: ((T0SB)-1) := {{TOSB}) fand» <<TOSR)-1) (TOSB) := (TOSB)-I Data operators are of three types: (i)Arithmetic operators, UNABY-OP TYPE BINABY-OP TYPE The TYPE of the operands, and of the r e s u l t of the operator, must be either either INT or BEAL. Binary arithmetic operators include MUL, DIV, ADD and SOB, and unary arithmetic operators include ABS and NEG. , ( i i ) Relational operators. BIN ABY-OP TYPE The TYPE of the operands must be either either INT or BEAL and the r e s u l t of the operator i s BOOL. Binary r e l a t i o n a l operators include EQL, NEQ, GTB and GTE. ( i i i ) L o g ical operators. NOT BINABY-OP The type of the operands and re s u l t i s BOOL. Binary l o g i c a l operators include AND and OBB, 86 S ince compound data t ypes such as s t r i n g s a re not s t o r e d i n the e x e c u t i o n s t a c k , t h e r e a re no STACODE i n s t r u c t i o n s d e f i n e d f o r d i r e c t l y per forming data o p e r a t i o n s upon such data t y p e s , N e v e r t h e l e s s , the STACODE i n s t r u c t i o n s e t c o u l d r e a d i l y he extended i n an imp lementat ion f o r a p a r t i c u l a r t a r g e t machine. For example, s t r i n g comparison c o u l d be d i r e c t l y performed by a s t ack o p e r a t o r whose operands on the s t a c k are the addresses o f the two s t r i n g s and t h e i r l e n g t h , ana logous to MOVE. 87 Chapter VII. Summary and Conclusions The previous chapters have dealt with the design of GBAIL, and the implementation of GBAIL within a translator writing system. The TWS was developed to enable adaptable t r a n s l a t o r s for a source language to be generated from a d e f i n i t i o n of the GBAIL structures representing each of the source language constructs. Code produced by the translator i s i n a low-level intermediate representation, STACODE, which can either be translated into code f o r a target machine by a macroassembler [Brow 74], or interpreted d i r e c t l y . Source language translators generated by the TWS use both GBAIL and STACODE as intermediate representations. The operation of the ASPLE translator generated by the TWS i l l u s t r a t e s the use of these i n t e r n a l representations ( f i g . 7-1). GRAIL Structure Assembler ASPLE I f GRAIL STACODE I I - r i i to GRAIL >' to STACODE i to IBM370 ASSEMBLER i ASPLE Source i STACODE Program Figure 7-1 ASPLE Translator Generated by the TWS 88 Implementation Results The TWS was used to implement a translator for the programming language ASPLE, generating code for the IBM370. Over a dozen ASPLE programs were compiled by the translator, using most combinations of the ASPLE data and control structures. The translator was also tested with a series of ASPLE programs containing semantic errors such as undeclared i d e n t i f i e r s , mismatched modes, and i n v a l i d dereferencing. A l l these semantic error checks were incorporated into the d e f i n i t i o n of the GRAIL representation for ASPLE. , The STACODE programs output by the translator were reasonably e f f i c i e n t . For example the STACODE generated for the ASPLE program i n Appendix F could not be improved by l o c a l optimizations such as eliminating duplicate constant d e f i n i t i o n s , redundant store load pairs, or unnecessary jump instr u c t i o n s . The ASPLE translator did not attempt any transformations such as constant folding or eliminating common subexpressions [Wait 76], but these could be performed upon the GRAIL representation, independent of the source language or target machine, by a separate phase incorporated into the design of the translator (f i g . 7.2) [ Aho 77]., 89 ASPLE Source O p t i m i z e d GRAIL S t r u c t u r e Assembler 1 1 ASPLE GRAIL 1 K GRAIL to GRAIL to STACODE : p> 1 OPTIMIZER STACODE t o IBM370 ASSEMBLER GRAIL S t r u c t u r e STACODE Program Figure 7-2 Optimizing ASPLE Translator STACODE i t s e l f i s not an e f f i c i e n t representation, since a l l expressions are evaluated upon the stack, rather than using registers. A translator using GRAIL could generate e f f i c i e n t object code for a wide range of object machines by incorporating a more powerful low-level intermediate representation such as JAN OS [Cole 74] into the design of the tran s l a t o r ( f i g . 7-3) . GRAIL S t r u c t u r e ASPLE to GRAIL GRAIL to JANUS Assembler I JANUS to IBM370 ASSEMBLER 1* ASPLE Source JANUS Program Figure 7-3 Adaptable ASPLE Translator The primary design goal of GEAIL was adaptability, and the ASPLE tran s l a t o r generated by the TSS reflected t h i s . Using the TWS i t was comparatively easy to make major modifications to the syntax and semantics of ASPLE, and then generate a corresponding 90 translator. For example, the d e f i n i t i o n of the semantics of dereferencing i n ASPLE was modified several times during development of the ASPLE tr a n s l a t o r . The syntax and semantics of ASPLE were l a t e r extended to include recursive procedures, and the TWS was used to generate a translator for t h i s extended language .{-John 78]. The translator was also modified to generate code for a dif f e r e n t machine, SPAM [John 78], The majority of the d i f f i c u l t i e s encountered i n adapting the ASPLE compiler were due to l i m i t a t i o n s inherent in the T«S [Abra 76, 77] which were not discovered u n t i l a f t e r the implementation was well under way. 91 Evaluation of GRAIL The ASPLE translator demonstrated the f e a s i b i l i t y of incorporating a high-level semantic representation i n t o a translator writing system to simplify the development of adaptable translators for high-level algorithmic languages. Other authors have demonstrated the advantages of using low-level intermediate representations to achieve machine independence. GRAIL achieves corresponding source language independence. The primary advantage of using GRAIL i s adaptability, as the d e f i n i t i o n of the source language semantics i s independent of the low-level intermediate representation or target machine. Chapter 4 showed that GRAIL was capable of representing the common data and control structures of high-l e v e l algorithmic languages, without discarding any of the information necessary f o r generating e f f i c i e n t code f o r a target machine. In p r i n c i p l e , GRAIL could be used i n implementing an adaptable optimizing translator for a range of source languages, though t h i s project has not yet been undertaken. As Chapter 2 demonstrated, i t i s almost impossible to develop a tran s l a t o r t without some compromises i n the design goals. Thus there are several disadvantages to using GRAIL i n the development of a translator. Perhaps the major disadvantage i s i n the e f f i c i e n c y of the translator i t s e l f . Using an intermediate representation almost i n v a r i a b l y increases the time required to perform the t r a n s l a t i o n , so i f the primary goal i s translator e f f i c i e n c y , then GRAIL i s unsuitable tc the development of the translator. 92 Another disadvantage of using GRAIL i s the need f o r tools and technigues for manipulating intermediate representations. Unless a translator writing system i s available, and the language implementor i s f a m i l i a r with GRAIL, then the tr a n s l a t o r could i n i t i a l l y be implemented more quickly without using GRAIL. This argument ignores the problem of maintenance; a f t e r the translator has been implemented i t must be debugged and updated. On many large software projects maintenance can consume considerably more e f f o r t than the i n i t i a l implementation, and using a tool such as GRAIL to structure the development of the translator can reduce the maintenance e f f o r t considerably. GRAIL i s also r e s t r i c t e d to the class of high-level algorithmic languages. Translators for languages such as SNOBOL could not be developed using GRAIL, although the approach of using an intermediate semantic representation could be applied to ether classes of languages such as inter p r e t i v e s t r i n g processing languages. Chapter 2 introduced several possible objectives i n a compiler development project: correctness, a v a i l a b i l i t y , adaptability, helpfulness and e f f i c i e n c y . Of these, GRAIL i s unsuitable only for the p r i n c i p a l objective of compilation e f f i c i e n c y . GRAIL i s primarily designed to achieve adaptability which i s fundamentally incompatible with the goal of a highly e f f i c i e n t and specialized compiler [Horn 76]. 93 O r i g i n a l Contributions and Future Research The major contribution of t h i s thesis i s i n the f i e l d of software development. GRAIL i s an o r i g i n a l approach to the development of adaptable translators f o r high-level languages. The differences between GRAIL and other high-level intermediate representations used i n translators r e f l e c t GRAIL'S design c r i t e r i a : generality and adaptability without loss of code optimization potential. The generality of GRAIL was demonstrated i n Chapter a, by the range of high-level data and control structures which could be e f f e c t i v e l y represented using GRAIL. The adaptability of GRAIL was demonstrated by i t s incorporation i n t o a translator writing system, which enabled translators using a GRAIL representation to be generated from a description of the representation for each of the source language constructs. The code optimization p o t e n t i a l of GRAIL r e l i e s upon the assertion that there i s no relevant semantic information discarded i n converting a source language program int o i t s corresponding GRAIL representation. There are several areas of future research suggested by t h i s t h e s i s . The translators generated by the translator writing system were very i n e f f i c i e n t , because of i t s inherent l i m i t a t i o n s . Nevertheless i t should be possible to develop tr a n s l a t o r s incorporating GRAIL which are of comparable size, speed and e f f i c i e n c y to hand-written t r a n s l a t o r s . Thus one future area of research i s a d i r e c t comparison of translators incorporating GRAIL with available hand-written translators. Another area of future research i s i n language d e f i n i t i o n : 94 how should the wide range of high-level languages and d i a l e c t s be c l a s s i f i e d ? The commonly adopted approach i s to c l a s s i f y languages by area of application f Samm 76']. However a broad category such as "numerical s c i e n t i f i c " includes such diverse languages as ALGOL60, API, FORTRAN and SPEAKEASY. The approach suggested by t h i s thesis i s to c l a s s i f y languages according to an underlying set of semantic primitives., Another major problem i s the development of t o o l s to simplify the production of translators incorporating intermediate representations such as GRAIL. Parser generation has been automated through the formalism of context free grammars; however graphical representations for semantics cannot be manipulated or defined by technigues as limited as context free grammars., If a formalism analogous to context free grammars could be developed f o r a r b i t r a r y graph structures then i t could provide a means for generating and manipulating intermediate representations [Knut 71]., Perhaps the major i d i r e c t i o n for future research i s i n the development of technigues for formal language and machine d e f i n i t i o n which can be e f f e c t i v e l y applied to the design of translators. 95 References and Annotated Bibliography [ Abr 76] Abrahams, P.W. The D e f i n i t i o n of the PL/I Standard. In [Dewa 76], pp. 99-123. Discusses the new PL/I standard, and the role of formal d e f i n i t i o n s for programming languages. [Abra 73] Abramson, H.D. Theory and Application of a Bottom-Up Syntax-Directed Translator. Academic Press, 1973. [Abra 74] Abramson, H.D. A Syntax Directed Macro Processor. BIT X 14:3 (1974) , pp. 261-272 Describes a syntax-directed macroprocessor, based upon Strachey's GPM, as a t o o l for attributed t r a n s l a t i o n . [Abra 76] Abramson, H. D., Rushworth, T., and Venema, T. HTOSI: A Tree oriented String Interpreter f o r the Design and Implementation of Semantics"., Technical Report 76- 5, Department of Computer Science, University of B r i t i s h Columbia, 1976, 64p. Describes the implementation of a tree touring macroprocessor for syntax-directed t r a n s l a t i o n [Abra 73 74], and i t s applications i n designing t r a n s l a t o r s . [Abra 77] Abramson, H.D.., Appelbe, W.F. , and Johnson, Mark Scott. "Assulting the Tower of Babel: Experiences with a Translator Writing System". Technical Report 77- 12, Department of Computer Science, University of B r i t i s h Columbia, 1977. 19p. C r i t i c a l l y reviews the TRUST translator writing system, [Aho 72] Aho, A.V., and Uhlmann, J.D. The Theory o f P a r s i n g ^ -Translation and Compiling. Prentice-Hall, Volume 1: Parsing (1972), ~ and Volume 2: Compiling (1973). 1002p. [ISBN 0-13-9 14564-8] Develops the general theory of LR(k) parsing, and syntax directed t r a n s l a t i o n . [Aho 77] Aho, A.V., and Uhlmann, J.D. P r i n c i p l e s of Compiler Design._ Addison-Wesley, 1977? 604p. [ISBN 0-201-00022-9] [Baue 76] Bauer, F. L., and E i c k e l , J . Compiler Construction. Springer-Verlag, 1976. 638p. flSBN 3-54CH08046-5] [Boch 75] Bochmahn, G.V., and Ward, P.r "A Compiler Writing System for Attribute Grammars". Publication 199, Departement d»Informatigue, Universite de Montreal, Montreal, July 1975. A portable translator writing system based upon att r i b u t e grammars. 96 [Book 70] Book, E. The CWIC/360 System; A Compiler for Writing and Implementing Compilers. SIGPLAN Notices^ 5:6 (1970 June), pp. 11-29. Outlines the implementation of a TWS for the IBM360 which uses special languages to define the syntax and semantics of the source language. [Brow 72] Brown, P.J. Levels of Language for Portable Software. Communications of the ACM, 15:12 (1972 December), pp. 1059-1062., Describes the low-level machine independent language LOWL, and j u s t i f i e s i t s design and applications. [Brow 74] Brown, P.J. Macro grocessors and Technjgues f o r Reliable Software,. ., John Wiley and Sons, 1974, 244p. [ISBN 0-471-11005-1] Surveys macro processors, including GPM» STAGE2, and ML]; t h e i r applications and l i m i t a t i o n s , [Capo 72] Capon, P.C., Morris, D., Sohl, J.S., and Wilson, I. The MU5 Compiler Target Language and Autocode. .. Coi^uter Journal,. 15 (1972), pp. 109-112. Describes the parametric compiler target language, CTL, f o r the t h e o r e t i c a l MD5 machine. CTL i s based upon a set of high-level code generation procedures. [Clea 77] Cleaveland, J.C. , and Oxgalls, B.C. Grammars for Programming Languages. Elsevier North-Holland, New York7 1977. ~154p. [ISBN 0-444-00187-5] Introduces the programming language ASPLE, and defines i t s syntax and semantics using S-grammars. [Cole 74] Coleman, S.S., Poole, P.C., and Waite, W.M., The Mobile Programming System, JANOS. Software - Practice and Experience, 4 (1974) , pp. 5-23. Describes the design of the extensible intermediate language JANUS, and discusses i t s application to a range of source languages and target machines. [Dewa 76] Dewar, B.B. K. (ed.) . Proceedings^ fourth International Conference on the Design and Implementation of Algorithmic Languages^, Courant In s t i t u t e of Mathematical Sciences, New York University, 1976 June. 273p. [Elso 70] Elson, M., and Bake, S.T. Code Generation Technigues for Large Language Compilers. IBM Systems Journal, 3 (1970), pp. 166-188. Outlines the procedural approach used i n the IBM 3 60 PL/I compiler. 97 [Engl 71] Engler, E. (ed.) Symposium on the Semantics of Algorithmic Languages, Springer-ferlag, B e r l i n , 1971. 3 7 2 p 7 ~ flSBN 3-540-05377-8] [Fel d 68] Feldman, J. , and Gries D. Translator Writing Systems. Communications of the ACS. 11:2 (1968 February), pp. ,77-113. Review of early e f f o r t s to automate the development of translators and compilers. [Gard 77] Gardner, P.J. A Transportation of ALG0L68 C. SIGPLAN Notices,. 12:6 (1977 June), pp. 95-101. Describes the problems encountered i n transporting an ALGOL68 C compiler, using ZCODE as a low-level intermediate language. [ G r i f 76] G r i f f i t h s , M. Introduction to Compiler-Compilers. In [Baue 76], pp. 356-364. Surveys recent translator writing systems: t h e o r e t i c a l approaches, implementations, applications, and potential developments. [Hoar 73] Hoare, C.A..B., and Lauer, P.E. Consistent and Complementary Formal Theories of the Semantics of Programming Languages. &<eta Informatica, 3 (1974), pp. 135-153.. Categorizes formal semantic theories into constructive and i m p l i c i t methods, and compares the application of several technigues to a simple programming language, [Horn 76] Horning, J.J. Structuring Compiler Development, In [Baue 76], pp. 498-512. Describes the design goals of compiler development, and surveys the tools \ and technigues available. [John 78] Johnson, Hark Scott. The Design and Implementation of a Run-time Analysis and Interactive Debugging Environment. Doctoral Dissertation, in preparation: Department of Computer Science, University of B r i t i s h Columbia, 1978. Uses GRAIL as an implementation tool to translate ASPLE into code for the SPAH debugging environment., [Joh 78] Johnson, S.C. A Portable Compiler: Theory and Practice. Proceedings of the F i f t h Annual ACH Symposium on P r i n c i p l e s of Programming Languages,--Tucson,"Arizona (1978) , pp. 97-104. Review of a portable C compiler. 98 [Knut 71] Knuth, D» E. Examples of Formal Semantics. In [Engl 71 ], pp. 212-235. Synthesized and inherited a t t r i b u t e s are used to define the semantics of TURINGOL and TL/I. [Kost 73] Koster, CH. A. Portable Compilers and the ONCOL Problem, In [ Vand 74], pp. 253-264. [Lane 72] Lancaster, R.L. Semantic Primitives f o r Quick Implementation of a Family of Procedural Languages. Doctoral Dissertation: Department of Computer Sciences, Purdue University, 1972 December, [OMI Order Number PB-213 239] [Lane 76] Lancaster, P.L., and Schneider, V.B. Quick Compiler Construction Using Uniform code Generators. Software - practice and Experience. 6 (1976), pp. 83-91. Describes a c o l l e c t i o n of semantic subroutines used as a basis for implementing compilers f o r simple algorithmic programming languages. [Land 65] Landin, P. A Correspondence Between ALGOL6 0 and Church's Lambda-notation. Communications of the ACM, 8:2 (1965 February), pp.~89-101, and 8:3 (1965 March) , pp. ,158-165. Provides the f i r s t formal d e f i n i t i o n of the semantics of ALGCL60. [Ledg 75] Ledgard, H.F. "Production systems; a Formal Notation for Defining the Syntax and Translation of Programming Languages". COINS Technical Report 75A-4, Computer and Information Science Department, University of Massachusetts. 1975 August, 31p. Introduces production systems as a semantic notation emphasising c l a r i t y and naturalness. [Lee 71] Lee, C.J.C. "Source to Source Translation". Master's Thesis: Department of Computer Science, University of Toronto, 1971 January. 124p. Describes a semantic formalism for source to source translation using an abstract semantic tree, and transformations to convert such trees into a canonical form. [Lewi 68] Lewis, P.M., and Stearns, R.E. Syntax Directed Transduction. Jo2££al of the Association f o r Commuting Machinejr^ 1 5 7 3 (1968) pp. 46 5-488. i 99 [Lewi 74 ] Lewis, P.M., Bosenkrantz, D.J., and Stearns, R.E. Attributed Translations. Journal of CojjButer and Systems Sciences, 9:3 (1974), pp. ,279-307, Defines a formal automaton capable of performing attributed translations for r e s t r i c t e d classes of attribute grammars, [McKe 76] McKeeman, H.M. , Compiler Construction, Programming Language Design., In [ Baue 76], pp. 1-36, 514-524. Introduces a model for compiler construction based upon treating the semantic phase as a sequence of tree transformation. ...... [Marc 76] Marcotty, M. , Ledgard, H. F., and Bochmann, G. ?. A Sampler of Formal Definitions. ACM Computing Survey^ 8:2 (1976 J u l y ) , pp. 191-276. Develops c r i t e r i a for evaluating formal d e f i n i t i o n s , and applies them to four formal d e f i n i t i o n s of ASPLE. [Mock 58] Mock, O., Olsztyn, J . , S t e e l , T., T r i t t e r , A., and Wegstein, J. The Problem of Programming Communications with Changing Machines: A Proposed Solution. Communicat j . o n s o f the ACMX 1:8-9 (1958), pp. 12-18. Report of the SHARE adhoc committee on universal languages; proposing ONCOL and the generator and translator languages POL and ML. [Naur 63] Naur, P. (ed.) Report on the Algorithmic Language ALGOL 60. Communications of the ACM, 6:1 (1963 January), pp. 1-17, Introduces the BNF d e f i n i t i o n for ALGOL60. [Newe 72] Newey, M.C, Poole, P. C., and Waite, 8.M. Abstract Machine Modelling to Produce Portable Software - A Review and Evaluation., Software Practice and Experience. 2 (1972 April-June), pp. 107-136, Discusses the use of abstract machine modelling and macrcprocessors as an implementation tool. The design and applications of several abstract machines, including FLOB, TEXED, and AIM1, are compared, [Nori 74] Nori, K. V. , Amman, 0,, Jensen, K. , and Nageli, H. H. "The PASCAL 'P' Compiler: Implementation Notes". Technical Report Number 10, Berichte des I n s t i t u t s fur Informatik, Eidgenossische Technische Hochschule, Zurich, 1974 December. 57p. Describes the use . of PCODE i n developing a portable PASCAL compiler. 100 [ O l i o 74] Ollongren, A., D e f i n i t i o n of programming Languages hi Interpreting Automata* Academic Press, 1974. 284p. Develops VDL as a t o o l for defining abstract syntax, interpreters, and programming language semantics. [Pool 76] Poole, P. C. Portable and Adaptable Compilers. In [Baue 76], pp. 427-497. Introduces the concept of adaptability, and surveys some approaches to the problem. [Prat 75] Pratt, T. W. Programming Languages; Design and-Implementation. Prentice-Hall, (1975). 530p. [ISBN 0-13-730432-3] Surveys programming language concepts: data structures, control structures, and runtime environments, [Baih 77 ] Baiha, K, "On Attribute Grammars and th e i r use in a Compiler Writing System",,. Report A-1977-4, Department of Computer Science, University of Helsinki, Finland, 1977, 90p. [ISBN 951-45-113-1] Surveys a t t r i b u t e grammars and introduces an extended formal d e f i n i t i o n suitable f o r implementation i n a translator writing system. [Bain 74] Bain, M, A Possible Resolution of the Subroutine Interface Problem. In [Vand 74], pp. 265-272. Describes a machine independent approach to subroutine c a l l s , developed for the machine oriented language MARY. [Rich 69] Richards, M. BCPL: a Tool for Compiler Writing and System Programming. Proceedings of the AFIPS J969 SJCC, Volume 34, AFIPS Press, Montvale, New Jersey, (1969) , pp. ,557-566. Introduces BCPL and the structure of the BCPL compiler. [Bich 71] Bichards, H., The P o r t a b i l i t y of the BCPL Compiler. Software - Practice and Experience^ 1:1 (1971), pp7~135-146. Discusses various approachs to achieving software p o r t a b i l i t y , and outlines the design of OCODE, the machine independent intermediate language used tc transport the BCPL compiler. [Bich 74] Bichards, tt. Bootstrapping the BCPL Compiler. In [Vand 74], pp. 265-272. Describes the use of INTCODE, a simple low-level intermediate language, used to bootstrap the BCPL compiler. 101 [Bust 70] Bus t i n , B. , (ed.) Formal Semantics of Programming £iin3MJ3es«.; Second Courant Computer Science Symposium, Prentice-Hall, 1970 September. [ISBN 0-13-329060-3] [Samra 76] Sammet, J.E. Roster of Programming Languages for 1974-1975., Communications of the A.CMX 19:12 (1976 December), pp. 655-66 9. Roster of current, implemented U.S. high-level languages. [ S i b l 61] Sibley, B.A. The SLANG System. Communications of the AGM^ 4:1 (1961), p. 75. Describes the SLANG compiler writing system, based upon a f l e x i b l e set of intermediate l e v e l i n s t r u c t i o n s , EMIls, which are selected for a particular source language and target machine. [Snyd 75] Snyder, A., "A Portable Compiler for the Language C". Project a AC Report TR-149, Massachusetts In s t i t u t e of Technology, 1975 May. 74p. An intermediate abstract machine i s used to define the code generation phase , of the compiler. , Code generation i s defined by a machine description of the re g i s t e r classes and the corresponding machine instructions. [Stee 61] Steel, T.B., J r . UNCOL: the Myth and the Fact. Annual Review i n Automatic Programming, 2. > Pergamon Press, 1961, p.325. Review of UNCOL and i t s potential. [Vand 74] Van Der Poel, a.I., and Maarssen, I. (eds.) Maehine-Orientgd Higher Level Languages. 7 North-Holland and American E l s e v i e r , 1974. 533p. [ISBN 0-7204-2802-5] Proceedings of the IFIPs working conference on machine oriented languages, Trondheim, 1973., [Vene 76] Venema, T. "TRUST User's Guide". Technical Manual, Department of Computer Science, University of B r i t i s h Columbia, 1976, 64p. Describes the design and applications of the translator writing u t i l i t y system, a modular TWS., [Wait 70] Waite, W.M. The Mobile Programming System: STAGE2. Communications of the ACM,. 13:7 (1970 J u l y ) , pp. 421-4 29., Describes the aim, implementation, applications and e f f i c i e n c y of the STAGE2 macroprocessor. 102 [ Bait 76] Waite, W. M. Optimization. In [Baue 76], pp. 549-602. ' [ f a s i 72] Wasilew, S.G. "A Compiler S r i t i n g System with Optimizing C a p a b i l i t i e s for Complex Object Order Structures", Doctoral Dissertation; Department of Computer Science, Northwestern University, 1972 June. [Rein 71] Weinberg, G.M. , The Psychology of Coaputier Programming. J Van Nostrand, 1971. [Whit 76] White, J.R. • Towards a New Methodology for Translator Development. • In £ Dewa 76], pp. 2-13. Describes a decomposition of the semantic phase into l e v e l s of abstract data and control types. [ S i l n 71] Wilner, «.T, ; "Declarative Semantic Defi n i t i o n s as I l l u s t r a t e d by a D e f i n i t i o n of SIMULA 67". Doctoral Dissertation: Stanford computer Science Department, 1971. 211 p. [UMI Order Number 72-6024] A formal d e f i n i t i o n of SIMULA, using the technique of attribute functions associated with the BNF d e f i n i t i o n of the syntax. I [Wirt 71] f i r t h , N. The Design of a PASCAL Compiler. Software- - Practice a nd - - Exp er ience^ 1:2 (1971 A pri 1-June), pp. 135-146. The organization and implementation of a PASCAL compiler, written i n PASCAL. £ Wulf 77] Wulf, W.A, Languages and Structured Programs. In £ Yeh 77], pp. 33-60. Discusses the r e l a t i o n s h i p between programming languages and structured approaches to program design, [Yeh 77] Yeh, R. (ed.) Current - Trends in Programming Methodolojgij. Prentice-Hall, (1977) . "volume 1, 275p. [ISBN 0-13-195701-5] Appendix A. GBAIL Bepresentations GRAIL Primitive NODEs LIST, abbreviated to (<attribute-1>, <attribut e-2>, .,., <attribute-n>), or {<attribute>}, i f the number of attributes i s variable. REFERENCE CLASS(<class-type>,<class-label>,<data-type>) ABB AY (<a-type>,<a-label>,<a-dimension>,<a-bounds>) BLOCK (f Environment-node>}, {<ccntrol-node>}) EXIT(BEFEBENCE(<control-node>) ,<return^expression>) ENTEB (BEFEBENCE(<control-node>) ,<return-expression>) OPCODE (<opcode-name>, {<operand>} ) SELECT{<s-type>j<s-expression>,f(<s-value>,<s-control>)}) DEFINE(<d-type>, f<d-name>) ,<initial-value>,<data-options>) PROCEDURE(<proc-name>,<proc-type>, {{<p ar m-de cl>,<p ar m^t ype>)\,BLOCK) INVOKE (BEFEBENCE <<procedure-node>) , {<actual- parameter>) ) BETUBN(BEFEBENCE(<procedure-node>),<return-expressions*) 104 Before using the L&NCE translator writing system to develop a USL translator, the implementor must decide upon the GRAIL representation for the USL data and control structures. This appendix describes the GRAIL representation for some of the more unusual data and control structures of ALGOLS. , These representations provide a guideline for the language implementor wishing to define a GRAIL representation for some USL. In many cases, the representation i s not unique. The GRAIL representation should be as simple as possible without r e s t r i c t i n g the semantic d e f i n i t i o n of the USL or the range of possible target machines. ALGOLW Data Structures The most unusual data structure i n ALGOLW i s the typed pointer, or reference to a record c l a s s . A record class i n ALGOLW i s defined by a l i s t of variable declarations. Records are created at run-time, and can only be accessed by means of reference variables. A record class can be represented by a GRAIL CLASS as follows; CLASS <*record»,<record-name>,f<variable-dec>}) where <variable-dec> i s the GRAIL representation f o r an ALGOLW variable declaration and <record-name> the record class name from the source program. In ALGOLW, references must always be to a par t i c u l a r record, or the global constant n a i l . In a reference 105 declaration, the reference i d e n t i f i e r i s defined together with a union of record classes which the pointer may reference. Any attempt to assign a reference t c a record whose c l a s s i s not i n the reference declaration w i l l r e s u l t i n an error. Because references may be assigned at run-time to any record whose class i s i n the declaration l i s t , i t i s impossible to detect a l l i n v a l i d references at compiler-time. For example, suppose RECORD-A and RECORD-B are ALGOLW record classes and REFERENCE-A, REFERENCE-B and REFERENCE-AB are defined as follows: reference (RECORD-A) R E F E R E N C E - A ; reference (RECORD-B) R E F E R E N C E - B ; reference (RECORD-A, RECORD-B ) R E F E R E N C E - A B ; then REFERENCE-AB := REFERENCE-A; i s a vali d assignment REFERENCE-A : = REFERENCE-AB; may be a v a l i d assignment, but i n general cannot be decided at compile-time, and REFERENCE-A := REFERENCE-B; i s an i n v a l i d assignment. The record class l i s t associated with a reference i d e n t i f i e r must be maintained at run-time, and an ALGOLW reference cannot be represented as a simple primitive data type or 'address 1 as run-time type checking must be performed. A suitable GRAIL representation for an ALGOLW reference i s : CLASS(«ref',<ref-identifier>,{REFERENCE(<record-name>)}) where <ref-identifier> i s the reference i d e n t i f i e r and REFERENCE(<record-name>) i s a GRAIL REFERENCE to the representation of the record c l a s s declaration. 106 This representation allows the ALGOLW to GBAIL translator to perform a l l possible compile-time semantic checking of assignment. The run-time representation must also include the address and record class of the record currently referenced by the i d e n t i f i e r . ALGOLW Control Structures Perhaps the most complex ALGOLW control structure to represent i n GRAIL i s the for statement. The syntax of the most complete form of the for statement i s : FULI-FOB-STATEMENT : for FOB-ID := INT-EXPO step INT-EXP* until INT-EXP 2 do STATEMENT FOB-ID i s the f o r variable and INT-EXP°, INT-EXP* and INT-EXP2 are the ALGOLW expressions for the i n i t i a l value, increment and l i m i t for the FOB-ID variable, respectively. The scope of the FOB-ID i s r e s t r i c t e d to STATEMENT and i t s value cannot be modified within t h i s scope. Thus, FOB-ID i s a di f f e r e n t primitive type than an ALGOLW in teger , and may be DEFlNEd i n GBAIL as a new primitive data type *for-int», or as an •integer* primitive type with a modifier 'read-only*. Each of the expressions INT-EXP°, INT-EXP* and INT-EXP2 are computed once upon loop entry; thus, l o c a l variables must be created to save the values of INT-EXP* and INT-EXP 2. 107 The semantic d e f i n i t i o n of the termination condition i s that the loop i s re-excuted i f : FOR-ID*sgn{INT-EXP») < INT-EXP2*sgn{INT-EXP*), where *sgn* i s an i n t r i n s i c function which returns the sign of i t s integer argument. Thus, i t i s convenient to create another l o c a l variable, INT-SGN, for the value of sgn{INT-EXP*). Suppose the GRAIL representations for INT-EXP 0, INT-EXP* and INT-EXP2 are STRUCT°, STRUCT* and STRUCT2 respectively, and the representation for the STATEMENT i s STM-STRUCT., 108 A suitable GRAIL representation for the FULL—FOR-STATEMENT i s : BLOCK (DEFINE ('for-int', (FOR-ID), «#undef#»)f DEFINE ('integer 1, (INT-EXP*, INT-SGN, INT-EXP 2), »#undef#*)) (OPCODE (*int-assign', REFERENCE (FOR-ID), STRUCT0) , OPCODE ('int-assign', REFERENCE (INT-EXP*), STRUCT*), OPCODE (»int-assign», REFERENCE (INT-SGN), OPCODE (»sgn», REFERENCE (INT-EXP*))), OPCODE (»int-assign •, REFERENCE (INT-EXP 2), OPCODE (•int-fflul*, REFERENCE (INT-SGN), STRUCT 2)), SELECT («bool», OPCODE (»int-leg«, OPCODE ('int-raul», REFERENCE (FOR-ID), REFERENCE (INT-SGN)), REFERENCE (INT-EXP 2)), (('true', BLOCK (, (STM-STRUCT, OPCODE ('int-assign*, REFERENCE (FOR-ID), OPCODE ('int-sum', REFERENCE (FOR-ID), REFERENCE (INT-EXP*))), ENTER (REFERENCE (SELECT),))))))) The BLOCK consists of two GRAIL declarations for the l o c a l variables and a sequence of i n i t i a l i z a t i o n statements followed by the SELECT loop. The SELECT loop computes the loop condition and i f i t i s 'true* executes the for statement, assigns the next value to the FOR-ID and re-enters the SELECT. Appendix B. LANCE Translator Writing System QUEST Program f o r translationn »•••»••*»••<> of the USL to GBAIL, written !->• Interface by the language implementor J • ••••••-,.•»»» A I I I I ? Y i ~ I Lookup ! F i l e INQUEST (QUEST Translator) I I I I T I T De f i n i t i o n of the SCANNEB and LOOKUP for the USL, writ by the implementor from the Parser and Lockup F i l e s r - i i • " i 1 Parser I 1 Semantic I I F i l e 1 1 F i l e 1 t i i 1 1 1 \ T T • Parser • • Semantic « • Generator * • Generator • • • • • • a-j-« • • • • * • < 1 1 1 i \ T i i i 1 1 PABSEE I 1 SEMANTIC ! 1 MODULE 1 1 MODULE 1 _ j - L. _ i I I T • Scanner • • GENEBATOB • • • • I I T I SCANNIB MODULE I I i T • Lookup • • GENEBATOB • • * * • * * T * * * * * * I I T j LOOKUP ] | MODULE | i i B-1 Using LANCE to Generate a Translator 110 USL Generated Modules | SCA N NEB | MODULE i A I i I I LOOKUP MODULE A I I I PABSEB MODULE i SEMANTIC | MODULE 1 A \ LANCE IMC I T BSC MODULE LISTEB MODULE 1 T ••••••••••••• • TBEE « • MODULE • ! T CODE MODULE LANCE I n t r i n s i c Modules B-2 Structure of a DSL Translator Notation: In the diagrams above boxes enclosed by s o l i d l i n e s are generated by the LANCE for the USL, vheras boxes enclosed by dotted l i n e s are independent of the USL.. 111 Structure and Generation of a USL Translator A USL translator generated using LANCE i s composed of a an Inter-Module Controller, or IMC, together with a c o l l e c t i o n of MODULES, each of which performs a specialized function i n the process of tr a n s l a t i n g a USL program into code for the target machine. The primary functions of the IMC are as follows: (1) When the trans l a t o r i s run, the IMC f i r s t i n i t i a l i z e s each of the MODULES i n seguence, then transfers control tc the PARSES to input the USL program. Each MODULE i n the system returns control to the IMC whenever the MODULE terminates or reguests a function provided by another MODULE. Entry and exit to a MODULE i s referred to an as event. (2) The IMC also contains a l l the global data areas required by the MODULES, and provides f a c i l i t i e s f o r error recovery, tracing the seguence of entry and e x i t events, and producing snapshot dumps. The MODULES composing the USL tr a n s l a t o r can be separated into two classes; those which are independent of the USL, supplied by LANCE and referred to as i n t r i n s i c MODULES, and those which are generated for the USL by the language implementor, referred to as generated MODULES. MODULES are generated using the TRUST TWS module generators, together with the QUEST compiler INQUEST. 112 I n t r i n s i c MODULES There are four i n t r i n s i c MODULES supplied by LANCE: the MSC, LISTER, TREE-MAJAGER and CODE modules,, (1) MSC (Module System Communicator). , The MSC functions as the operating system interface for the USL translator. When another MODULE returns an event reguesting an operating system function, such as al l o c a t i n g and releasing space during execution, the IMC c a l l s the BSC, then returns to the MODULE which generated the event. (2) LISTER. The LISTER provides a l l routine output requests for other MODULES in the USL tran s l a t o r . A c a l l to the LISTER has the f o l l o w i E g format: LISTER ( F i l e , String, Length, Presend, Postsend) F i l e i s the f i l e number, i n the range 1 through 10. String i s the address of the output s t r i n g , of s i z e Length, which i s appended to the output buffer for that F i l e . I f the Presend f l a g i s set the buffer i s output to the f i l e and emptied before the st r i n g i s moved to the buffer. S i m i l a r l y i f the Postsend f l a g i s set the buffer i s output after the s t r i n g has been moved i n . (3) TREE-MANAGER. The TREE-MANAGER module i s ca l l e d by PARSER events to build the i n t e r n a l parse tree and by SEMANTIC events to traverse the parse tree. There are four primary functions i n the TREE-MANAGER MODULE: 113 Build-tree-node (Production, LHS-size) Build-tree-terminal (Terminal-name) Descend-tree (Branch, Sub-branch) Ascend-tree Because of the structure of the translator, the entire parse tree i s maintained i n t e r n a l l y . The LB (k) parser builds the parse tree bottom-up, whereas the SEMANTIC MODULE traverses i t from the top downwards, (tt) CODE. The CODE MODULE i s the f i n a l MODULE c a l l e d during the translation of a USL program., I t generates STACODE by traversing the GBAIL semantic representation of the USL program. Control i s transferred to the CODE MODULE when the SEMANTIC MODULE terminates and the entire GBAIL representation has been b u i l t f o r the USL program., The code phase i s invoked by the procedure c a l l : Code-phase (Grail-structure-array, Structure-head, Structure-size, Trace-option). . Generated MODULES Before using LANCE to generate a trans l a t o r , the language implementor must be f a m i l i a r with the f a c i l i t i e s provided to generate the PABSEB, SCANNEB, LOOKUP and SEMANTIC MODULES., This section concentrates upon the structure of each of these MODULES, and the i r intercommunication, rather than the d e t a i l s 114 of the MODULE s p e c i f i c a t i o n and generation. Although i n prin c i p l e the language implementor does not need to understand the int e r n a l structure of the USL translator or the LANCE TWS to generate a translator, the majority of implementors are 'systems programmers' who prefer to t a i l o r the system to s u i t their intended applications. This section outlines the design organization and constraints imposed by LANCE, so that sophisticated users can undertake systematic modifications. I t i s presumed that the language implementor i s already f a m i l i a r with the TRUST TWS [Vene 76] together with ALGOLW, the source language for the LANCE c o n t r o l l e r . The four generated MODULES i n the USL trans l a t o r can be separated into two independent classes, the syntax MODULES: PARSER, SCANNER and LOOKUP, and the SEMANTIC MODULE. The syntax MODULE generators reguire the language implementor to write a description of the external format of the USL, using the Lookup and Parser F i l e s output from INQUEST. The syntax generators are fixed components of TRUST and the language implementor i s unlikely to wish to modify these routines.,, By contrast, the SEMANTIC MODULE generator does not reguire the language implementor to modify the SEMANTIC F i l e , as i t can be input d i r e c t l y to the generator. The format of the Semantic F i l e , unlike the Lookup and Parser F i l e s , i s not designed to be easily comprehensible to the language implementor; rather i t consists of strings i n a simple macroprocessing language TOSI, a tree-oriented s t r i n g interpreter based upon TRAC, developed at 115 the University of B r i t i s h Columbia [Abra 76], Thus, the language implementor who wishes to modify or extend the QUEST f a c i l i t i e s f or defining the GBAIL semantic representation must be familiar with TOSI, the input source language for the Semantic Generator., A l l the generated MODULES i n the USL trans l a t o r use an event communication protocol developed for TRUST. Each MODULE provides a class of different entry events, composed of messages and reguests from other MODULES, and generates a class of exit events composed of messages and reguests to other MODULES. Frequently, entry and e x i t events occur i n corresponding pairs. For example, a PARSER MODULE e x i t event to *Getspace* generates an entry event to the MSC MODULE. The MSC MODULE returns with an exit event »Space Obtained* and the PARSER MODULE i s entered with the corresponding event. Although t h i s method of communication mak.es the flow of control exceedingly tedious to trace, i t does permit f l e x i b l e control as MODULES can communicate as either procedures or coroutines. Each event consists of an event number and a l i s t of parameters which are stored i n an array, or communication block, associated with each generated MODULE. Each generated module can be functionally defined by a table of i t s associated entry and e x i t events. , 116 The structure of each of the generated MODULES i s outlined below, together with the associated entry and e x i t events. (1) PARSES. The Parser Generator analyzes the BNF d e f i n i t i o n of the USL syntax, and attempts to generate an LR (k) parser. The BNF de f i n i t i o n of the USL syntax i s output in the Parser F i l e by INQUEST from the QUEST program defining the USL translator. Since INQUEST does not analyze the BNF syntax of the USL, ambiguities and errors i n the USL syntax are not detected u n t i l the Parser F i l e i s input to the TRUST Parser generator. Although the Parser Generator provides tracing of the grammar tr a n s i t i o n s and loohahead, these assume that the user i s f a m i l i a r with the theory of LR(k) parsing [Aho 72]. Frequently, i t i s necessary to modify the i n i t i a l syntactic d e f i n i t i o n of the USL to eliminate ambiguities which only become apparent during the analysis of the grammar. The Parser Generator provides primitive f a c i l i t i e s for building error recovery into the USL PARSER; however, the IMC attempts no error recovery, When the PARSER detects an unacceptable token, the IMC l i s t s the token and terminates parsing the USL program. Any i d e n t i f i e r i n the USL BNF which does not appear on the l e f t hand side of a syntax production i s assumed by the Parser Generator to be a terminal symbol i n the USL grammar. I t associates a fix e d integer, or lexeme number, with each of these terminal symbol, and the USL PARSER 117 MODULE which i t generates expects a sequence of these lexeme numbers as i t s input. The PASSER controls both the SCANNER and the LOOKUP MODULES as i t generates e x i t events which fetch the next lexeme number, and access the symbol table. PARSER Event Table event Name #- Parameters Entry Exit prstart 1 The parser i s entered with t h i s event to i n i t i a l i z e . When the parser has i n i t i a l i z e d , i t c a l l s the scanner to input the USL program. prabort 2 The parser e x i t s with t h i s event i f an irrecoverable syntax error occurs.. prgetspace 3 Mem,size ?,size prfreespace a Mem,size prreduce 5 rule num, no of e l t s When a production i s reduced the parser e x i t s with t h i s event and the IMC c a l l s the TREE-MANAGER MODULE to add a node to the parse tree. prnext 6 parnum,lexnum The parser exits with t h i s event to reguest a new token from the scanner praccepted 7 parnum,lexnum When a token i s accepted, the parser e x i t s with t h i s event and the IMC c a l l s the TREE-MANAGES MODULE to add a leaf to the parse tree. prdone 8 Exit event f o r a successful parse., prerror 9 Mes,len,sym,frem,to The parser e x i t s with t h i s event for recoverable syntax errors p r f a t a l 10 Mes,len,sym prseek 11 Str,len ?,?,num 118 SCANNER. The SCANNER i s c a l l e d by the PARSER to return the next lexeme of the USL input program. The Scanner Generator provided by TRUST allows the USL implementor to define USL lexemes as regular expressions over an input alphabet, and associate the corresponding lexeme numbers expected by the PARSER,f Lexemes i n the USL are defined as seguence of characters, or as compound regular expressions of other lexemes. The USL lexeme d e f i n i t i o n s cannot resolve when a character seguence has an ambiguous lexeme i n t e r p r e t a t i o n , for example, when distinguishing reserved words from i d e n t i f i e r s . This f a c i l i t y i s provided by the TRUST Lookup Generator described b e l o w . B e c a u s e the SCANNER i s controlled by the PARSER, the SCANNER has a simple set of events. .,. SCANNER Event Table event Name # Parameters Entry Exit s c i n i t 1 The scanner i s entered with t h i s event to i n i t i a l i z e i t s tables. scgetlexeme 2 parnum,len,Lex repr The scanner i s entered with t h i s event from the parser. The scanner e x i t s via t h i s event for lexemes which do not reguire the LOOKUP module. scgetbuffer 3 Buff, len Reguests the next l i n e of the USL program. scgetspace 4 Hem, size ?,size scfreespace 5 Mem,size sccharerror 6 r e p i char char i n error The scanner exits with t h i s event when the next character from the input stream i s i n v a l i d . 119 LOOKUP. The LOOKUP MODULE provides a simple symbol table f a c i l i t y for the USL translator. Entries in the table consist of string s , with an associated index i n the table, referred to as an 'idnum*. The table i s used to distinguish keywords i n the USL syntax from i d e n t i f i e r s . The LOCK DP i s i n i t i a l i z e d with the l i s t of keywords, and the i r associated idnum, by the Lookup Generator. During parsing the USL program i d e n t i f i e r s and str i n g s are entered into the table and assigned sequential idnum's, which can subsequently be used to retrieve the string representation. .  LOOKUP Event Table event Name # Parameters Entry Exit l k i n i t 1 I n i t i a l i z e tables and return. lknew 2 Str,len Str,len,idnum The LOOKUP i s entered with• t h i s event from the SCANNER. The event i s returned i f the st r i n g i s a new entry. lkkeyword 3 Str,len Str,len,parnum This event i s returned i f the string i s a keyword. lkfound 4 ?,?,num Str,len,idnum This event i s returned i f the s t r i n g i s not a keyword but has previously been entered in the table. lknotfound 5 ?,0,-1 th i s event i s returned i f the s t r i n g cannot be located i n the table. lkgetspace 6 Mem,size ?,size lkfreespace 7 Mem,size 120 INQUEST compiles QUEST programs into f i l e s which define both the syntax and the corresponding GBAIL semantic representation of the USL. The Semantic F i l e produced from a QUEST program consists of strings i n the macroprocessing language TOSI, INQUEST composes a TOSI semantic s t r i n g f o r each USL syntax production which i s output to the Parser F i l e . The TOSI semantic strings defined for the syntax productions build the GBAIL structure representation for the parse tree node and control subseguent tr a v e r s a l of the parse tree. Each semantic s t r i n g generated by INQUEST i s composed of a sequence of nested c a l l s to the TOSI macro l i b r a r y . , The macro l i b r a r y i s independent of the USL, and i s supplied by LANCE as a component of the SEMANTIC MODULE. TOSI macro c a l l s are used to evaluate s t r i n g and numeric functions, and to a l t e r the execution environment by defining new macros or generating events from the SEMANTIC MODULE. The language implementor can a l t e r the functions performed by the TOSI semantic strings by modifying or extending the macro l i b r a r y supplied by LANCE. The semantic strings use three categories of macro c a l l s : (1) TOSI I n t r i n s i c Macros. TOSI provides a small set of i n t r i n s i c macros f o r arithmetic and cont r o l functions. The macro l i b r a r y supplied by LANCE i s i t s e l f defined i n terms of thi s primitive macro set. / Some examples of the primitive macros are as follows: (#EQ,s1,s2,true-string,false-string) (SUM,numeric-string,numeric-string) 121 (SUBSTB, s t r i n g , start-index, length) (DEFINE,macro-name,arg-1,arg-2,...,arg-n,macro-body) (2) Library Control Macros, Some common sequences of primitive macros are DEFINEd in the macro l i b r a r y to simplify the control structures i n the semantic strings, for example: (WHILE,condi tion, what) (3) Library event Macros. The majority of macros DEFINEd i n the macro l i b r a r y are used to generate SEMANTIC events to reguest external functions provided by the IMC and other MODULES, such as c a l l i n g the TREE-MANAGER to descend the parse tree or c a l l i n g the LISTER to output a message. Although GRAIL structures could be represented as TOSI stri n g s and d i r e c t l y manipulated within the SEMANTIC MODULE, i t i s much more e f f i c i e n t to implement them using ALGOLW records i n the global data area of the IMC. S i m i l a r l y , the stack associated with the seguence of QUEST semantic procedures c a l l s can be more e f f i c i e n t l y implemented in ALGOLW within the IMC than i n TOSI. Thus, the IMC provides both a data and control environment for the SEMANTIC MODULE., The TOSI semantic strings generated by INQDEST i n t e r a c t with t h i s environment by means of SEMANTIC events. Library macros are provided to enable the semantic st r i n g s to generate c a l l s f o r these of these SEMANTIC events, A simple example of the TOSI semantic s t r i n g generated by INQUEST for the ASPLE tran s l a t o r i s given i n Appendix E. The 122 events associated with the SEMANTIC MODULE are l i s t e d below. SEMANTICS Event Table event Name # Parameters Entry Exit sminit 1 Entered to i n i t i a l i z e the TOSI semantics. smstart 2 Entered after the PABSEB MODULE has completed generation of the parse tree to commence generating GBAIL program representation. smprint 3 Line,len smload-addr tt ,mem addr Name smgetspace 5 Mem,size ,size smfreespace 6 Mem,size down-tree 7 branch,list-sub-branch Semantic request to descend the parse tree. up-tree 8 ascend-tree?,reset-stack? Semantic request to ascend the parse tree or reset the production stack., s e l e c t - s t r i n g 9 ,String,len string-num,prodn Semantic request to return the TOSI s t r i n g associated with the syntax production. terminal-labellO String,len Semantic reguest to return the lab e l from the parse tree. structure 11 address , l a b e l , l e n , a t r - l i s t , l e n , atrs Semantic reguest to build a GBAIL NODE with a NODE type given by the l a b e l parameter and a l i s t of attr i b u t e s encoded i n the a t t r - l i s t . template 12 Flag the next structure b u i l t as a template. mark-stack 13 locals,params, param-list ,len Used to pass parameters to a semantic procedure. assign 1tt lhs-code,len, rhs-code,len Generated by a QUEST assignment statement. extend 15 l i s t - c o d e , l e n , elem-code,len Generated by a QUEST extend statement. delete 16 l i s t - c o d e , l e n , elem-code,len 123 Generated by a QUEST delete statement. up-structure 17 up-address ,structure-code,len Generated by a QUEST up structure expression. enter-loop 18 repeat-loop? 19 case-template 20 bind-structure21 enter-case 22 case-structure23 exit-case? 2 4 external 25 error 26 file - o u t p u t 27 display 28 terminate? match-flag case-code tree-branch , s t r uc t ure -co de,len structure-code,len, t-address increment case-code,structure-code,len buffer,len message,len line,len ##file,d1,d2 structure-code,len 124 Appendix C. STACKER: the STACODE V i r t u a l Machine TOSR Display Stack (Indexed by Lexical Level) I ±-I I N-1 DISPTR i >I 0 i • I J STACK BASE Execution Stack i TOS I Level N Ease Level N-1 Base ^ I Level 0 Base C-1 STACODE Environment 125 Appendix D. QUEST Syntax Chart Syntax Chart Notation: r A T I B | choose one of A, B or C «. C J £abcf •abc* i s optional {abc} »abc*, »abcabc» , »abcabcabc*, .... Reserved words i n QUEST are boldfaced, and non-terminals i n the syntax d e f i n i t i o n are c a p i t a l i z e d . , Terminal symbols in the QUEST syntax are either i d s , USL i d e n t i f i e r s composed of any sequence of characters enclosed i n angle brackets (e.g., <0SL-procedure>) or str i n g s , composed of any sequence of characters enclosed i n quotes (e.q., *list»). QUEST comments are delimited by /* and */. QUEST-PROGRAM ::= > quest (structure i d :•= STRUCTURE ;} r e x t e r n a l:(strinq) {id ( {id ,} i d ) : % | i % .;} «- QUEST-PROCEDURE J charter {PRODUCTION-SPECIFICATION .} endguest PRODUCTION-SPECIFICATION ::= i d { {id ,} i d ) : {PRODUCTION-RHS ;}. PRODUCTION-RHS . PRODUCTION-RHS ::= r ELEMENT -, {| r * i U % ^ QUEST-PROCEDURE^ % «• ( {ELEMENT} ) | \ i d J *. * j ELEMENT ::= r s t r i n q 1 I i d | «~ i d . i d J QUEST-PROCEDURE ::= {•Structure {id ,} i d ;^ {STATEMENT ;}. STATEMENT S T A T E M E N T :;= r i d := S T R U C T U R E •, | append S T R D C T O B E to S T B O C T U B E | | delete S T B U C T U B E frOB S T B U C T U B E | I traverse id { { L - S T B U C T U B E ,} L - S T B U C T U B E ) I I c a l l i d { { L - S T B U C T U B E ,} L-ST8UCTURE ) | 1 f o r i d do { S T A T E M E N T ,} S T A T E M E N T od | I error { st r i n g ) I I display ( i d ) I «- case S T B U C T U B E J { S T B U C T U B E : C A S E - L I S T ) {•else :: CAS E - L I S T % endcase C A S E - L I S T r e x i t -, f {STATEMENT ;}«| |» S T A T E M E N T ^ | 1 «- r e i a t c h J S T B U C T U B E ::= r i d -i | s t r i n g I | up ( id ) I | lookup ( i d ) | »- G B A I L - N O D E ( { L - S T R U C T U R E L - S T R U C T U B E ) •» L - S T B U C T U B E ::= f i d £ STBUCTUBE*! G R A I L - N O D E ::= r ' • l i s t ! i | • c l a s s 1 i I •array* I | *block* I I •enter• I J ' e x i t ' I I 'opcode * I I 'de f ine ' | I * s e l ec t * I I * procedure' I | *invoke* j I * return* j I •reference * | I * s u b l i s t • l I *set* I »• s t r i n g Appendix E. The ASPLE Compiler QUEST Source L i s t i n g quest /* Define the global variables.,*/ s tructure <program-code> := ' l i s t * ; /* Define the global procedures. */ <search-id> (<id-mode>,<id-ref>,<dcl-list>,<id-name>) % case <dcl-list> • s u b l i s t ' ('define* (<id-mode>:, •s u b l i s t * (<id-ref>:<id-name>))) : ex i t e lse : <id-mode> := *undef* ; <id-ref> := *undef* ; e x i t endcase % . /* Define the ASPLE syntax and associated semantics,, charter <program> (<program-list>) : •begin* <dcl-train> 'semicolon' <stm-train> 'end' % s t r u c t u r e : <dcl~list>,<stm-list> ; <program-code> := •block' (<dcl-list>:»list» , < s t i - l i s t > : ' l i s t M traverse <dcl-train> (<dcl-list>) ; traverse <stm-train> (<stm-list>,<dcl-list>) %-... /* declarations */ <dcl-train> (<dcl-list>) ; <declaration> % % ; <declaration> 'semicolon' <dcl-train> % traverse <declaration> (<dcl-list>) ; traverse <dcl-train> (<dcl-list>) % . <stm-train> (<stm-list>,<dcl-list>) : <statement> % % ; <statement> 'semicolon* <stm-train> % traverse <statement> (<stm-list>,<dcl-list>) ; traverse <stm-train> (<stm-list> #<dcl-list>) % . 128 <declaration> (<dcl-list>) : <ffiode> <idlist> % structare 45 <mode-code>,<idlist-code> ; traverse <mode> (<mode-code>) ; <idlist~code> := • l i s t * ; append 'define* (<mode-code>#<idlist-code>) to <dcl-list> ; 50 traverse <idlist> (<idlist-code>,<dcl-list>) % <mode> (<mode-code>) ; 'bool* % <mode-code> := *bool' % ; 55 •int* % <iaode-code> := ' i n t ' % ; 'ref* <mode> % strnetore <deref-code> ; 60 traverse <mode> (<deref-code>) ; <mode-code> := M i s t * (*ref*,<deref-code>) % . <idlist> {<idlist-code>,<dcl-list>) z <id> 65 % structure <id-narae>, <id-ref>, <id-ntode> ; <id-name> := lookup (<id>) ; c a l l <search-id> (<id-mode>#<id-ref >,<dclr-list>,<id-name>) ; 70 case <id-mode> 'undef»: append <id-name> t o <idlist-code> ; e x i t e l se : error ( ' i d e n t i f i e r i s already defined') ; e x i t 75 endcase % ; <id> 'comma* <idlist> % s tractare <id-name>, <id-ref>, <id-mode> ; traverse <idlist> (<idlisti-code>,<dcl-list>) ; 80 <id-name> := lookup (<id>) ; c a l l <search-id> (<id-mode>,<id-ref>#<dcl-list>,<id-name>) ; case :<id-mode> •undef: 85 append <id-name> to <idlist-code> ;.exit e l se : e r r o r ( • i d e n t i f i e r i s already defined*) ; e x i t endcase :% . 129 /* statements */ 90 <statement> (<stm-list>,<dcl-list>) : <id> * assign* <exp> % s tructare <exp-code>,<id-name>,<id-ref>,<id-mode> ; <id-name> := lookup (<id>) ; 95 /•Search for the i d i n the declaration l i s t . */ c a l l <search-id> (<id-mode>,<id-ref>,<dcl-list>,<id-name>) ; case :<id-mode> /* Is the i d e n t i f i e r undefined? */ 100 * undef* : e r r o r {'undeclared i d e n t i f i e r * ) ; e x i t e lse : traverse <exp> (<exp-code>#<id-mode>,<dcl-list>) ; append 'opcode* 105 (*:=*, 'reference* (<id-mode>) , 'reference* (<id-ref>)j<exp-code>) t o <stm-list> ; ex i t endcase % ; 110 ' i f * <exp> 'then' <stm-train> *fi» % s tructure <exp-code>,<exp-mode>,<if~stm-list> ; <exp-mode> := *bool' ; traverse <exp> (<exp-code>,<exp-mode>,<dcl-list>) ; 115 <if-stm-list> ;= ' l i s t * ; traverse <stm-train> (<if-stm-list>,<dcl-list>) ; append 'select* ('bool*,<exp-code># • l i s t * { ' l i s t ' {»true* #<if-stm-list»)) to <stm-list> 31 ; 120 • i f <exp> 'then* <stm-train>.<t> •else* <stm-train>.<f> ' f i * % s tructure <select-code>,<exp-code>,<exp-mode>, 125 <t-stm-list>,<f-stm^list> ; <exp-mode> := 'bool* ; traverse <exp> (<exp-code>,<exp-mode>,<dcl-list>) ; <t-stm-list> := ' l i s t * ; <f-stm-list> := M i s t * ; 130 traverse <stm-train>.<t> {<t-stm-list>,<dcl-list>) ; traverse <stm-train>.<f> {<f-stm-list>,<dcl-list>) ; <select-code> := 'select' {*bool*,<exp-code>, • l i s t ' ( ' l i s t * {'true*,<t-stm-list>) , ' l i s t * ('false*,<f-stm-list»}) ; 135 append 'exit* ('reference* (<select-code>) ,) to <t-stm-list> ; append <select-code> t o <stm-list> % ; 130 •while 1 <exp> 'do* <stm-train> •end* % structure 140 <exp-code>,<exp-mode>,<while-list>,<while-code> ; <exp-mode> := 'bool' ; traverse <exp> (<exp-code>,<exp-mode>,<dcl-list>) ; <while-list> := ' l i s t * ; traverse <stm-train> {<while-list>,<dcl-list>) ; 145 <while-code> := 'select* ('bool*,<exp-code>, • l i s t ' <* list«(»true',<while-list>))) ; append * enter* ('reference* (<while-code>) ,) to <while-list> ; append <while-code> t o < s t i - l i s t > % ; 150 / • t r a n s p u t statements */ 'input* <id> % s t r u c t u r e : <input-code>,<id-name>^<id-ref>,<id-mode> ; 155 <id-name> : = . lookup (<id>) ; /* Search for the id i n the declaration l i s t . */ c a l l <search-id> (<id-mode>,<id-ref>,<dclrlist>,<id-name>) ; <input-code> := 'reference* (<id-ref>) ; 160 case <id-mode> /* Is the i d e n t i f i e r undefined? */ 'undef : e r r o r {'undeclared i d e n t i f i e r * ) , ; e x i t /* Otherwise i s the i d mode primitive? */ 165 'set' (*int»,'bool') : append 'opcode* {*input*,*reference» (<id-mode>),<input-code>) to <stm-list> ; e x i t /* Dereference and renatch. */ 170 • l i s t * ('ref•,<id-mode>:) : <input-code> :- 'opcode* (»dref ,'reference' (<id-mode>) ,<input-code>) ; renatch endcase:% ; 175 •output' <exp> % s tructure <exp-code>,<exp-mode> ; <exp-mode> := 'set' (*int*,*bool*) ; 180 traverse <exp> (<exp-code>,<exp-mode>,<dcl-list>) ; append 'opcode' ('output*,'reference' (<exp-mode>),<exp-code>) to <stm-list> % . 131 /* expressions */ 185 <exp> C<exp-code>#<exp-mode>,<dcl-list>) : <factor> % % ; <exp> ' •' <factor> % structure <subexp-code>,<factor-code> ; 190 /* Check for a *bool» or 'i n t * exp mode. */ case <exp-mode> •set* (»bool*,*int*) : traverse <exp> (<subexp-code>,<exp-mode>,<dcl-list>) ; 195 traverse <factor> (<f actor-code>, <exp-mode>, <dcl-list>) ; <exp-code> := •opcode* <*•••»•reference* j<exp-mode>), <subexp-code>,<factor-code>) ; 200 e x i t e l se •;: e rror (** cannot y i e l d a reference*) ; e x i t endcase % . 205 <factor> (<factor-code>,<factor-mode>,<dcl-list>) : <primary> % % ; <factor> *** <primary> % structure <subfac-code>,<primary-code> ; 210 /* Check for a *bool* or *int * factor mode,,*/ case <factor-mode> •set* (*bool*,»int») : traverse <factor> (<subfac-code>,<factor-mode>,<dcl-list>) ; 215 traverse <primary> (<primary-code>,<f actor-mode>,<dcl-list>) ; <factor-code> := * opcode* {***,'reference' (<factor-mode>), <subfac-code>,<primary-code>) ; 220 e x i t e lse : e r r o r {'* cannot y i e l d a reference') ; e x i t endcase % . 132 <primary> {<primary-code>,<primary-mode>,<dcl-list>) •: <id> % structure <id-name>,<id-mode>,<id-ref> ; <id-name> := lookup (<id>) ; 230 /* Search for the id i n the declaration l i s t . */ c a l l <search-id> (<id-mode>,<id-ref>,<dcl-list>,<id-name>) ; <primary-code> := 'reference* (<id-ref>) ; /* If the primary mode i s not a simple type, then 235 increment the reference l e v e l of the i d mode. */ case <primary-mode> ' l i s t ' (*ref»,) : <id-mode> := ' l i s t ' ('ref','reference * (<id-mode>>) ; e x i t 240 else : e x i t endcase ; case <id-mode> /* Is the i d e n t i f i e r undefined? */ 'undef* : 245 e r r o r ('undeclared i d e n t i f i e r * ) ; e x i t /* Otherwise match the primary against the i d mode. */ <priraary-mode> : ; <primary-mode> := <id-mode> ; e x i t /* Deref and rematch i f the modes do not match. */ 250 • l i s t * (*ref*,<id-mode>:) : <primary-code> := 'opcode* (*dref *,'reference * (<id-mode>) ,<primary-code» ; reaatch /* Error: no further dereferencing i s possible. */ 255 e lse : /* The error message depends upon the expected type of the primitive mode. */ case <primary-mode> • l i s t * ('ref',) : 260 error ('expression cannot be dereferenced'); ; e x i t e l se : error {'id mode incompatible with exp type*) ; e x i t 265 endcase ; e x i t endcase:% ; 133 <constant> 270 % s t r u c t u r e . <constant-name>,<constant-type>,<constant~ref> ; traverse <constant> (<constant-name>,<constant-type>) ; /* Is the constant already defined? */ case <dcl-list> 275 •subl i s t * (• define* (<constant~type>s , • l i s t * <<constant-ref>:),<constant-name>)) : e x i t e l se : append •define* (<constant~type>, 280 * l i s t * (<constant-ref>:*###const###»), <constant-name>) to <dcl-list> ; e x i t endcase ' ; /* Check the constant type against the primary mode. */285 case .<constant-type> <primary-mode> : <primary-code> : = »reference* (<constant-ref >) ; <primary-mode> := <constant-type> ; ex i t 290 else : e r r o r ^incompatible constant mode*) ; e x i t endcase % ; •lbrak* <exp> 'rbrak* % % ;. 295 *lbrak* <compare> *rbrak» % % . <compare> (<compare-code>,<compare-mode>,<dcl-list>) : <exp>.<1> <compop> <exp>.<r> % structure 300 <ccmpop-name>,<exp-l-code>,<exp-r~code> ; /* test the type of the compare expression */ case <compare-mode> •bool* : traverse <exp>.<1> 305 (<exp-l-code>, *int» ,<dcl-list>) ; traverse <exp>.<r> (<exp-r-code>, »int'*<dcl-list>) ; traverse <compop> (<compop-name>) ; <compare-code> ;= 'opcode' 310 (<compop-name>, *int *,<exp-l^-code>, <exp-r-^code>) ; ex i t e l se : e r r o r ('unexpected boolean expression') ; e x i t endcase % . .. 315 <compop> (<compop-name>) : »ne* % <ccmpop~name> ;=*-*=* % ; % <compop-name> := •=• % •., <ccnstant> (<constant-name>,<constant-mode>) : 'true* SI <constajit-name> := 'true' ; <constant-mode> := *bocl* % ; •f a l s e ' % <constant-name> := 'false* ; <constant-mode> ;= 'bool' % i <int-constant> SI <constant-name> : = . lookup (<int-constant>) <constant-mode> := 'int* 35 . endquest 135 ASPLE Parser F i l e This section l i s t s the syntax productions for ASPLE, generated by INQUEST from the preceding QUEST program. program : •begin 1, d e l - t r a i n , 'semicolon*, stm-train, »end* . d e l - t r a i n : declaration . de l - t r a i n : declaration, 'semicolon*, d e l - t r a i n . stm-train : statement . stm-train : statement, 'semicolon*, stm-train . declaration : mode, i d l i s t , mode : •bool' . mode : • i n t ' . mode : ' r e f , mode * i d l i s t : i d . i d l i s t : i d , 'comma*, i d l i s t . statement : i d , <assign', exp ., statement : ' i f , exp, 'then', stm-train, ' f i ' . statement : ' i f , exp, *then', stm-train, 'else*, stm-train, * f i * . 136 statement : •while 1, exp, 'do•, stm-train, •end 1 . statement : • input*, i d . statement : •output•, exp . exp : factor ., exp : exp, •+•, factor ., factor : primary . factor i factor, •**, primary . primary : id . primary : constant . primary : •lbrak*, exp, «rbrak* . primary : •lbrak*, compare, * rbrak* .fi compare : exp, compop, exp . compop : *ne* . compop : •eq* . constant : 'true* . constant : 'f a l s e ' . constant : int-constant . 137 ASPLE Semantics F i l e This section includes an example of the TOSI semantic strings from the Semantics F i l e generated by INQUEST. (I) TOSI semantic s t r i n g generated for the QUEST global semantic procedure <search-id>. nodetype 200 s t r i n g 1 = ti |$set-case-level$,1) { while, < <#eg* <$test-case$) , 2) >,< <$start-case.$,p3) <#eg, ($test-case$) ,1,< ($initialize-templateS) <#eg, ($template-match$, ($s$,sublist, ($s$, define, ($s-binding$, p1, ($s$,,,0)) ($s$,sublist, <$s-binding$,p2,pU) ,1) ,2) , 1) ) ,1 ,< ($exit-case$)>)>) ($test-case$),1,< ($assign$,p1, ($s$,undef,,0}) ($assign$,p2, ($s$,undef ,,0)) ($exit-case$)>)>) ($set~case-levels,-1) " ; 138 (II) TOSI semantic string generated for the QUEST semantic procedure associated with the production for a <program>. nodetype 1 s t r i n g 1 = ($allocate-local$,2) ($assign$, g1, ($s$,block, ($s-binding$,l 1, ($s$,list,,0)) ($s~binding$,12, ($s$,list,,0)) ,2)) {$godown$,1,0,11,1) ($godown$,2,0,12 11,2) n ; 139 (III) TOSI macros. (define,$godown$,|branch|,\sub-branch|,\parameters|,|p-count|, % This macro c a l l s LANCE to go down the |branch| and |sub-branch| of the current node i n the syntax tree, using the |p-count| parameters i n the JparametersJ s t r i n g . The Jparameters| string i s decoded by LANCE, then the s t r i n g associated with the new node i s processed. When processing i s completed, control moves back up the syntax tree. ,% • < (define,#temp#,<j parameters|>) (event,7, | branch], jsub-branch|) % go down the tree 55 (event,13,0,|p-count|,(addr ,# temp* j,(length,<|parameters|>)) (event,9,1,0) % s e l e c t the s t r i n g % (param,2,3) (event,8,1,1) % return up, and reset the stack %>) (define,$allocate-lccal$,Jl-count|, SJ This macro c a l l s LANCE to alloca t e space i n the semantic stack for |l-count| l o c a l structure variables. ,% < (event,13, |l-count| ,0) >) (define,$s-binding$,|actual-parameterJ,I structure}, % This macro sets the binding of the jactual-parameter| stack address to the |structure|, and then returns the |structure|. The binding i s only performed i n a template i f the template match succeeds. % < (define,#binding#,<|actual-parameter| >) (define,#structure#,<|structure|>) (event,21,(addr,#binding#),(length,<|actual-parameter|>), (addr,#structure#),(length,<|structurej>)) J s t r u c t u r e d ) 140 Appendix P. ASPLE Program and Translation ASPLE Source /* ASPLE: example 1 */ /* This i s a simple ASPLE program to test */ /* the translator generated by QOEST. */ /* author: W. Appelbe */ /* date : 1977 Hay 18 */ begin i n t x, y, z; input x; y := 1; z := 1; i f {x = 0) then while <z = x) do z := z + 1; y := y * z end f i ; output y end 141 GBAIL Bepresentation Structure Display: node # l i s t node # block ( 7 define 5 i n t i 8 2 11 y 14 x ) 27 define 24 i n t ( 25 ###const#### ) 23 1 50 define 47 i n t g 48 ( ###const#### ) 46 0 ) < 21 opcode 19 input 20 ## reference : 5 18 ## reference : 14 32 opcode 29 * • • 30 ## reference : 24 31 ## reference : 11 28 ## reference : 25 40 opcode 37 • — • 38 ## reference : 24 39 ## reference : 8 36 ## reference : 25 104 select 100 bool 54 opcode 52 = 53 i n t 44 ## reference 51 ## reference • ( ( 101 true 26 49 14 48 102 103 142 { 55 96 sel e c t 92 bool 65 opcode 63 64 i n t 59 ## reference : 8 62 ## reference : 14 ( 95 < 94 93 true { 66 79 opcode 76 : = 77 ## reference : 24 78 ## reference : 8 75 opcode 73 • 74 ## reference : 24 69 ## reference : 8 72 ## reference : 25 91 opcode 88 : = 89 ## reference : 5 90 ## reference : 11 87 opcode 85 * 86 ## reference : 5 82 •## reference : 11 84 ## reference : 8 99 enter 97 ## reference : 96 98 *undef* ) ) ) ) ) ) 112 opcode 110 output 111 ## reference : 5 \ STACODE Output Program ENT0004 ENT0104 ENT0096 EXI0094 EXI0096 EXI0102 EXI0104 EXI000 4 LABEL CONST INT, CONST INT, CONST INT, CONST INT,1 CONST INT,0 DUPL CLEAR TBE TOS READ INT STORE INT,000,002 LOAD INT,000,003 STORE INT,000,001 LOAD INT,000,003 STORE INT,000,000 LABEL LOAD INT,000,002 LOAD INT,000,004 NEQ BRANCH FALSE,EXI0102 LABEL LOAD INT,000,000 LOAD INT,000,002 NEQ BRANCH FALSE,EXI0094 LOAD INT,000,000 LOAD INT,000,003 ADD INT STORE INT,000,000 LOAD INT,000,001 LOAD INT,000,000 HUL INT STORE INT,000,001 BRANCH UNCOND,ENT0096 LABEL LABEL LABEL LABEL LOAD INT,000,001 WRITE INT LABEL 144 General Index adaptability ,. 6 ALGOL60 ...,........ ...,...................................... 24 AIG0L68 . .. .. , . ........41 ALG 0L68 C . ............. ........ .15 ARB AY . . . . . . . . . . . .. . . . . .... . . ..................... . . . . ... . . 35 ASPLE ... ..................... 88 Attribute .....,.............................................. 27 Attribute Grammars ........... ..25 BCPL ... .... ........... .... . .................... ..... 16,33,35,44 BCPL Applicative Expression Tree . . . . . . . . i , . . . . . . . . , . . . . ...16,18 BLOCK . .... .. .......... . . . . . . . . . . . . . . . . . . . .......39,46 CLASS ....... ...... . . 35 DEFINE ............ ...........,...,....... .......... ..........45 ENTEB .......... .,.................,40 EXIT ....,....... . . . . . . . . . . . ........................40 FOBTBAN ....................................... ............41,46 GBAIL ............ .....•,...... ......... ....... ,. ........... 3 ,27 Intermediate Beprese ntations .........-. ... .......... 10 INVOKE ...........• •........,*.....•.«...... 47 Iter a t i o n ........................ ............................ 42 JANUS . , ... .. V. vvvV. . .. v. .. . . 22 LABEL .. . . . ... . , . ... . . . ....... ......,.......v. .27 LANCE ....................... ... ..... ......................... 12 LIST .... ... 28 NODE ... .......,....... .......... ..... ..... ......,...........,27 OCODE ..... ..... ..... ........ ........ . . .... .. 16 OPCODE . ..... .. ........ .. ,. . . ... 31, 42 Parent NODE .................................. • ••;.............28 Parse Tree ......... ............. *•*•.......................9,21 Parsing .............................,............ ? PASCAL ..... ... ...........,.................,...........35-36,48 Phases • • 8 Primitive .27 PBOCEDUBE . ...... . . . ............ ......... ............... ......46 QUEST .................. ... .............. . . . 12 BEFEBENCE . . . . . . . . . . . . . . . . . . . . . . . . ......... ........ 30,33 BETUBN .........................................•.47 Boot NODE .................... . .... ... ............... ... ......27 SELECT .-^ -.V. • . . . . .. . . . . . . . .. . 43 Semantic Analysis ........................................ ..... 9 Simple Type ..................................................34 STACODE . 13 Structure ....................................................27 Subattribute .................................................29 Three-address Codes ..........................................21 Translator Writing Systems ................................... 11 Turing Machine ..........................•.................... 19 TWS ....... ........ ........... . i ...... .................... .... 11 UNCOL . . .. ...... . ............ ..... ............. ... ............ 18 VDL . . . . r . . . . . . . . . . v . . . . V...........25-26 ZCODE 16 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0076824/manifest

Comment

Related Items