UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Translator writing system for minicomputers. Madderom, Jake 1973

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
831-UBC_1973_A7 M33_3.pdf [ 3.75MB ]
Metadata
JSON: 831-1.0101550.json
JSON-LD: 831-1.0101550-ld.json
RDF/XML (Pretty): 831-1.0101550-rdf.xml
RDF/JSON: 831-1.0101550-rdf.json
Turtle: 831-1.0101550-turtle.txt
N-Triples: 831-1.0101550-rdf-ntriples.txt
Original Record: 831-1.0101550-source.json
Full Text
831-1.0101550-fulltext.txt
Citation
831-1.0101550.ris

Full Text

A TRANSLATOR WRITING SYSTEM • FOR MINICOMPUTERS by Jake Madderom B.A.Sc, Uni v e r s i t y of B r i t i s h Columbia, 1968 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE i n the Department of E l e c t r i c a l Engineering We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA June 1973 In p r e s e n t i n g t h i s t h e s i s i n p a r t i a l f u l f i l m e n t o f the requirements f o r an advanced degree a t the U n i v e r s i t y of B r i t i s h Columbia, I agree t h a t the L i b r a r y s h a l l make i t f r e e l y a v a i l a b l e f o r r e f e r e n c e and study. I f u r t h e r agree t h a t p e r m i s s i o n f o r e x t e n s i v e c o p y i n g o f t h i s t h e s i s f o r s c h o l a r l y purposes may be granted by the Head o f my Department or by h i s r e p r e s e n t a t i v e s . I t i s understood t h a t c o p y i n g or p u b l i c a t i o n o f t h i s t h e s i s f o r f i n a n c i a l g a i n s h a l l not be allowed w ithout my w r i t t e n p e r m i s s i o n . Department o f £ C* ' r\ ^ <£. r~ , .i The U n i v e r s i t y o f B r i t i s h Columbia Vancouver 8 , Canada ABSTRACT Some portions of real-time computer process control software can be programmed with s p e c i a l purpose h i g h - l e v e l languages. A trans-l a t o r writing'system f o r minicomputers i s developed to a i d i n w r i t i n g translators f o r those languages. The t r a n s l a t o r w r i t i n g system uses an LR(1) grammar analyzer with an LR(1) skeleton parser. XPL i s used as the source language for the semantics. An XPL to intermediate language t r a n s l a t o r has been written to aid i n the t r a n s l a t i o n of XPL programs to minicomputer assembly language. A simple macro generator must be w r i t t e n to translate intermediate language programs i n t o various minicomputer assembly languages. i TABLE OF CONTENTS Page ABSTRACT i TABLE OF CONTENTS • i i LIST OF ILLUSTRATIONS i i i GLOSSARY i v ACKNOWLEDGEMENT v i i 1. INTRODUCTION . 1 2. A TRANSLATOR WRITING SYSTEM FOR MINICOMPUTERS 4 3. GRAMMARS 8 3.1 Parsing Algorithms . . . . 9 3.2 Types of Context-free Grammars 10 4. SEMANTICS . . 12 4.1 XPL . . . . 13 4.2 B i t Data Types . . 15 5. MINCOM - AN XPL TO INTERMEDIATE LANGUAGE COMPILER . . . . 16 5.1 Intermediate Language 17 5.2 Structure of Programs Produced by MINCOM . . . 20 5.3 MINCOM 23 5.4 Constants within MINCOM , 23 5.5 Addressing '.. 26 5.6 MINCOM Data Structures 27 5.7 Code Generation . . . . . . . . 28 5.8 Constants 29 6. ANALYZERS 30 6.1 A Local Context Simple Precedence Grammar . . . 30 6.2 LR(1) Grammars 31 6.3 LR(1) Analyzer Algorithm . 39 6.4 Other Data i n Output Declarations . 41 6.5 State Algorithm , 41 6.6 Detection of Non-LR(l) Grammars 45 6.7 Skeleton Parser Variables 45 7. ERROR RECOVERY FOR THE PLALR(l) PARSER . • 46 8. RESULTS 50 9i CONCLUSIONS 55 Appendix I Example Translator Grammar 57 Appendix II XPL Grammar 59 REFERENCES 62 i i LIST OF ILLUSTRATIONS Figure Page 1.1 Ideal TWS . . 1 2.1 Block Diagram of Minicomputer TWS 5 5.1 Intermediate Language D e f i n i t i o n 18 5.2 Intermediate Language Operand Types 20 5.3 Structure of Intermediate Language Programs 21 5.4 MINCOM Source Structure 24 5.5 Constants Used to Describe Minicomputer 25 6.1 VSLR(l) Parsing Algorithm 33 6.2 A Non-SLR(l) Grammar 34 6.3 FSM For Grammar of F i g . 6.2 . . . 34 6.4 PLALR(l) Search Algorithm 37 6.5 Example cf Tables Produced by Grammar Analyzer . . . 39 6.6 LR(1) Analyzer Algorithm . . 40 6.7 Erro r Detection for LR(1) Analyzer 43 6.8 Skeleton Parser Data Structure 44 7.1 Sample Program Showing E r r o r Recovery 47 8.1 PLALR(l) Nova Memory Requirement 51 8.2 Table Sizes for Various Grammars 51 8.3 Sample Run Using Grammar of Appendix I 53-4 i i i GLOSSARY BNF - Backus-Naur Form. A metalanguage used i n w r i t i n g grammar d e f i n i -tions . Byte - Eight binary b i t s . Canonical derivation - The der i v a t i o n i n which only the r i g h t non-terminal i s replaced at each step. Canonical parse - The reverse of the sequence of phrases that were used i n the canonical d e r i v a t i o n . Configuration - The states of the LR(1) parser consist of configuration sets. A configuration i s a production with one symbol marked as a successor. Context-free phrase-structure grammar - A phrase-structure grammar having only a s i n g l e non-terminal symbol as the l e f t part of the pro-ductions of the grammar. Derivation - A der i v a t i o n of a s t r i n g i s the process of s u b s t i t u t i n g phrases f o r non-terminal symbols, s t a r t i n g with the goal symbol, u n t i l the s t r i n g i s obtained. Extended precedence grammar - Those grammars for which a unique pre-cedence r e l a t i o n s h i p e x i s t s between a l l l e g a l symbol p a i r s and the next symbol. FSM - F i n i t e State Machine. The LR(1) parser i s a f i n i t e state machine. Goal Production - The production that defines the goal symbol. Goal symbol - The non-terminal symbol that does not appear i n the r i g h t part of any production. Grammar - Set of rul e s , c a l l e d productions, that determine which s t r i n g s are v a l i d . Inadequate state (of an LR(1) parser) - States with two or more reduc-tions or states with both t r a n s i t i o n ( s ) and redu c t i o n ( s ) . Look-ahead set - A set of terminal symbols used by LR(k) parsers to make stacking decisions. LR(k) grammar - A grammar whose programs may be parsed during a s i n g l e l e f t to r i g h t scan without looking ahead more than k symbols. Macro - A s e r i e s of assembly language i n s t r u c t i o n s . Metalanguage - A language used to define another language. i v Metas'ymbols - The symbols of a metalanguage. MINCOM - The name given to the XPL to intermediate language t r a n s l a t o r w r itten f or this thesis. MSP(2,1;1,1) - Mixed Strategy Precedence(2,1;1,1) . The p a i r 2,1 r e f e r s to the two symbols i n the parse stack and the s i n g l e next symbol used to make precedence stacking decisions. The 1,1 p a i r r e f e r s to the number of symbols to the l e f t and r i g h t of a proposed reduction used to check context. Non-terminal symbol - A name given to a phrase. Nova - A Data General minicomputer. Parse - A parse of a s t r i n g indicates which phrases were used to form the s t r i n g . A l l programs must reduce to the goal symbol. Parser - The portion of a t r a n s l a t o r that recognizes s e r i e s of terminal and non-terminal symbols as phrases. Phrase - An ordered sequence of terminal and non-terminal symbols. Phrase-structure grammar - A grammar i n d i c a t i n g a l l v a l i d phrases. PLALR(l) - Parser Look-Ahead L e f t to Right one. A parser which cal c u l a t e s the look-ahead sets when needed rather than using stored look-ahead sets was written for t h i s t h e s i s . Production - A rule of a grammar. Program - A v a l i d s t r i n g . Programming language - A set of programs. Reduction - The replacement of a sequence of symbols with an equivalent non-terminal symbol. Reserved word - A terminal symbol i n a grammar d e f i n i t i o n that i s not an i d e n t i f i e r >, <string>, <number> or s p e c i a l character. Scanner - The portion of a t r a n s l a t o r that reads the input text and recognizes the terminal 'symbols. Semantic routine - The portion of a t r a n s l a t o r that does a l l the data processing of input data. Simple precedence grammar - Those grammars for which a unique precedence r e l a t i o n s h i p e x i s t s between each possible symbol p a i r . SLR(l) - Simple l e f t to Right one. A grammar i s SLR(l) i f i t i s LR(1) and a l l the look-ahead sets associated with an inadequate state are d i s j o i n t . v S t r i n g - A sequence of terminal symbols. Successor - A successor symbol i n a state ( r e f e r r i n g to the states of an LR(1) parser) i s a symbol that may follow a s t a t e . The succes-sor has associated with i t a next s t a t e . System Builder - Sp e c i a l purpose t r a n s l a t o r program. Terminal symbol - A symbol i n a grammar d e f i n i t i o n that does not appear as a l e f t part of a production. T r a n s i t i o n ( r e f e r r i n g to LR(1) parsers) - A configuration with a non-null successor symbol. TWS - Translator Writing System. UNCOL - Universal Computer Oriented Language. Vocabulary - The set of terminal symbols of a programming language. VSLR(l) grammar - Very Simple L e f t to Right one grammar. A grammar i s VSLR(l) i f i t i s LR(1) and each state has zero or one reduction. XPL - The name of a programming language. A d i a l e c t of the language PL/ 1 . v i ACKNOWLEDGEMENT I wish to express my gratitude to the people who have a s s i s t e d me during this research project. I am deeply g r a t e f u l to Dr. J.S. MacDonald, my research supervisor, f o r h i s h e l p f u l suggestions, encourage-ment and time. I appreciate the i n t e r e s t Dr. G.F. Schrack showed i n my research and thank him for h i s valuable comments. Thanks are also due to Mr. Peter Madderom for h i s help with proof reading and to Miss Norma Duggan for typing the f i n a l manuscript. A s p e c i a l thanks to my wife, Marie, f o r her encouragement and understanding during my graduate program. I am g r a t e f u l to the National Research Council of Canada and to the Un i v e r s i t y of B r i t i s h Columbia f or t h e i r f i n a n c i a l support. v i i 1. INTRODUCTION The purpose of t h i s t h e s i s i s to develop a t r a n s l a t o r w r i t i n g system (TWS) f o r minicomputers. The TWS i s to c o n s i s t of programs which run on a lar g e computer and which produce t r a n s l a t o r s w r i t t e n i n a spe-c i f i e d minicomputer machine code or assembly language. The i d e a l c o n f i g u r a t i o n f o r the TWS i s shown i n Figure 1.1. The a c t u a l c o n f i g u r a -t i o n produced f o r t h i s t h e s i s i s discussed i n Chapter 2. The o r i g i n a l concept of a TWS f o r minicomputers came out of MacDonald D e t t w i l e r and As s o c i a t e s ' (MDA) o r i g i n a l e f f o r t s w i t h a t r a n s l a t o r f o r the B.C. Te l e -phone p o r t i o n of the Northern I n t e r p r o v i n c i a l Radio System (NIPRS) [1]. The grammar f o r the B.C. Telephone t r a n s l a t o r ( c a l l e d a system b u i l d e r by MDA) i s given i n Appendix 1. <<*A| N I COMPUTER ASSEMBLY LANGUAGE DEFINITION \ LANGUAGE GRAMMAR DEFINIT ION DATA BASE FORMAT > TRANSLATOR WRIT ING SYSTEM F i g o r e 1 .1 |deaI TWS ^TRANSLATOR WRITTEN IN ASSEMBLY LANGUAGE ' The software f o r the NIPRS sup e r v i s o r y system c o n s i s t e d of two d i s t i n c t p a r t s : 1. The oper a t i n g system, c o n s i s t i n g of a s u p e r v i s o r , i n t e r r u p t 1 2 handler, device handlers , l o g i c i n t e r p r e t e r , and other b a s i c tasks. These programs were coded i n assembly language by MDA. 2. The system d e f i n i t i o n f i l e , c o n s i s t i n g of messages, d e s c r i p t i o n of input and output s i g n a l s , and l o g i c equations f o r reporting and c o n t r o l . This data was defined by B.C. Telephone personnel using the system b u i l d e r . The system b u i l d e r was a t r a n s l a t o r that,converted English statements into the system d e f i n i t i o n f i l e . The system b u i l d e r ran on the spare supervisory minicomputer. Because the input language to the S37stem b u i l d e r was very h i g h - l e v e l and was designed to correspond to the normal jargon of the telephone industry, B.C. Telephone personnel were able to program t h e i r own system i n a f r a c t i o n of the time that would have been required using a general purpose language. S p e c i a l purpose languages such as AUTRAN, BATCH, IBM PROSPRO/1800, and G.E.'s BICEPS [2] have been developed to enable users of process control systems to write t h e i r own software without having to worry about the structure of the software system or the computer hardware. These languages served b a s i c a l l y the same purpose as the B.C. Telephone system b u i l d e r but were written f o r general process c o n t r o l applications and could not take advantage of s p e c i a l industry jargon or hardware configurations. Creating a s p e c i a l h i g h - l e v e l language f o r each supervisory or c o n t r o l a p p l i c a t i o n requires a d i f f e r e n t t r a n s l a t o r (system b u i l d e r ) program f o r each a p p l i c a t i o n . The t r a n s l a t o r should run on a computer that i s r e a d i l y a v a i l a b l e to the user since most system changes require modifications to the system d e f i n i t i o n f i l e . That computer could be a spare co n t r o l minicomputer or a large data processing computer. A t r a n s l a t o r w r i t i n g system would reduce the programming time required to 3 produce s p e c i a l purpose h i g h - l e v e l language t r a n s l a t o r s . Since trans-l a t o r w r i t i n g systerns for large computers such as the IBM 360 s e r i e s e x i s t , this thesis deals mainly with a TWS f o r minicomputers. The programs that form the TWS were written to run on the IBM 360/67 a v a i l a b l e at UBC. The TWS can be used on any large IBM 360 or 370. A Data General Nova minicomputer with a four thousand word memory was used as the target computer for the sample t r a n s l a t o r s produced i n this t h e s i s . Only the portions of the tr a n s l a t o r s produced by the TWS were run on the Nova. Complete t r a n s l a t o r s require more than four thousand words of memory. The thesis i s organized as follows. Chapter 2 gives an out-l i n e of a l l the components of a minicomputer TWS. Chapter 3 presents some background material on grammars and various types of parsing algo-rithms. The h i g h - l e v e l language XPL [3] and the compiler required to tra n s l a t e XPL i n t o minicomputer assembly language i s described i n Chapters 4 and 5. Various parsing routines are t r i e d and compared i n this thesis. The grammar analyzer algorithms and t h e i r associated parsers are presented i n Chapter 6. Chapter 7 gives a b r i e f summary of some of the error recovery techniques attempted. Chapter 8 compares the table sizes f o r d i f f e r e n t grammars and discusses an example of a t r a n s l a t o r implemented on the Nova. Chapter 9 provides the conclusions and in d i c a t e s what further work i s required. 4 2. A TRANSLATOR WRITING SYSTEM FOR MINICOMPUTERS Hie structure of the t r a n s l a t o r w r i t i n g system developed i n t h i s thesis i s i d e n t i c a l to that of the XPL TWS of McKeeman et a l . [3]. XPL i s the name of a d i a l e c t of PL/1 developed by McKeeman s p e c i f i c a l l y f or t r a n s l a t o r w r i t i n g . There are four programs i n the XPL TWS. A grammar analyzer i s used to tr a n s l a t e grammars wr i t t e n i n relaxed Backus-Naur Form (BNF) [4] i n t o XPL declaration statements. These declaration statements are combined with a skeleton t r a n s l a t o r to form a parser f o r the grammar. User w r i t t e n semantic routines must be com-bined with the parser to form a complete t r a n s l a t o r . The other two programs i n the XPL TWS are XCOM, an XPL to IBM 360 machine code compiler, and *XPL, an assembly language monitor. The monitor i n t e r -faces with the operating system so that only the monitor need be changed to use the XPL TWS with any 360 operating system. A minicomputer TWS could have been produced by using the XPL TWS and changing XCOM to a minicomputer machine code compiler. That approach was not followed for two reasons. F i r s t , to be useful f o r real-time systems, a TWS must be target machine independent. A compiler would be r e s t r i c t e d to one minicomputer. Secondly, the grammar analyzer of the XPL TWS accepts a l i m i t e d class of grammars and produces tables of si z e proportional to the square of the number of symbols i n the language. An XPL to intermediate language t r a n s l a t o r was developed to make the TWS target machine independent. A macro generator must be written to convert the intermediate language to a p a r t i c u l a r mini-computer machine ' code or assembly language. A new grammar analyzer 5 < GRAMMAR GRAMMAR A N A L Y Z E R 1 X P L DATA DECLARAT IONSJ XPL DATA ] D E C L A R A T I O N S . SKELETON TRANSLATOR SEMANTICS TRANSLATOR _J XPL TO I INTERMEDIATE LANGUAGE C O M P I L E R (MlNCOM) / T R A N S L A T O R ' ~| ' WRITTEN IN * j I . L . / T R A N S L A T O R " ~| L I . L I . L . TO ASSEMBLY LANGUAGE MACRO GENERATOR I _ J "TRANSLATOR 1 WRITTEN IN A S S E M B L Y j LANGUAGE J ' A S S E M B L Y LANGUAGE MONI TOR L . U s e r w r i t t e n ~~| M a c h i n e p r o d u c e d P a r t TWS o f Figure 2.1 Block Diagram of Minicomputer TWS 6 and skeleton t r a n s l a t o r were written i n an attempt to extend the class of acceptable grammars and to reduce the s i z e of the t r a n s l a t o r tables. The language XPL was retained as the high l e v e l language for the semantic routines. A block diagram for the minicomputer t r a n s l a t o r w r i t i n g sys-tem i s given i n Figure 2.1. The skeleton t r a n s l a t o r s discussed i n this thesis c o n s i s t of three main subroutines: a parser, a scanner, and an e r r o r recovery routine. The scanner was the same f o r a l l the t r a n s l a t o r s produced. The scanner i s c a l l e d by the parser whenever the parser requires another symbol. The scanner reads the input text and searches f o r pre-defined symbols. I t i s able to recognize a l l the s p e c i a l s i n g l e chara-cters and reserved words defined i n the grammar and the symbols <iden-f i e r > , <string> and <number>. An <identifier> i s defined w i t h i n the scanner as any sequence of characters beginning with an alphabetic character and containing only alphabetic characters and numbers. A <number> i s any sequence of the characters 0 to 9. A <string> i s any sequence of characters be-ginning with an apostrophe and ending with an apostrophe or carriage return. Symbols may not be continued on the following l i n e . The symbols <identifier>, <string> and <number>are defined i n the t r a n s l a t o r at run time by i n i t i a l i z i n g v a r i a b l e s . Those v a r i a b l e s may be changed by the semantic routines. I f , for instance, at some point i n the t r a n s l a t i o n , blanks must be included within an <identifier> then a blank may be defined as an alphabetic character (leading blanks would s t i l l be ignored). The parser determines which sequence of productions must be 7 applied to reduce the input text to the f i n a l goal symbol (such as <program>). Each time the parser recognizes a production, the semantic routine associated with that production i s c a l l e d . The er r o r recovery routines developed i n t h i s thesis served two purposes. The f i r s t was to recover from syntax e r r o r s . When the parser recognizes a syntax error, some attempt must be made to parse the rest of the input c o r r e c t l y . The second purpose of the recovery routine was to enable reserved words to be used as <identifier>s. The scanner recognizes a l l reserved words. I f a reserved word i s used out of i t s correct context, the recovery routine changes the symbol number to that of an <identifier> and r e s t a r t s the parse. This technique allows a l l reserved words to be used as < i d e n t i f i e r > s i n a l l contexts where the reserved word i s i l l e g a l . 8 3. GRAMMARS Grammars expressed i n BNF are not n e c e s s a r i l y acceptable to a t r a n s l a t o r w r i t i n g system. This chapter outlines some of the types of grammars which have been used i n t h i s t h e s i s . Before grammars can be discussed, a few d e f i n i t i o n s must be given. The character sequences recognized by the scanner w i l l be c a l l e d terminal symbols or symbols. The symbols form a vocabulary. A s t r i n g i s any sequence of these symbols. A grammar i s a set of r u l e s , c a l l e d productions, that determine which s t r i n g s are v a l i d . A program i s any v a l i d s t r i n g . A programming language i s a vocabulary, a grammar, and a set of procedures which define the meaning of the rules of the grammar. An ordered group of symbols i s c a l l e d a phrase. A non-ter-minal symbol i s a name given to a phrase. A phrase may consist of non-terminal symbols as w e l l as terminal symbols. A phrase-structure grammar f o r a language indicates a l l v a l i d phrases. A context-free phrase-structure grammar i s a phrase-structure grammar having only a si n g l e non-terminal symbol as the l e f t part of the productions of the grammar. The term "context-free" i s a misnomer. It does not mean that the symbols forming a s t r i n g do not change meaning with context. I t only indicates that, beginning with a unique non-terminal c a l l e d a goal symbol, s u b s t i t u t i n g any phrase for a non-terminal symbol u n t i l there are no more non-terminal symbols w i l l r e s u l t i n a program. » The phrase that may be substituted f o r a non-terminal i s not dependent on the symbols surrounding the non-terminal. 9 There must be only one production i n a grammar with a l e f t part that does not appear i n any other production. That production i s c a l l e d the goal production. The r i g h t part of the goal production must contain at l e a s t one non-terminal symbol otherwise the language w i l l c onsist of a si n g l e s t r i n g . A derivation of a s t r i n g i s the process of s u b s t i t u t i n g phrases for non-terminals. A canonical d e r i v a t i o n i s a d e r i v a t i o n i n which only the rightmost non-terminal i s replaced at each step. A canonical parse of a s t r i n g i s the reverse of the sequence of productions that were used i n the canonical d e r i v a t i o n . A parse of a s t r i n g i n d i c a t e s which productions were used to produce the s t r i n g . 3.1 Parsing Algorithms One way of implementing parsing algorithms i s through the use of parse stacks and constant tables. A one-pass parser uses the i n -formation i n the tables, the information i n the parse stack and the next k symbols from the input text to determine how to proceed with the parse. The parse stacks contain a past h i s t o r y of the parse and the tables describe the grammar. The parser does one of the following: 1. I t can reduce the stack. A reduction i s the replacement of a sequence of symbols on the parse stack with a non-terminal symbol. I f i t makes a reduction then a production has been recognized and a semantic routine associated with that production i s executed. 2. I t can recognize that an e r r o r has occurred and can make some attempt at recovery. 3- I t can stack the next symbol. 4. I t can ask for another symbol from the scanner before doing one of the other three. 10 3.2 Types of Context-free Grammars This thesis discusses simple precedence grammars, extended precedence grammars and l e f t to x _ight one (LR(1)) grammars. A simple precedence grammar i s one for which a unique precedence r e l a t i o n s h i p e x i s t s between each possible symbol p a i r . An extended precedence gram-mar i s one for which a unique precedence r e l a t i o n s h i p e x i s t s between a l l l e g a l symbol pa i r s and the next symbol. An LR(1) grammar i s one whose programs may be parsed during a s i n g l e l e f t to r i g h t scan without looking ahead more than one symbol. The simple precedence grammar parsers use only the top parse stack symbol and the next input symbol to decide whether to reduce, stack, or c a l l an er r o r routine. A simple two dimensional array i s s u f f i c i e n t to ind i c a t e one of the above three actions for 'every symbol p a i r i n the grammar. The decision to reduce alone i s not enough. The parser must determine which production i s a p p l i c a b l e . With the prece-dence grammars, that can be accomplished by comparing the top symbols i n the stack with the r i g h t parts of the productions. The extended precedence grammar parsers use the top too symbols on the parse stack with the next input symbol to make the reduce, stack, or error decision. To store a l l possible t r i p l e combination of symbols requires a three-dimensional array that i s usually quite sparse because most t r i p l e s cannot occur i n p r a c t i c a l languages. McKeeman et a l . [3] reduced the dimension of the array required by adding a chec k - t r i p l e type to the two-dimensional simple precedence array. Only l e g a l t r i p l e s need be stored or checked. The above two grammars severely l i m i t the way the BNF 11 productions can be w r i t t e n . Very l i t t l e context i s a c t u a l l y used i n the parse. One extension to the precedence approach was made by McKeeman et a l . [3]. Their Mixed Strategy Precedent Grammar (MSP(2,1;1,1)) uses the two symbols on the stack top with the next input symbol .(2,1) to make most parsing decisions. The (1,1) p a i r r e f e r s to a s i n g l e symbol to the l e f t or r i g h t of a production checked when l e f t or r i g h t context i s required. LR(1) grammars form a more general class of grammars than the simple and extended precedence grammars. A l l simple and extended pre-cedence grammars are LR(1). Knuth [5] showed that a parsing algorithm for an LR(k) grammar i s i n one of N states during a parse. Each state contains information r e l a t i n g the next input symbol to a following state or a reduction. Reductions force the parser into a previous s t a t e . Chapter 6 describes how the above three types of grammars were used to implement parsers for the minicomputer TWS. 12 4. SEMANTICS Chapter 3 described v a r i o u s parsers that can be used f o r p h r a s e - s t r u c t u r e grammars. The p a r s e r , combined w i t h a scanner, accepts a user i n p u t program and recognizes when to apply the productions of the grammar. The parser does not know the meaning a s s o c i a t e d w i t h each p r o d u c t i o n . The parser accepts symbols from the scanner and stacks the symbols u n t i l a production i s recognized. I t then c a l l s a semantic r o u t i n e a s s o c i a t e d w i t h the production before reducing the stack. The semantics must a s s o c i a t e a s e r i e s of machine language i n s t r u c t i o n s w i t h each pro d u c t i o n . The changes that these machine language i n s t r u c t i o n s make to any output medium th a t the t r a n s -l a t o r uses (magnetic tape, d i s c , core b u f f e r , t e l e t y p e e t c ) , d e f i n e the semantics of a p r o d u c t i o n . Expressed i n t h i s way, semantics do not i n d i c a t e some a b s t r a c t meaning, they only i n d i c a t e a c t i o n s that must be performed by the computer when a p r o d u c t i o n i s recognized by the parser. One way to i n d i c a t e the semantics of a p r o d u c t i o n would be to express the semantics as a s e r i e s of machine i n s t r u c t i o n s . That would produce an e f f i c i e n t t r a n s l a t o r but would r e q u i r e a l a r g e amount of programming time. I t would a l s o f o r c e a complete r e w r i t e of the semantic r o u t i n e s every time a t r a n s l a t o r was r e q u i r e d f o r a d i f f e r e n t computer. 13 Since the t r a n s l a t o r i s not a real-time program, the execution e f f i c i e n c y i s not. an important consideration, and a h i g h - l e v e l program-ming language may be used f o r semantic routines. FORTRAN was an obvious choice for minicomputers because i t has been implemented on most a v a i l -able machines. But FORTRAN does not have the s t r i n g handling c a p a b i l i t i e s or the data types that must be used i n manipulating a complex data base. FORTRAN was not used because a language with a l l the required features e x i s t e d . 4.1 XPL McKeeman et a l [3] defined XPL, a d i a l e c t of PL/1, in' order to create a language that could be used for w r i t i n g t r a n s l a t o r s . Since. XPL contains a l l of the features required f o r data manipulation, XPL was used as the high l e v e l language f o r the semantic routines. The de-c i s i o n to use a language other than FORTRAN made i t necessary to i n c o r -porate a compiler i n t o the TWS. That compiler had to tr a n s l a t e XPL programs i n t o the assembly language of a target minicomputer. A two stage t r a n s l a t i o n process was used. The program MINCOM was wri t t e n to translate XPL programs to an intermediate language. McKeeman's © t r a n s l a t o r w r i t i n g system was used to produce MINCOM. A macro generator was used to tr a n s l a t e the intermediate language to a minicomputer assem-b l y language. The complete grammar for XPL as used i n t h i s thesis i s given i n Appendix I I . Two changes were made to McKeeman's d e f i n i t i o n i n order to decrease the storage requirements of the t r a n s l a t o r . Dynamic 14 storage a l l o c a t i o n was used to implement v a r i a b l e l e n g t h c h a r a c t e r s t r i n g s on the IBM 360. Although dynamic storage a l l o c a t i o n does make e f f i c i e n t use of core, the r o u t i n e s used to a s s i g n and r e l e a s e core occupy a s i g n i f i c a n t amount of space. The COMPACTIFY r o u t i n e of the XPL s k e l e t o n t r a n s l a t o r r e q u i r e s e i g h t hundred bytes of memory. The XPL programs c a l l a monitor f o r a l l i n p u t , output and s t r i n g manipulation. The monitor w r i t t e n f o r the Nova computer occu-p i e d about seven hundred 16 b i t words. The monitor must be w r i t t e n i n assembly language and must be r e w r i t t e n every time XPL i s implemented on a new computer. Since the monitor does a l l s t r i n g m a n i p u l a t i o n , the dynamic s t r i n g assignment ro u t i n e s would have to be p a r t of the monitor and would have to be w r i t t e n i n assembly language. Dynamic s t r i n g a l l o c a t i o n would f o r c e more interdependence between the v a r i o u s s t r i n g manipulation r o u t i n e s and make them more d i f f i c u l t to w r i t e . Dynamic s t r i n g a l l o c a t i o n i s not p a r t of the XPL d e f i n i t i o n . I t i s only the way v a r i a b l e length s t r i n g s were implemented on the IBM 360. V a r i a b l e l e n g t h . s t r i n g s are e s s e n t i a l f o r easy s t r i n g m anipulation and must be implemented. One s o l u t i o n would be to a s s i g n a maximum b u f f e r area f o r each s t r i n g , but t h a t would r e q u i r e too much memory space. A change was made to the XPL grammar to a l l o w a maximum len g t h to be s p e c i f i e d f o r any s t r i n g . I f no l e n g t h i s s p e c i f i e d , a d e f a u l t l e n g t h i s assumed. MINCOM allows any l e n g t h to be s p e c i f i e d . S t r i n g s longer than 256 characters may be used to s t o r e s t r i n g constants. The i n d i v i d u a l constants may be accessed w i t h a s u b s t r i n g r o u t i n e . One other change was made to the XPL grammar to allow more e f f i c i e n t use of subroutine v a r i a b l e storage. The IBM 360 implementation assigned permanent storage to a l l b i t and f i x e d data types d e c l a r e d w i t h i n a 15 procedure. Since not a l l procedures c a l l a l l other procedures, that storage could be shared. The MINCOM parser uses only a s i n g l e l e f t to r i g h t scan on the inp u t program and cannot determine which procedures are going to be c a l l e d from w i t h i n the current procedure. MINCOM must know which procedures are to be c a l l e d i n order to ass i g n storage that w i l l not be used by those procedures. The data type NESTED was added to the grammar to convey the above i n f o r m a t i o n to MINCOM. The semantic r o u t i n e s are a s s o c i a t e d w i t h the appropriate p r o d u c t i o n v i a a CASE statement. The p a r s e r , when i t has determined th a t a re d u c t i o n i s necessary, c a l l s the procedure SYNTHESIZE w i t h the production number as a parameter. C o n t r o l i s returned to the par s e r when the appropriate semantic r o u t i n e i s f i n i s h e d . 4.2 B i t Data Types MINCOM allows the length of the data type BIT to vary i n leng t h from one to the word le n g t h of the minicomputer. S i n g l e BIT v a r i a b l e s are assigned a f u l l word of memory. B i t arrays are assigned only as much storage as i s r e q u i r e d . For example, the statement DECLARE A(15) B I T ( l ) ; would cause a s i n g l e 16 b i t word to be assigned to the array A. 16 5. MINCOM - AN XPL TO INTERMEDIATE LANGUAGE. COMPILER In June 1958, System Development Corporation s t a r t e d to i n -vestigate procedure-oriented programming language concept for real-time applications [7], They developed JOVIAL - a programming language for real-time command systems. They used a Generator (written i n JOVIAL) to t r a n s l a t e a JOVIAL program i n t o a Universal Computer Oriented Language (UNCOL). A Translator (also written i n JOVIAL) was used to transform UNCOL int o assembler. Only the t r a n s l a t o r needed to be changed i n order to implement JOVIAL on another computer. A s i m i l a r approach as that used i n the implementation of JOVIAL was used i n th i s t h e s i s . MINCOM i s a compiler that translates an XPL program into an intermediate language. The intermediate l a n -guage was designed to be we l l suited f o r the t r a n s l a t i o n of XPL to minicomputer assembly languages. MINCOM uses a set of constants which describes the target minicomputer assembly languages. Constraints such as word-length, op-code types, number of general r e g i s t e r s , number of index r e g i s t e r s and types of addressing may be changed by changing constants i n the MINCOM source. The intermediate language to assembly language t r a n s l a t o r i s a very simple macro generator. There have been other attempts to define a Univ e r s a l Computer Oriented Language (UNCOL) on an assembly language l e v e l [8]. Such-a d e f i n i t i o n would enable a l l programs to be expressed i n UNCOL and would eliminate the reprogramming costs associated with changing com-puters. This approach has not been succ e s s f u l i n the past because a UNCOL could never incorporate a l l the s p e c i a l i n s t r u c t i o n s of a l l computers. 17 UNCOL programs could not make e f f i c i e n t use of any p a r t i c u l a r computer. The TWS described i n th i s thesis i s meant to create programs to be run on spare control computers. The memory s i z e may be as small as eight thousand words but the program execution e f f i c i e n c y need not be optimum. The concept of UNCOL has been implemented but the language used has been termed an intermediate language rather than a Universal Computer Oriented Language because intermediate language programs can be used only i f the execution speed of that program i s of minor con-cern and the source language i s XPL. There i s no u n i v e r s a l use im-p l i e d . The term minicomputer needs some explanation. MINCOM has been w r i t t e n to minimize the s i z e of the intermediate language program. Only f o r minicomputers i s the cost of memory s i z e more important than the execution time of the program. Tne intermediate language has been defined to make use of a very l i m i t e d i n s t r u c t i o n set. Only f o r com-puters with a short word length (and a small i n s t r u c t i o n set) w i l l rea-sonably e f f i c i e n t programs r e s u l t . The intermediate language i s res-t r i c t e d to si n g l e address i n s t r u c t i o n s therefore no use can be made of any double address i n s t r u c t i o n s a v a i l a b l e on most large computers (and also on some minicomputers). 5 .1 Intermediate Language The intermediate language was influenced by the XPL language and by common minicomputer i n s t r u c t i o n sets. Most minicomputers use only f i x e d data types ( f u l l word integers) therefore MINCOM should ex-press a l l character and b i t operations i n terms of integer operations. Only those operations used by XPL were considered f o r the intermediate 1 8 O P E R A T I O N S E M A N T I C S 1 . A d d O P 1 <~- O P 1 + O P 2 2 . S u b t r a c t OPT O P 1 - O P 2 3 . A n d O P 1 O P 1 & O P 2 4 . O r O P 1 « ~ O P 1 I O P 2 C O M P A R I S O N S E M A N T I C S ' 5 . 0P1 = 0 P 2 0 P 1 =-1 i f t r u e e I s e O P I = 0 6 . 0 P 1 < 0 P 2 0 P 1 =-1 i f t r u e e I s e 0 P 1 = 0 7 . 0 P 1 > 0 P 2 O P I = - i i f t r u e e I s e 0 P 1 = 0 8 . 0 P 1 - » = 0 P 2 0 P 1 = - 1 i f t r u e e I s e O P I = 0 9 . 0 P 1 ~<< 0 P 2 0 P 1 = - 1 i f t r u e e I s e 0 P 1 ^ 0 1 0 . 0 P 1 > 0 P 2 0P1 =-1 i f t r u e e I s e 0 P 1 = 0 M O N A D I C O P E R A T I O N S S E M A N T I C S 1 1 , 1 2 , 1 3 . 1 4 . 1 5 . , N o t 0 P 1 , N e g a t e O P 1 D A T A M O V E M E N T L o a d r e g i s t e r S t o r e r e g i s t e r M o v e r e g i s t e r C O N T R O L I N S T R U C T I O N S 0 P 1 <s- - O P 1 - 1 0 P 1 « ~ - 0 P 1 . S E M A N T I C S 0 P 1 M e m o r y M e m o r y O P I 0 P 1 < - O P 2 i o. 1 7 . 1 8 . 1 9 . 2 0 . 2 1 . 2 2 . 2 3 . B r a n c h t o 0 P 2 B r a n c h t o 0 P 2 B r a n c h t o 0 P 2 B r a n c h t o 0 P 2 B r a n c h t o 0 P 2 B r a n c h t o 0 P 2 f 0 P 1 f 0P1 f 0 P 1 f 0 P 1 f 0 P 1 = 0 < 0 > 0 •< 0 ' > 0 f 0P1 ^ = 0 B r a n c h u n c o n d i t i o n a l l y t o 0 P 2 B r a n c h t o s u b r o u t i n e 0 P 2 2 4 . E n t e r M a c r o 2 5 . E x i t M a c r o D A T A D E F I N I T I O N 2 6 . C h a r a c t e r d a t a . 2 7 . F i x e d d a t a 2 8 . L a b e l d e f i n a t i o n 2 9 . A d d r e s s c o n s t a n t 3 0 . A s s i g n b l o c k o f s t o r a g e P S E U D O - O P E R A T I O N S 3 1 . S e t l o c a t i o n c o u n t e r F i g u r e 5 . 1 I n t e r m e d i a t e L a n g u a g e D e f i n i t i o n 19 language. The operations i n c l u d e d i n the intermediate language as des-c r i b e d i n [9] are given i n Figure 5.1. The XPL input/output f u n c t i o n s INPUT(N), OUTPUT(N), and FILE(N,M) were implemented w i t h branches to the assembly language sub-r o u t i n e MONITOR. The only non-character operations used i n XPL but not in c l u d e d i n the intermediate language were m u l t i p l y , d i v i d e , mod (remainder), s h i f t r i g h t and s h i f t l e f t . The Data General Nova com-puter that was used to t e s t the TWS d i d not have the hardware m u l t i p l y or d i v i d e o p t i o n . I t was f e l t t h a t even i f a minicomputer had a hard-ware m u l t i p l y o r d i v i d e , a c a l l to the monitor f o r m u l t i p l i c a t i o n and d i v i s i o n would add very l i t t l e e x t r a overhead s i n c e s p e c i a l r e g i s t e r s or memory l o c a t i o n s would have to be loaded i n any case. The s h i f t r i g h t and s h i f t l e f t operations could have been i n c l u d e d i n the intermediate language. They were l e f t out of the i n t e r -mediate language (and were implemented as monitor c a l l s ) f o r two reasons. For the Nova, the macros were f i v e i n s t r u c t i o n s long whereas the moni-t o r c a l l i n c l u d e d only three i n s t r u c t i o n s . I n c l u d i n g SHR and SHL as monitor c a l l s meant that a l l language defined f u n c t i o n s could be t r e a t e d as monitor c a l l s w i t h i n MINCOM. The cha r a c t e r f u n c t i o n s SUBSTR(S,N,M), || (catenate) and BYTE(S,N) would have r e q u i r e d very lengthy macros had they been i n c l u d e d i n the intermediate language. The only other f u n c t i o n s r e q u i r e d by XPL and i n c l u d e d i n the monitor were s t r i n g comparison, s t r i n g assignment, and i n t e g e r to character s t r i n g conversion. The intermediate language as described above has not def i n e d the operands. The operands are defined w i t h i n MINCOM. These 20 d e f i n i t i o n s may be changed by changing constants w i t h i n MINCOM. The types of operands that may be s p e c i f i e d are given i n Figure 5.2. 0P1 0P2 0. R e g i s t e r R e g i s t e r 1. R e g i s t e r Memory 2. R e g i s t e r R e g i s t e r o r R e g i s t e r Memory 3. R e g i s t e r u n d e f i n e d 4. not used 5. u n d e f i n e d R e g i s t e r 6. u n d e f i n e d M,emory 7. u n d e f i n e d u n d e f i n e d F i g u r e 5.2 I n t e r m e d i a t e Language Operand Types MINCOM assumes that r e s u l t s . a r e l e f t i n 0P1 and that 0P2 ( i f i t i s a reg i s t e r ) i s free a f t e r an operation. MINCOM does a l l the address c a l c u l a t i o n s and indicates i n d i r e c t addressing for Memory Reference i n s t r u c t i o n s i f i n d i r e c t addressing i s required. 5.2 Structure of Programs Produced by MINCOM XPL variables are known only within t h e i r own scope and within higher scopes. Level zero has been assigned to the glo b a l scope, l e v e l one to a scope nested within l e v e l zero, level'.two to a scope nested within l e v e l one and so on. Only some of the XPL phrases i n d i c a t e the s t a r t of a new l e v e l . The following i s a l i s t of phrases which, when recognized by the parser, cause the l e v e l number to be incremented. The corresponding scope ending i s also l i s t e d . 21 AMDN I T O R L a b e l 30 L a b e l 32 L a b e l 31 L a b e l 1+N 1£N<27 L a b e l 1 L a b e l 3 4 L a b e l 33 S T R I N G V A R I A B L E S AND A R R A Y S S T R I N G C O N S T A N T S AND L E V E L Z E R O S T R I N G V A R I A B L E S , F I X E D A R R A Y S , AND B I T A R R A Y S F I X E D V A R I A B L E S F I X E D C O N S T A N T S F I X E D V A R I A B L E S F I X E D C O N S T A N T S P R O G R A M V A R I A B L E A N D C O N S T A N T S P A C E P R O G R A M V A R I A B L E AND C O N S T A N T S P A C E P R O G R A M V A R I A B L E S C O N S T A N T S AND L E V E L Z E R O F I X E D V A R I A B L E S M O N I T O R S P A C E D I R E C T L Y A D D R E S S A B L E MEMORY D l R E C T L Y A D D R E S S A B L E MEMORY P R O G R A M A R E A P A G E Z E R O F i g u r e 5 . 3 S t r u c t u r e o f I n t e r m e d i a t e L a n g u a g e P r o g r a m s 22 Level incremented A f t e r Decremented A f t e r <procedure name> <ending> <if clause> <if statement> <"group head> <ending> A l l labels are defined on l e v e l zero and can be used at any l e v e l . Procedures must be defined before they can be c a l l e d . An intermediate language program i s structured as i n Figure 5.3. A small area of memory that can be addressed from anywhere wit h i n the program must be av a i l a b l e for passing parameters to the monitor. Pages of gl o b a l l y addressable memory are defined i n MINCOM. This memory i s used f o r f i x e d constants, address constants, user defined f i x e d v a r i a b l e s and machine defined temporary v a r i a b l e s . Constants and va r i a b l e s are assigned space within the program area when MINCOM has used a l l the a v a i l a b l e d i r e c t l y addressable memory. Variables defined within the program area might be d i r e c t l y or i n d i r e c t l y addressed depending on t h e i r distance from the va r i a b l e reference. As an example of the above method of memory assignment con-s i d e r the Nova. Most of page zero may be used as d i r e c t l y addressable memory. Registers 2 and 3 may be loaded with pointers to two other pages therefore almost 256 x 3 = 768 words of g l o b a l l y addressable memory i s a v a i l a b l e . The sample parser produced for t h i s thesis occupied approximately three thousand words of memory but used les s than 200 words of f i x e d v a r i a b l e s , constants and address constants. A l l s t r i n g constants and l e v e l zero s t r i n g v a r i a b l e s , f i x e d arrays and b i t arrays are assigned to the area beginning with l a b e l 31. Non l e v e l zero s t r i n g v a r i a b l e s , f i x e d arrays and b i t arrays are assign-ed to the area beginning with l a b e l 32. A l l v a r i a b l e s within areas 31 and 32 are referenced i n d i r e c t l y and require an address constant i n 23 one of the addressable areas. Area 32 i s reassigned as the l e v e l changes, t h e r e f o r e v a r i a b l e s assigned i n area 32 cannot be i n i t i a l i z e d . Only l e v e l zero v a r i a b l e s may be given i n i t i a l v a lues. 5.3 MINCOM The XPL to intermediate language t r a n s l a t o r was modelled a f t e r XCOM, the XPL to IBM 360 machine code compiler of the XPL TWS. The XPL grammar was modifi e d as p r e v i o u s l y described then was i n p u t to the grammar analyzer of the XPL system. The scanner r o u t i n e i n the sk e l e t o n parser was modifi e d to recognize s t r i n g s , XPL comments and XPL l i t e r a l s . Only the semantic procedure SYNTHESIZE was completely r e -w r i t t e n . The t a b l e s produced by the XPL analy z e r were combined w i t h the new s k e l e t o n parser and the new SYNTHESIZE r o u t i n e to form an XPL source program. This program was compiled using XCOM to produce an object code MINCOM. The s t r u c t u r e of the source program f o r MINCOM i s shown i n Figure 5.4. 5.4 Constants w i t h i n MINCOM MINCOM was w r i t t e n so that only a very simple macro generator would be r e q u i r e d to t r a n s l a t e the inte r m e d i a t e language code i n t o a p a r t i c u l a r minicomputer assembly language. To accomplish that g o a l , the minicomputer must be described w i t h i n MINCOM. That i s done by changing constants w i t h i n MINCOM. The constants used f o r the Nova minicomputer are shown i n Figure 5.5. The intermediate language i n c l u d e s t h i r t y - o n e i n s t r u c t i o n s . Those i n s t r u c t i o n s cannot be changed but the types of operands can. The operands may be r e g i s t e r s or memory as s p e c i f i e d i n Se c t i o n 5.1. The MINCOM v a r i a b l e INST-TYPE contains the 24 TABLES/A I N MINI SPEC STARTMIN SOURCEMIN SYNTHMIN ENOMIN P r o d u c e d by XPL TWS Grammar A n a l y z e r I n i t i a l i z e d v a r i a b l e s u s e d t o d e s c r i b e t a r g e t m i n i c o m p u t e r S t a r t o f S k e l e t o n Comp t i e r S e m a n t i c r o u t i n e s End o f S k e l e t o n CompI I e r F i g u r e 5.4 MINCOM S o u r c e S t r u c t u r e operand type i n f o r m a t i o n (see P i g 5.2) and may be changed by r e p l a c i n g the i n i t i a l l i s t . The array INST-LENGTH contains the length of each type of i n s t r u c t i o n i n minicomputer words. MINCOM must know what length macro w i l l be generated f o r each i n s t r u c t i o n type i n order to deter-mine when i n d i r e c t addressing i s r e q u i r e d . The numbers of the s c r a t c h r e g i s t e r s are s t o r e d i n the array SCRATCH-REG//. Up to s i x t e e n s c r a t c h r e g i s t e r s may be s p e c i f i e d . MINCOM passes parameters to the monitor v i a a v a i l a b l e r e g i s t e r s and page zero memory l o c a t i o n s . The MON-R-STACK array contains the i d e n t i f i c a t i o n numbers of the r e g i s t e r s which can be used to pass parameters to the monitor. The content of r e g i s t e r zero was saved i f necessary but any other r e g i s t e r s were assumed f r e e . R e g i s t e r 2 was used i n the Nova implementation when programming examples showed that only about two 25 /• B E G I N F I L E <»imsi;F.cs' */ /i A L L A H R A Y 3 B E G I N W I T H I N D E X Z E R O » / . . . . D E C L A R E wOPO_LF»f.Tn L I T E R A L L Y I 1 b I ) _ _ /• H I T H O B O L E N G T H O F T H E M I N I C O M P U T E R » / " " D E C L A R E P A G E - S I Z E L I T E R A L L Y l ? 5 6 l | /* P A G E S I Z E O F T H E M I N I C O M P U T E R « / "~ " ~ " ~ " " "" ; " C E C L A R E O I R _ A D O R F S S ( ; ; ) F I X p 0 I 41 T I A L ( 3 ? , 255 , 0 ) I /« n i » E C T L V A D D R E S S A B L E S P A C E . ' T H E F I R . 1 T T 0 ' N U M B F R 3 " I N O 1C A T E" '3 T AR t'' AMD"' ' S T O P A D D R E S S E S or F I P S T B L O C K , T « r F O L L O W I N G N U X H E P IfcOICATE3 * B L O C K O F D I R E C T L Y A D D R E S S A B L E M E M O R Y , T H E F I R S T A O O » E S S I S A S S I G N E D L A B E L L 3 3 , A N Y A D D I T I O N A L P A G E S A K E L A B E L L E D W I T H A L A B E L G R E A T E R T H A U L 3 3 , */ " D E C L A R E N O _ O F „ I N D E X _ R E G L I T E R A L L Y M M " /* W I « B F K OF I K O E X R E G I S T E R S ' , " THE "c'O'MP I l £ R " A S S U M E S "Y'HAf'~E"ACH""TNOEX R E G I S T E R MILL BE L O A D E D WITH A P O I N T E R T O A P A G E OF M E M O R Y , PAGES A R E L A B E L L E D L l , L 2 , _ . , _ . _ , L 2 _ 9 . *_/ D E C L A R E N O _ O F _ S C R A T C H _ R E G L I T E R A L L Y 1 2 ' I /* N U M B E R S C R A T C H R E G I S T E R S */ D E C L A R E N O _ O F _ P A R _ R E G L I T E R A L L Y M l I /* N U M B E R O F R E G I S T E R S U S E D F O R P A S S I N G P A R A M E T E R S T O T H E M O N I T O R */ /* " ( N U M B E R O F R E G I S T E R S I N ' A D D I T I O N " T O T H E " F I R S Y ' S C R A T C H " R E G I 3 TER""")""*/''"" D E C L A R E I N D E X _ R E G _ S T A C K ( 0 ) F I X E D I N I T I A L ( 3 ) | 7 * _ T H E ~ A C T U 4 L ~ I N D E X ~ R E G I ' S r t R ~ ~ W U M 3 e R S ~ « 7 D E C L A R E S C R A T C H - R E G ( 1 ) F I X E D I N I T I A L ( 0 , I ) I '/» U S E D T O S T O R E ' T H E " T R E E " S C R A T C H R E G I S T E R S " ~ * 7 D E C L A R E S C R A T C H _ R E G / ' ( 1 ) F I X E D I N I T I A L ( 0 , 1 ) | / " * " " U S ' £ 0 " T 3 ^ T ~ P " E R M " " A N " E N T " " R E C " 0 ' R D ' ^ ^ D E C L A R E M O N _ R „ S T A C X ( 0 ) F I X E D I N I T I A L ( 2 ) I " / * A L I S T O F R E G I S T E R S T H A T M A Y 8 E " U S E D "ToR"P A S S I M G " P kR A M E T E R 3 " T 0 " " T H E " M O N I T O R */ DrccTR"£n5"Rwro"us::^  /* A C T U A L A D D R E S S O F L O W E R P A G E B O U N D A R Y */ D E C L A R E N E X T _ P A G E _ B O U N D F I X E D I N I T I A L ( 3 3 0 ) t /* A C T U A L A D D R E S S O F N E X T P A G E B O U N D A R Y */ W C T » ^ I ^ ^ X T ^ T N S T . ^ O L I N T E R F T X E O " " I N I f li'L" (S5JT /* S T A R T I N G A D D R E S S */ D E C L A R E P A G I N G _ I N C R E M E N T L I T E R A L L Y M l I /* D E T E R M I N E S T H E N U M B E R O F I N S T R U C T I O N S T O 8 E A D D E D B E F O R E T H E P A G E B O U N D A R I E S A R E M O V E D */ D E C L A R E I N S T „ L E N G T H ( 3 1 ) B I T ( 3 ) I N I T I A L . . . ( 0 , 1 . 1 . 1 ' " . . 5 . 5 , 5 , 5 , 5 ^ 5 , 1 , 1 , 1 , 1 , 1 , 2 , 2 , 2 , 2 , 2 , 2 , 1 , 1 , 7 , f e , 2 , 1 , 0 , 1 , 0 , 0 ) ) D E C L A R E T N S T _ T Y P E ( 3 1 ) B I T ( 8 ) I N I T I A L ( 0 , 0 , 0 i_0_, 0 , 0 ,J , 0 , 0 , 0_j_0_, 3_,_3_, 1 , 1 , 0 , 3 , 3 , 3 . 3 , 3 , 3 , 7 , 7 , 7 , 7 ,Jj 7 , 7 , 7 , 7 , 7 ) 1 D E C L A R E L E V E L - O N E L I T E R A L L Y ' 2 0 0 ' i /* S T A R T I N G L O C A T I O N F O R N O N - L E V E L Z E R O S Y M B O L T A B L E « / D E C L A R E S T S L I T E R A L L Y > U 0 0 M /* S Y M B O L T A B L E S I Z E */ D E C L A R E F C S _ S I Z E L I T E R A L L Y I S O O ' j /* S P A C E I N T H E C O M P I L E R F O R T H E F I X E D C O N S T A N T S A N D A D D R E S S C O N S T A N T S */ D E C L A R E D A T A _ P A G E S 3 L I T E R A L L Y i 6 0 1 I /* D E T E R M I N E S S I Z E O F A R R A Y U S E O T O K E E P T R A C K O F M I N I C O M P U T E R „ S T 0 R 4 C E -spicf-,/ D E C L A R E F C L I M L I T E R A L L Y M O O ' i / « S I Z E O F T H E F I X - U P S T A C K - * / D E C L A R E N O _ O F _ R R O C E D U R E S L I T E R A L L Y 1501 I / • " ' N U M B E R S O F P R O C E D U R E S A L L O W E D " * / S ? P I N G _ L £ K G T H i P R O C E D U R E ( L E N ) j /• U S E " T O C A L C U L A T E T H E N U M B E R O F W O R D S U S E D T O S T O R E A C H A R A C T E R S T R I N G "" C - f L E N G T H L E N C H A R A C T E R S ( L E N = 1 , 2 , , . . ) */ D E C L A R E L E N P I X E D I " R E T U R N ( L E N * 1 ) / 2 • 1 I E N D S T R I N G _ L E N G T H | '"' ' /* E N D F I L E I H I N I S P E C 3 ' * / " ' ' " ' " ' F i g u r e 5.5 Constants Used to D e s c r i b e Minicomputer 26 hundred locations were required for constants and f i x e d v a r i a b l e s . For computers with more than one r e g i s t e r a v a i l a b l e f or passing parameters to the monitor, the method used i s quite e f f i c i e n t but computers with a si n g l e accumulator require too i n s t r u c t i o n s per parameter. A simple change to MINCOM can be made to pass parameters v i a address constants following the monitor c a l l . MINCOM was not w r i t t e n to pass parameters as address constants because XPL allows expressions to be used as para-meters. The value of an expression i s normally l e f t i n a r e g i s t e r . I f address constants were used to pass parameters, the r e g i s t e r would have to be stored. 5.5 Addressing Variables and labels may be d i r e c t l y addressed, i n d i r e c t l y addressed, or addressed v i a index r e g i s t e r s . MINCOM was w r i t t e n for computers capable of i n d i r e c t addressing. A change to the character s t r i n g and array addressing routine would have to be made i f only index r e g i s t e r plus displacement type addressing was a v a i l a b l e . A l l constants and variables stored i n the predefined areas are d i r e c t l y addressed. A l l f i x e d and b i t arrays and a l l character s t r i n g s are i n d i r e c t l y addressed. Labels are d i r e c t l y addressed only i f they are on the current page. MINCOM uses the v a r i a b l e s PREVIOUS-PAGE-BOUND and NEXT-PAGE-BOUND as the l i m i t s f o r the current page. These variables must be i n i t i a l i z e d to the proper memory location s f or the f i r s t program page. Relocatable programs may be obtained by chan-ging absolute locations to labels with the macro generator. The v a r i a b l e NEXT-INSTRUCTION-COUNTER must be i n i t i a l i z e d to the f i r s t program memory l o c a t i o n . Each i n s t r u c t i o n that i s added to the program 27 increases the NEXT-INSTRUCTION-COUNTER by the length of the i n s t r u c t i o n . The page boundaries are moved when the PAGE-POSITION-COUNTER becomes equal to or greater than the PAGING-INCREMENT. The page boun-daries are moved by an integer number of paging increments. A f l o a t i n g page such as that employed i n the Nova was s p e c i f i e d by s e t t i n g the PAGING-INCREMENT equal to one. The f i x e d page used i n the PDP/8 may be s p e c i f i e d by s e t t i n g the PAGING-INCREMENT to the fi x e d page s i z e ( 1 2 8 1 Q ) . 5.6 MINCOM Data Structures McKeeman et a l . [3] indic a t e d that- for large programs, the la r g e s t s i n g l e cost of t r a n s l a t i o n using t h e i r compiler i s the cost of symbol look-up. They used a stack f o r t h e i r symbol t a b l e . As new vari a b l e s were declared on a l e v e l , they were added to the top of the stack. Symbol look-up was accomplished by s t a r t i n g at the top of the stack and searching down to l e v e l zero. This technique allowed v a r i a b l e s with the same name but declared on d i f f e r e n t l e v e l s to be distin g u i s h e d . Any reference to a v a r i a b l e was assumed to be a reference to the v a r i a b l e on the clos e s t l e v e l . In order to reduce the time spent i n symbol look-up, MINCOM uses a binary tree to store the symbols and r e l a t e d information. The average length of a binary tree search i s approximately log2(n) where n i s the number of symbols i n the tree. The average length of a l i n e a r search i s n/2. Assuming that two comparisons are required f o r each step of the binary search, the average binary search requires fewer com-parisons when n > 16. There w i l l normally be more than sixteen symbols i n the symbol table since a l l machine generated l a b e l s are entered. A 28 s i n g l e binary tree was not used because symbols must be removed from the symbol table as the l e v e l i s decreased. A s e r i e s of stacked binary trees were used for the symbol table i n MINCOM. Since the higher l e v e l s contain fewer than sixteen symbols, a clear gain i n table search time cannot be shown. Experience with XPL programs has shown that many global v a r i a b l e s are used therefore the l e v e l zero tree i s normally much l a r g e r than sixteen so that an o v e r a l l gain i s r e a l i z e d . In order to be able to add l a b e l s to the l e v e l zero tree from any l e v e l , a separate data area was assigned to l e v e l zero. 5.7 Code Generation A two hundred and f o r t y byte b u f f e r i s kept i n memory for the program i n s t r u c t i o n s . A s i m i l a r b u f f e r i s kept for the s t r i n g con-stants and l e v e l zero arrays. Since each intermediate language i n s t r u c -t i o n occupies s i x bytes, only f o r t y i n s t r u c t i o n s are stored i n memory. The XPL function FILE(N,M) i s used to write into and read from a f i l e . N s p e c i f i e s the f i l e number and M the block number. The UBC XPL im-plementation uses a block s i z e of 240 bytes hence the small b u f f e r . Sin MINCOM i s a s i n g l e pass compiler fix-ups are used for every forward branching i n s t r u c t i o n . A l l forward branches are assumed to be d i r e c t l y addressable and are coded that x^ay. Each time a reference to an undefined l a b e l (both user defined l a b e l s . and machine defined labels) i s made, the i n s t r u c t i o n l o c a t i o n and l a b e l number are entered i n a fix-up stack. Each time the next i n s t r u c t i o n , counter i s incremented, i t s value i s compared with:the f i r s t entry i n the fix-up stack. If the fix-up entry cannot be d i r e c t l y addressed, an address constant i s defined and the o r i g i n a l i n s t r u c t i o n i s modified. 29 Because only f o r t y i n s t r u c t i o n s are kept i n memory, fix-ups often i n --volve f i l e reading and w r i t i n g . The execution time of MINCOM was not found to be excessive therefore a more elaborate f i l e b u f f e r i n g scheme was not w r i t t e n . When a l a b l e i s defined, a l l references to that l a b e l are removed from the fix-up stack. 5.8 Constants Fixed constants and address constants ( l a b e l number plus d i s -placement) are kept i n memory and are not w r i t t e n i n t o the object code f i l e u n t i l a f t e r compilation i s complete. The constants are stored i n a binary' tree. As long as addressable space remains f o r the constants (space i n the minicomputer that the intermediate code i s ult i m a t e l y meant for) no constants are repeated. When constants must be i n s e r t e d w i t h i n a program, they might be duplicated. Level zero variables may be i n i t i a l i z e d with an i n i t i a l l i s t . I f l e v e l zero v a r i a b l e s are not i n i t i a l i z e d , then they are set equal to zero. A l l other variables may not be i n i t i a l i z e d by the programmer and are not i n i t i a l i z e d by MINCOM.. 30 6. ANALYZERS There are two c o n f l i c t i n g requirements that a grammar type must meet f o r minicomputer a p p l i c a t i o n s . The f i r s t i s t h a t the r e s t r i c -t i o n s on the BNF grammar must not be too severe. I t should be easy to create and debug BNF grammars acceptable to the grammar analyzer. The second requirement concerns the s i z e of the par s e r . The purpose of t h i s t h e s i s was to develop a TWS f o r minicomputers w i t h a minimum memory s i z e of e i g h t thousand words. H a l f of the a v a i l a b l e memory must be used f o r the semantic r o u t i n e s therefore the pa r s e r must occupy l e s s than four thousand words. Since an MSP(2,1;1,1) analyzer was a v a i l a b l e on the IBM 360 at UBC, the f i r s t a nalyzer produced f o r t h i s t h e s i s was a modif i e d v e r s i o n of the XPL TWS analy z e r . 6.1 A L o c a l Context Simple Precedence Grammar The XPL TWS analyzer was modif i e d to accept only simple p r e -cedence grammars (those which r e q u i r e only the top symbol on the parse stack and the next symbol from the input t e x t to make the reduce, s t a c k , or e r r o r d e c i s i o n ) . A simple precedence parser i s sm a l l e r than an MSP parser and uses s m a l l e r tables but simple precedence grammars are d i f -f i c u l t to w r i t e . In order to extend the c l a s s of grammars acceptable to the a n a l y z e r , a f u r t h e r m o d i f i c a t i o n was made. The pa r s e r was w r i t -ten so that i t would check input- symbols f o r s p e c i a l keywords. When a keyword was found, a simple precedence grammar a s s o c i a t e d w i t h that keyword was used to analyze the f o l l o w i n g statements. The grammar d e f i n i t i o n i n p u t to the analyzer was segmented w i t h $E0G cards. Each grammar was analyzed separately.. The ta b l e s were combined a f t e r an end of f i l e . 31 The l o c a l grammar approach extended the c l a s s of acceptable grammars and provided s m a l l e r t a b l e s than a g l o b a l simple precedence grammar d i d . Many s m a l l , l o c a l l y simple precedence grammars might not be g l o b a l l y MSP(2,1;1,1). The simple precedence grammars use a double s u b s c r i p t e d array f o r s t o r i n g the p o s s i b l e combinations of symbol p a i r s . A t r a n s l a t o r ' s grammar c o n s i s t s of many u n r e l a t e d statement types. These statements contain a l a r g e number of reserved words. Because of the independence of statement types, the. n by m (n t e r m i n a l symbols, m t e r m i n a l plus non-terminal symbols) array w i l l be mainly zeros i n d i c a t i n g i l l e g a l combinations. Independent grammars use many sm a l l two dimen-s i o n a l arrays of t o t a l memory s i z e much s m a l l e r than an n by m a r r a y . The grammar i n Appendix 1 was w r i t t e n to correspond to the B.C. Telephone system b u i l d e r language. The grammar was not w r i t t e n to correspond to any p a r t i c u l a r c l a s s of grammars". A f t e r i t was segmented by keywords, i t was found to be not l o c a l l y simple precedent. I t was al s o not MSP(2,1;1,1). Rather than for c e the user of the minicomputer TWS to w r i t e complicated grammars, an LR(1) grammar analyzer and parser \tfere produced. The LR(1) grammars use the complete past h i s t o r y of the parse and do not r e q u i r e any s p e c i a l techniques to make use of p r e f i x verbs. 6.2 LR(1) Grammars Before the LR(1) p a r s i n g a l g o r i t h m produced f o r t h i s t h e s i s can be described, a few d e f i n i t i o n s must be given. The pa r s e r i s a f i n i t e s t a t e machine (FSM). At each step of the parse, the pa r s e r i s i n one of n f i n i t e s t a t e s . Information a s s o c i a t e d w i t h that s t a t e i s used to determine the next s t a t e . A c o n f i g u r a t i o n w i t h a normal 32 successor symbol i s c a l l e d a t r a n s i t i o n . The successor may be the n u l l symbol i n d i c a t i n g the end of a pr o d u c t i o n . A c o n f i g u r a t i o n w i t h a n u l l successor i s c a l l e d a r e d u c t i o n . States w i t h two or more reductions and s t a t e s w i t h both t r a n s i t i o n s and reductions are c a l l e d inadequate s t a t e s . They are c a l l e d inadequate s t a t e s because one or more symbols from the inp u t t e x t must be i n v e s t i g a t e d before a d e c i s i o n to stack or reduce can be made. The LR(1) grammars considered i n t h i s t h e s i s r e -qu i r e only one symbol from the input t e x t .to make a l l the s t a c k , reduce or e r r o r d e c i s i o n s . DeRemer [ 6 ] used look-ahead se t s w i t h each c o n f i g u r a t i o n i n an inadequate s t a t e . The parser compared the next symbol from the i n p u t t e x t w i t h each symbol i n the look-ahead s e t a s s o c i a t e d w i t h a c o n f i g u r a -t i o n . I f a match was found, then the c o n f i g u r a t i o n was used to d e t e r -mine the next s t a t e . The look-ahead s et a s s o c i a t e d w i t h a t r a n s i t i o n i s the successor symbol. The set a s s o c i a t e d w i t h a reduction, c o n s i s t s of a l l t e r m i n a l symbols that may f o l l o w the r e d u c t i o n . A Very Simple LR(1) (VSLR(l)) p a r s e r and grammar an a l y z e r were produced f o r t h i s t h e s i s . A grammar i s VSLR(l) i f i t i s LR(1) and each s t a t e has zero or one r e d u c t i o n . For LR(1) grammars, the d e c i s i o n between s t a c k i n g or reducing can always be made without us i n g the l o o k -ahead sets a s s o c i a t e d w i t h the r e d u c t i o n s . I f the next i n p u t symbol does not match any of the t r a n s i t i o n s i n an inadequate s t a t e , then a re d u c t i o n must be made (or an e r r o r has o c c u r r e d ) . I f the next symbol was i n e r r o r then that e r r o r w i l l be observed i n the next non-inadequate s t a t e . I f the next symbol does match a t r a n s i t i o n symbol, then that t r a n s i t i o n must be the c o r r e c t one because only the next symbol may be used to make the d e c i s i o n . The flowchart f o r the VSLR(l) p a r s i n g C O M P I L I N G ? I No S E T N E X T S T A T E P O I N T E R S 1 ' R E Q U I R E A N O T H E R . S Y M B O L ? ' C A L L S C A N N E R F O R ' ' " 1 A N O T H E R S Y M B O L , GONE T O O F A R \ Y e s - / C A L L R E C O V E R Y IN S T A T E ? / — ^ \ R O U T I N E No E D U C T I O N T O A Y e s B E M A D E ? r ^ ' C A L L S Y N T H E S I Z E No ' S U C C E S S O R S Y M B O L \YeSj vMATCHE S I N PUT ' S Y M B O L ?/ No S T A C K S Y M B O L F i g u r e 6 . 1 V S L R ( 1 ) P a r s i n g A l g o r i t h m 3 4 1 <P ROGRA M> : := <S T A T EMENT L I S T > END 2 <S T A T E M E N T L I S T > ::= < S T A T E M E N T > 3 I < S T A T EMENT L I S T > < S T A T E M E N T > 4 < S T A T E M E N T > : := <ID L I S T > . < A S S IGNMENT> ; 5 I <ID L I S T > <OTHER A S S I G N M E N T > . 6 | <ID E NT IF IER> < A S S I G N M ENT> . 7 | <1D E N T I F I E R > <OTHER A S S IGNME NT> ; 8 < A S S I G N M E N T > : := = <ID E N T I F I E R > 9 <OTHER A S S IGNM ENT> : := = < I D E N T I F I E R > 10 <ID L 1 S T > : := < I D E N T I F I E R > < I D E N T I F I E R > 11 I < I D E N T I F 1 E R > <ID L I S T > F i g u r e 6 . 2 A Non S L R ( 1 ) G r a m m a r / C T A T* IT MTS S T A T E M E N T L I S T ) < S T A T E M E N T > <l D L I ST> I D E N T I F I E R ) END h0 <ASSIGN , \A£NT> 1 0 <0THER A S S I G N M E N T © 17 <l D E N T I F I E R > 1 8 < A S S I G N M E N T ) 1 1 <0THER A S S I GNMEN' 1 2 © 1 3 1 4 Kz) <l D E N T I F I ER> 1 5 <ID L I S T ) 1 6 - © F i g u r e 6 . 3 F S M F o r G r a m m a r o f F i g . 6 . 2 35 a l g o r i t h m i s given i n Figure 6.1. Although the VSLR(l) parser accepts a wide c l a s s of grammars, some common LR(1) grammars are r e j e c t e d . For example, the f o l l o w i n g grammar i s not V S L R ( l ) . 1. <program> ::= <statement l i s t > END 2. <statement l i s t > : : = <statement> 3. | ^statement l i s t > <statement> 4. <statement> ::= <assignment>; 5. | <other assignment> . 6. <assignment> ::= < i d e n t i f i e r > = < i d e n t i f i e r > 7. <other assignment)-: := < i d e n t i f i e r > = < i d e n t i f i e r > The equal r i g h t p a r t s of production 6 and 7 force two reductions i n one of the s t a t e s of the FSM. The next i n p u t symbol must be used to d e t e r -mine which r e d u c t i o n to make. I f i t i s a semicolon then p r o d u c t i o n 6 should be used otherwise production 7 should be used. The SLR(l) p a r s e r of DeRemer [6] i s able to make the d e c i s i o n by s t o r i n g a semicolon w i t h r e d u c t i o n 6 and a p e r i o d w i t h r e d u c t i o n 7. As l o n g as the l o o k -ahead sets are d i s j o i n t , the SLR(l) parser can proceed. Another example of a non-VSLR(l) grammar i s given i n Figure 6.2. The grammar i s LR(1) but not SLR(l) . The FSM c o n f i g u r a t i o n has been given i n Figure 6.3. A look-ahead set cannot be used to decide between reductions 8 and 9 i n s t a t e 18 because the look-ahead s e t s are dependent on the previous s t a t e s . The previous s t a t e s must be used to i n d i c a t e whether a semicolon i n d i c a t e s that p r o d u c t i o n 8 or 9 should be used i n s t a t e 18. DeRemer [6] s o l v e d t h i s type of problem by s p l i t -t i n g s t a t e s . States 17 and 18 may be repeated as s t a t e s 19 and 20. State 10 then leads to s t a t e 19 and the look-ahead s e t s i n s t a t e 20 w i l l be d i s j o i n t . 36 One of the goals of t h i s thesis was to produce a small parser which could use a wide class of grammars. A small routine was added to the VSLR(l) parser i n order to extend i t to an LR ( 1 ) parser. A l l the information required to c a l c u l a t e look-ahead sets i s contained i n the state array. I t was therefore possible to make the parser c a l c u l a t e look-ahead sets when required. The approach required no state s p l i t t i n g and no storage of look-ahead sets. The algorithm used to c a l c u l a t e the look-ahead sets occupied about 2 2 0 words on the Nova. Since that rou-t i n e was not much l a r g e r than the comparison algorithm required to search through stored look-ahead sets, the approach i s consistent with the goal of small memory s i z e at, p o s s i b l y , the expense of execution time. The execution time of the parser look-ahead algorithm (PLALR(l)) whould be longer than that of a stored look-ahead set parser but since both methods involve a search (ELALR(l) searches through the state matrix f o r a symbol, LALR(l) searches through a look-ahead set f o r a symbol) the difference should not be too s i g n i f i c a n t . Another way of looking at the PLALR(l) parsing algorithm i s to consider the parser to be i n one of two modes. The f i r s t mode i s a VSLR(l) mode. The parser continues i n the VSLR mode u n t i l more than one reduction i s encountered i n a s t a t e . A search mode i s then entered. In e f f e c t , the f i r s t reduction i s attempted. I f no e r r o r r e s u l t s a f t e r the next input symbol has been t r i a l stacked then the o r i g i n a l d e c i s i o n was the correct one. I f an e r r o r i s encountered, the following reduc-ti o n i s attempted. The process continues u n t i l a correct reduction i s found or u n t i l the l a s t reduction i n the o r i g i n a l state must be attempted. The l a s t reduction must be the correct one and the parser i s put back int o the VSLR mode. 37 ( E N T E R ) N C R E M E NT P O I N T E R Y e s O N L Y ONE MORE P O S S I B L E R E D U C T ION N f e s IN O R I G I N A L I N A D E Q U A T E S T A T E ? No ADD P R O D U C T I O N T O I N V E S T I G A T I O N S T A C K i N V E S T I G A T ION S T A C K E M P T Y ? No U S E L E F T S Y M B O L A N D P A R S E - S T A C K T O L O C A T E N E X T S T A T E • ( R E T U R N ) S E T S T A R T A N D S T O P P O I N T E R S F O R N E X T S T A T E F i g u r e 6.4 P L A L R ( 1 ) S e a r c h A l g o r i t h m 3 8 The example of a non-SLR(l) grammar was a very simple one. In general, a f i n i t e number of reduction decisions must be i n v e s t i g a t e d before a look-ahead set can be determined. The PLALR(l) algorithm makes use of the fa c t that the parse stack can never increase as nested reduc-tions are encountered. The parser must make a t r i a l reduction, obtain the next possible state, check the allowable terminal symbols and, i f a match occurs between one of them and the next input symbol, pro-ceed with the reduction under i n v e s t i g a t i o n . I f another reduction i s encountered before a match i s found, then i t s next state may be deter-mined from the parse stack. The algorithm i s r e l a t i v e l y simple for the LR(1) case because nested t r i a l reductions do not a f f e c t one another. Only the o r i g i n a l parse stack need be used to determine next s t a t e s . A h i s t o r y of t r i a l reductions i s not kept. The flow chart for the search algorithm appears i n Figure 6.4. The algorithm may be generalized to work for LR(k) grammars, although a l l inadequate states would have to be inv e s t i g a t e d , rather than only those with more than one reduction. The. search would terminate a f t e r k symbols had been matched. The values of k for an inadequate state would have to be stored i n the state array. I f the previous parse stack i s stored before each search and comparison of an input symbol, then the algorithm may proceed as i n the LR(1) case. A mis-match at any stage must restore the previous parse stack and input sym-b o l . I t i s necessary to maintain parse stacks i n the general case because matched symbols must be stacked before the algorithm can proceed to the next symbol. 39 6.3 LR(1) Analyzer Algorithm The LR(1) analyzer program accepts a grammar w r i t t e n i n a modified BNF and produces a l l the t a b l e s r e q u i r e d by the PLALR(l) parser described i n the previous s e c t i o n . The t a b l e s are output as XPL d e c l a r a t i o n statements. An example of the tables produced by the grammar analyzer i s shown i n Figure 6 i 5 - The ta b l e s were produced f o r the grammar of Figure 6.2. W" CARD OUTPUT 1 DECLARE V CHARACTER(18) INITIAL t 1 ) I, 1=1, 'END', 1 <IDENTIEIER> 1 ) ) . . . CARD OUTPUT --- 1 DECLARE MAX_RL L I TE.'R ALL Y I 1 2 1 1 . . . CARD OUTPUT 1 DECLARE CKAR_INDEX(13) DIT(3) INITIAL (0 / 1, «, «, 5, 5, 5, 5 , 5, 5, 5, 5, 5 , . . . CARD OUTPUT 1 6)1 . . . CARD OUTPUT --- 1 DECLARE PR_LENGTH ( 1 1 ) BIT(3) INIT.TALCO, 3 , 1- 2, 3, 3, 3, 3, 2, 2, 2, 2)| "i-V "CARD "OllfPlTT "--- 1 DECLARE LEFT..PART ( 1 1 ) fi I T (<!) INITIAL (0, 6, 10, io", "a", a", "8, 8 , 9 , "Ti, T, TJ1 . . . Cf.R0 OUTPUT 1 DECLARE STATE ('15) 5IT(5) INITIAL C 10, 1 , 3, 5, 2, a, s, 10, 0,1, 0,2, 0, 3, CARD OUTPUT 1 17, 6, 8, 7, 0 , « , <?, 0,5,- 17, 11, 13, 15, \b, 12, 0 .• .• l_'l. 0 , 7 , IS. \h, CARD OUTPUT"'-'~- I 0.10, 0 . 1 1 , 1fi. 0,8, 0,9, 0)1 . . . CARD OUTPUT 1 DECLARE STATE„P(19) BIT C fa J INITIAL ( 0, H , 8, 10 , 12, 14, 17, 18, 20, 21, 23, CARD OUTPUT --- 1 28, 2", 31, 32, 3 a , 38, '10, 11, 'I5)t . . . C A R D OUTPUT 1 DECLARE STATK_SYM(18) 0IT(«) INITIAL (0, 10, t, 8, 8, 7, 9 , 2, 11, 1, 5, 9 , . . . CARD OUTPUT 1 1, 11, 2, 5, 7, 3, 5)| PUNCHING COMPLETE, EXECUTION TERMINATED' Figure 6.5 Example of Tables Produced by Grammar Analyzer The t a b l e s , parser and semantic r o u t i n e s may be compiled by XCOM to produce IBM 360 obj e c t code or by MINCOM to produce an i n t e r -mediate language program. The analy z e r has been w r i t t e n so that a f l a g may be s e t to i n d i c a t e which compiler w i l l be used to t r a n s l a t e the t a b l e s . The only d i f f e r e n c e i n the ta b l e s i s the vocabulary d e c l a r a t i o n . XCOM accepts an n-dimensional array of symbols of u n s p e c i f i e d l e n g t h . MINCOM does a l s o but assigns a d e f a u l t l e n g t h to each c h a r a c t e r v a r i a b l e . Since the vocabulary w i l l never be a l t e r e d , i t should be packed as e f f i c i e n t l y as p o s s i b l e . The s i n g l e character symbols should only occupy one byte r a t h e r than the d e f a u l t l e n g t h of e i g h t y b y t e s . The vocabulary meant f o r MINCOM i s output as a s i n g l e c h a r a c t e r v a r i a b l e 40 ( E N T E R ) C A L C U L A T E T H E I N I T I A L B A S I S S E T S F O R T H E G O A L P R O D U C T I O N F I N I S H E D C O M P U T I N G C L O S U R E S E T S F O R A L L S T A T E S ? No I N I T I A L I Z E P O I N T E R S T O N E X T S T A T E Yes \ ' F I N I S H E D L O O K I N G A T A L L C O N F I G U R A T I O N S I N No C O N F I G U R A T I O N T O B E A D D E D T O S T A T E ? Yes A D D NEW C O N F I G U R A T I O N A D D NEW S T A T E S I F R E Q U I R E D F i g u r e 6.6 L R ( 1 ) A n a l y z e r A l g o r i t h m 41 of length equal, to the t o t a l number of characters i n the vocabulary. The array CHAR-INDEX i s used to s t o r e p o i n t e r s to the i n d i v i d u a l symbols. 6.4 Other Data i n Output D e c l a r a t i o n s The l i t e r a l MAX-RL gives the length of the longest word i n the vocabulary. That i n f o r m a t i o n i s used by the scanner to q u i c k l y decide whether or not a symbol can be a reserved word. The arrays PR-LENGTH and LEFT-PART provide a l l the i n f o r m a t i o n r e q u i r e d about the productions i n the grammar. PR-LENGTH gives the number of symbols i n the r i g h t p a r t of the productions. LEFT-PART gives the symbol numbers as s o c i a t e d w i t h the l e f t - p a r t non-terminal symbols. The main analyzer r o u t i n e i s used to c a l c u l a t e the remaining three arrays. The array STATE contains a l l successor i n f o r m a t i o n . STATE-P i s used to s t o r e p o i n t e r s i n t o the s t a t e array. STATE-SYM contains the symbol number used as an entry to a s t a t e . 6.5 State A l g o r i t h m Each s t a t e c o n s i s t s of a set of c o n f i g u r a t i o n s composed of a b a s i s s et unioned w i t h a closure s e t . Each t r a n s i t i o n c o n f i g u r a t i o n i n a s t a t e forms a b a s i s c o n f i g u r a t i o n i n a f o l l o w i n g s t a t e . The suc-cessor symbol of a b a s i s c o n f i g u r a t i o n i s the symbol t h a t f o l l o w s the previous t r a n s i t i o n successor symbol i n the p r o d u c t i o n . The c l o s u r e s e t of a s t a t e c o n s i s t s of a l l productions w i t h a l e f t p a r t equal to a successor of the s t a t e . The successor symbol of a c l o s u r e c o n f i g u r a t i o n i s the f i r s t symbol i n the r i g h t p a r t of the. p r o d u c t i o n . The flowchart f o r the r o u t i n e used to c a l c u l a t e the s t a t e a r r a y i s given i n Figure 6.6. I t i s s i m i l a r to the a l g o r i t h m o u t l i n e d 42 by DeRemer [6]. The routine begins i n state zero with the f i r s t pro-duction c a l l e d the goal production. State zero i s i n i t i a l i z e d to the singl e configuration consisting of the goal production. The f i r s t symbol of the r i g h t part of the goal production i s expanded by creating new states with a l l the symbols i n the . r i g h t part of the goal production used as successors. For example, consider the grammar of Figure 6.2. State zero i s i n i t i a l i z e d to the goal production: <program> ::= <statement l i s t > END with the successor '<statement l i s t > ' leading to state one. State one now consists of the goal production with 'END' a successor leading to state two. Since there are no more symbols i n the goal production, state two c a l l s f o r a reduction (see Figure 6.3). The algorithm con-tinues by expanding the non-terminal symbols i n a l l the st a t e s . One or more closure configurations must be added to a state f o r each non-terminal successor i n a s t a t e . I f a closure configuration has a successor already e x i s t i n g i n the state then the new configuration (with the successor pointer moved by one po s i t i o n ) must be added to the basis set of the in d i c a t e d successor s t a t e . A l l productions are expanded completely once. Productions may be expanded more than once only i f two or more configurations i n a state have the same successor symbol. This approach produces a minimal f i n i t e state machine. A l l non-terminal successors i n a state produce a d d i t i o n a l closure' configura-t i o n s . The routine adds closure configurations to a l l states beginning with state zero and f i n i s h i n g with the l a s t s t a t e . Any state that has already been analyzed for closures but whose basis set i s augmented, i s re-analyzed. A f t e r a l l basis and closure configurations have been ca l c u l a t e d , 43 ( E N T E R ) COMPLETED ALL STATES? NO N £ _ / REDUCTION IN STATE? 1 Y e s — { R E T U R N ) LOAD PROPOSED REDUCTIONS ON A STACK, LONGEST PRODUCTION ON TOP FIND THE NEXT OCCURRENCE OF THE LONGEST PRODUCTION'S LEFT PART IN FSM. I N I T I A L I Z E CHECK MATRIX TO SUCCESSOR SYMBOLS CALCULATE ALL FOLLOWING TERMINAL SYMBOLS PRINT ERROR M E S S A G E S IF NECESSARY CHECKED ALL \ Y e s PRODUCTIONS I N STACK? No SEARCH INTO FSM FOR NEXT PRODUCTION TO BE CHECKED. START AT PREVIOUS POINT IN FSM Yes NON-TERM I NAL \No__ FOUND?. / 6.7 E r r o r D e t e c t i o n F o r LR(1 ) A n a l y z e r 44 — -< 9 • a • 0 tt 0 • • 18 0 B 1 7 0 - S f — -< 10 0 A 0 0 STACKS IZE SP MP PARSE STACK FIXV VAR The c o n t e n t s of the p a r s e s t a c k s a s s o c i a t e d w i t h the example of F i g u r e 7.1 I i n e has been e n t e r e d . i n d i c a t e v a l u e s a f t e r the f i r s t F i g u r e 6.8 S k e l e t o n P a r s e r Data S t r u c t u r e 45 the f i n i t e s t a t e machine i s checked f o r ambiguities (non-LR(l) gram-mars) . Next a l l redundent i n f o r m a t i o n i s removed from the s t a t e a r r a y , the reductions are moved to the end of each s t a t e and the t a b l e s are produced. 6.6 Detection of Non-LR(l) Grammars A grammar i s not LR(1) i f more than one symbol from the i n p u t t e x t i s r e q u i r e d to make a s t a c k i n g o r reducing d e c i s i o n . The a l g o r i t h m operates as i n Figure 6.7. I t c a l c u l a t e s a l l the look-ahead sets assoc-i a t e d w i t h each inadequate s t a t e . Only the look-ahead se t s that may occur w i t h the same l e f t context are compared w i t h i n an inadequate s t a t e . A grammar i s LR(1) i f those sets are d i s j o i n t . 6.7 Skeleton Parser V a r i a b l e s A l l the parsers produced f o r t h i s t h e s i s i n t e r f a c e w i t h the semantic r o u t i n e s SYNTHESIZE i n the same manner as the XPL s k e l e t o n compiler of McKeeman et a l . [3]. Three p a r a l l e l stacks are used by the p a r s i n g algorithm. These stacks are shown of Figure 6.8. The PARSE-STACK contains i n f o r m a t i o n i n d i c a t i n g the past h i s t o r y o f the parse. The character array VAR contains the a c t u a l symbols input to the pa r s e r and the array FIXV contains the i n t e g e r value of any number input to the p a r s e r . The semantic r o u t i n e SYNTHESIZE i s c a l l e d when a r e d u c t i o n has been recognized but has not yet been done. The parameter passed to SYNTHESIZE i n d i c a t e s the pro d u c t i o n number of the proposed r e d u c t i o n . The g l o b a l v a r i a b l e s MP, MPP1, and SP p o i n t i n t o the parse s t a c k s . MP p o i n t s to the l e f t symbol of the recognized p r o d u c t i o n s , SP p o i n t s to the r i g h t symbol and MPP1 i s equal to MP+1. 46 7. ERROR RECOVERY FOR THE PLALR(l) PARSER The parser was meant f o r use w i t h minicomputers o f l i m i t e d memory s i z e . Elaborate e r r o r c o r r e c t i o n techniques such as des c r i b e d by Levy [10] could not be used because of the memory s i z e l i m i t a t i o n . McKeeman et a l . [3] used a simple method of e r r o r recovery. T h e i r p a r s e r , when i t discovered an e r r o r , c a l l e d a recovery r o u t i n e . The recovery r o u t i n e scanned the inp u t t e x t f o r s p e c i a l s i n g l e c h a r a c t e r s . Once a s p e c i a l character was found, the i n p u t stack was reduced u n t i l a l e g a l symbol p a i r was found. An input symbol was dis c a r d e d on a se-cond c a l l to the r o u t i n e . The method worked w e l l as long as no s i g n i -f i c a n t reserved words (such as PROCEDURE i n XPL) were ignored. The f i r s t e r r o r recovery r o u t i n e used w i t h the PLAP.L(l) parser was very s i m i l a r to McKeeman's. I t attempted to make use of the p o s s i b i l i t y that the t r a n s l a t o r was bein g used i n an i n t e r a c t i v e mode. I f an e r r o r occurred, the parser was backed up to the l a s t symbol used i n a r e d u c t i o n , the remainder of the curr e n t input l i n e was d e l e t e d , •and a message was p r i n t e d g i v i n g the l a s t symbol used. E r r o r c o r r e c t i o n could then take place w i t h no e r r o r i n the output data. I f f u r t h e r e r r o r s occurred, the parse stack was reduced and inp u t symbols were d i s -carded u n t i l a c o r r e c t reduction was found. For batch processing of programs, the method o u t l i n e d d i d not work w e l l . An e r r o r i n one l i n e of an inp u t program produced e r r o r s throughout a whole s e c t i o n of program. One way to avoid l o s i n g impor-tant reserved words i s to make an attempt at completing the i n c o r r e c t statement r a t h e r than d e l e t i n g i t and searching f o r a r e s t a r t i n g p l a c e . A very simple r o u t i n e was w r i t t e n to do t h a t . A=B. • 4 7 A = B ; A * B * C = A B C = 8 = B 6 A 0 2 e » 9 B 7 A * 3 « D . 10 B 1 1 1 A 9 = D 5 = a 3 * e D; 10 B 1 11 A 8 = D 4 = • 3 c > A = B C=F. 9 = B * * * E R R O R * S Y N T A X E R R O R A T S Y M B O L C 7 A C 3 > C 8 = F 6 C . 3 C . A B C . 10 B C 11 A . * * * E R R O R * S Y N T A X E R R O R A T S Y M B O L . A = B . 9 . A 5 . c 3 * • A= B * 9 = B 7 A ; 3 . ; ' A * B= • 10 A B * * * E R R O R * S Y N T A X E R R O R A T S Y M B O L . C * D * E 9 = a 5 = o 3 * • = F. 10 D E 1 1 C = 9 = F 5 = e 3 • o E N D 1 E N D / / E X E C U T I O N T E R M I N A T E D F i g u r e 7.1 S a m p l e p r o g r a m s h o w i n g E r r o r R e c o v e r y 48 The recovery r o u t i n e , when f i r s t c a l l e d , i n s e r t e d the s h o r t e s t p o s s i b l e c o r r e c t symbol as the next i n p u t symbol. The symbol was i n s e r -ted only as f a r as the parser was concerned. No a c t u a l change was made to the input t e x t . A second successive c a l l to the recovery r o u t i n e caused the o r i g i n a l input symbol to be dis c a r d e d . • I f a symbol was mis-p e l l e d or mis s i n g , and only one symbol was allowed i n the context, then e r r o r c o r r e c t i o n r e s u l t e d . In other cases, e r r o r recovery o n l y o c c u r r e d . The recovery r o u t i n e , as described above, was implemented w i t h the grammar of Figure 6.2. The recovery r o u t i n e d i d not f u n c t i o n w e l l because most p o s s i b l e syntax e r r o r s could not be i n t e r p r e t e d as s i n g l e e r r o r s . For example, an i n c o r r e c t statement such as "A = B D." r e s u l t e d i n the r e c o g n i t i o n of "A = B" as a statement. Attempts at r e s o l v i n g "D." fo l l o w e d by a c o r r e c t statement f a i l e d . The recovery r o u t i n e was modifi e d to enter a search mode a f t e r i n s e r t i n g a c o r r e c t symbol. During search mode, a c a l l to the recovery r o u t i n e caused a l l symbols stacked s i n c e the l a s t r e d u c t i o n to be d i s -carded. Search mode was l e f t a f t e r N (N was set to 5) symbols were read from the input t e x t . An example of the PLALR(l) p a r s e r w i t h e r r o r recovery imple-mented on the IBM 360 f o r the grammar of Figure 6.2 i s shown i n Figure 7.1. A l l i n p u t begins near the l e f t margin. The semantic r o u t i n e p r i n -ted the production number with the l e f t and r i g h t symbols of the produc-t i o n . The parser was used i n an i n t e r a c t i v e mode t h e r e f o r e the semantic r o u t i n e output f o l l o w s each input l i n e . The f i r s t four statements show the four c o r r e c t types of statements. N o t i c e that a l l s p e c i a l charac-ters not defined i n the grammar may be used as d e l i m i t e r s . The f i r s t e r r o r was the omission of a p e r i o d or a semicolon. The par s e r i n s e r t e d 4 9 a semicolon and recovered from the syntax e r r o r (recovered from, not corrected, the erro r because a period may have been intended). The second error could have been a typing mistake where a space was entered instead of an equal s i g n . The parser recognized an <id l i s t > before discovering the error. At that point i t inse r t e d an equal sign and discovered an i l l e g a l period. The period was discarded, the <identifier> A was used as the ri g h t part of <other assignments , =B was discarded, and the period was used to complete the statement. Error recovery took place but one statement was translated i n c o r r e c t l y . The t h i r d error was the omission of an (identifier>. The parser i n s e r t e d an <identifier> and corrected the e r r o r . Even though the parser was able to correct the missing<identifier>in the l a s t e r ror, the semantic routine would not f i n d a period i n i t s <ide.ntifie.r> symbol table and a t r a n s l a t i o n e r r o r would s t i l l r e s u l t . An error recovery routine which discards i n c o r r e c t statements rather than completes them would have recovered from the second error without a f f e c t i n g the following statement. No d i f f i c u l t i e s a r i s e when a program consists only of a statement l i s t . But when a hierarchy of statement types i s used, one discarded statement may prevent many following statements from being translated c o r r e c t l y . The parser recognizes only syntax e r r o r s . L o g i c a l errors i n a program can never be recognized but other types of err o r s must be detected. A t y p i c a l t r a n s l a t o r must check a l l information input as a program for such things as s i z e l i m i t s on numbers, possible length l i m i t s on < i d e n t i f i e r ^ s and declaration of <identifier>s before use elsewhere. These errors must be detected with the semantic routines. There i s no simple way of recognizing them within the parser. 50 8. RESULTS The constants i n MINCOM were set to correspond to the req u i r e -ments of a Data General Nova minicomputer. A macro generator was w r i t -ten to tr a n s l a t e the intermediate language output by MINCOM to Nova assembly language. An assembly language monitor was wri t t e n f o r the Nova. Only the teletype input and output, high speed paper tape reader and punch were used by the monitor therefore the FILE(N,M) input and output i n the XPL language could not be used i n programs meant to run with the Nova monitor. The VSLR(l) and PLALR(l) parsers were t r a n s l a t e d i n t o Nova assembly language. 'The size of the PLALR(l) parser i s given i n Figure 8.1. The PLALR(l) parser i s the VSLR(l) parser with a search procedure added. The search procedure occupied about 220 words of memory. 16 b i t words Program s i z e 1650 Page zero space used 190 Str i n g constants, s t r i n g variables and parse stacks 600 Monitor 670 Figure 8.1 PLALR(l) Nova Memory Requirement The XPL grammar i n Appendix I I , the B.C. Telephone system b u i l d e r grammar i n Appendix I, and a graphics language grammar [11] were input to the XPL MSP analyzer and to the minicomputer LR(1) analyzer The table s i z e f o r each grammar i s given i n Figure 8.2. A l l data was assumed e f f i c i e n t l y packed. The MSP table s i z e f o r the B.C. Telephone system b u i l d e r grammar i s approximate since the grammar was not MSP(2,1; 1,1). 51 X P L M S P ( 2 , 1 ; 1 , 1 ) 2 7 5 4 b y t e s 1 6 1 5 b y t e s P L A L R t 1 ) B . C . T e l s y s t e m b u r I d e r 1 9 4 8 b y t e s 1 2 2 4 b y t e s G r a p h i c s l a n g u a g e 2 0 3 7 b y t e s 1 3 5 3 b y t e s F i g u r e 8 . 2 T a b l e s i z e s f o r v a r i o u s g r a m m a r s The XPL grammar and the B.C. Telephone system b u i l d e r grammars were found to be VSLR(l) . The graphics language was SLR(l) . The tables for the B.C. Telephone system b u i l d e r were combined with the PLALR(l) parser and t r a n s l a t e d i n t o Nova assembly language by MINCOM and the macro generator. An example of the r e s u l t a n t program running on the Nova i s given i n Figure 8.3. Trie parser was used i n an i n t e r a c t i v e mode. The parser p r i n t e d a semicolon when input was required. The semantic routine p r i n t e d the production number with the l e f t and r i g h t symbols i n the production. The example shows how data statements are analyzed by the parser. The semantic routine that was used with the parser i s given below: S Y N T H E S I Z E : P R O C E D U R E ( P R / / ) ; D E C L A R E V A R N E S T E D ; D E C L A R E P R / / F I X E D ; 0UTPUT=' * | | P R / / | | 1 ' | | V A R ( M P ) || 1 1 j | V A R ( S P ) ; E N D S Y N T H E S I Z E ; The example only shows which productions are recognized by the parser. I t does not i n d i c a t e the meaning of the input statements or what a B . C . Telephone system b u i l d e r a c t u a l l y does. A complete system b u i l d e r would contain a much larger semantic routine (written i n X P L ) . The X P L C A S E statement would be used to execute a set of statements associated with a production number. The system b u i l d e r semantic routine would output messages to d i r e c t the user a f t e r appropriate productions had 52 been recognized. It would analyze the parse stacks and tr a n s l a t e the input i n t o a system data base. The example shows that a parser f or a complex t r a n s l a t o r may be produced by the programs of the minicomputer TWS and run on a minicomputer with small memory s i z e . . D I C T ' T U S K ' * ' PA V I L ' * ' AL. ARM 1 • 31 T U S K TUSK 32 T U S K P A V I L 32 P A V I L A L A R M 1 ' .ALARM 2 ' * ' 7 1 F T X A ' * ' N O R M A L ' 32 A L A R M 1 A L A R M 2 32 A L A R M 2 71 F T X A 32 7 1 F T X A NORMAL . END 1 A N O R M A L END A . END 2 END END • M A S T E R K A M L O O P S * R E G I S T E R - B I T 5 -16 K A M L O O P S • V E R S I O N V 0 8 A 17 « V 0 3 A • S T A T I O N G R E E N * 0* L A S T - C O N T R O L - EQ UI P P E D - I S I S 2 0 S T A T I O N IS • . . S T A T I O N J I M * 1 * L A S T - CON T R O L - E Q U I P P E D - I S .39 . 2 0 S T A T I O N 19 . .END 15 END 5 • EN D 3 END END • S T A T I O N T U S K 2 3 T U S K TUSK . .GROUP 0 -2 4 • 0* A , ' 7 I E T X A * < ' . A L A R M 1 '> ' NORMAL . 71 F 29 A A 28 A 31 7 1 F T X A 7 1 F T X A 31 A L A R M 1 A L A R M 1 31 NORMAL 7 1 F NORMAL 7 1 F » E N D 3 3 < NORMAL 7 1 F 2 7 A • 2 5 . . 21 T U S K END 6 . END 3 END END END . 1 END F i g u r e 8 . 3 a S a m p l e R u n U s i n g G r a m m a r o f A p p e n d i 54 • L I N K 1 . I S - T Y P E F V 1 A 58 F V 1 4 F V 1 4 - T X A I S T U S K * 5*1 3* E V A L U A T E - A F T E R - D E L A Y . 2 63 61 F V 1 4 - T X A 59 I F - C X R I S WATT*. 5*. 1 7* E V A L U A T E - A F T E R - D E L A Y - 3 63 61 I F - C X R 60 I F - T X A I S PAVIL;>..5* i A> E V A L U A T E - A F T E R - D E L A Y 5 63 61 I F - T X A 60 . .END 57 F V 1 4 END 8 o END 2 END END o T C O N T R O L - S E Q U E N C E S U T 8 . 7 7 S U T B S U T B . V E R I F I C A T I O N ' S T A R T - U P . O F A L T E R N A T E T X M T R ' * ' T U S K 31 S T A R T - U P O F A L T E R N A T E T X M T R S T A R T - U P O F A L T E R N A T E T X M T R 32 S T A R T - U P O F A L T E R N A T E T X M T R T U S K RB21.. 78 . T U S K 79 R B 2 1 RB21 N821 .. 8 0 R B 2 ! NB21 • END 7 6. S U T B END 10 . END 3 END E N D END 1 END F i g u r e 8.3b S a m p l e R u n U s i n g G r a m m a r o f A p p e n d i x I 55 9. CONCLUSIONS The purpose of t h i s thesis was to develop a t r a n s l a t o r w r i t i n g system (TWS) f o r minicomputers. The TWS was to consist of programs which were to run on a large computer such as an IBM 360 and produce t r a n s l a t o r s f o r any s p e c i f i e d minicomputer. In order to make the TWS target computer independent, the high l e v e l language XPL was used for the semantic portion of a t r a n s l a t o r . An XPL to Intermediate language compiler was wri t t e n to a i d i n the trans-l a t i o n of XPL programs i n t o minicomputer assembly language programs. Three approaches were t r i e d i n an attempt to make a wide class of grammars acceptable to the TWS. The f i r s t grammar analyzer produced was a modified version of the MSP(2,1;1,1) analyzer of the XPL TWS. I t produced tables that were smaller than those produced by the XPL TWS analyzer but the grammars accepted by the analyzer were d i f f i c u l t to write. The parsing algorithm used by DeRemer [6] for SLR(l) grammars was modified to accept a smaller class of grammars. The VSLR(l) analyzer accepted the B.C. Telephone system b u i l d e r grammar. A small a d d i t i o n to the parser (requiring no change i n the tables produced by the analyzer program) changed i t i n t o an LR(1) parser. Grammars for p r a c t i c a l trans-l a t o r s w i l l correspond i n complexity to the B.C. Telephone system b u i l d e r grammar used i n th i s thesis as an example. Since that grammar was VSLR(l), the LR(1) grammar analyzer and parser produced f o r t h i s thesis should be adequate for most p r a c t i c a l a p p l i c a t i o n s . Further work i s required to modify MINCOM and extend the parsing algorithm and analyzer to LR(k) grammars. The version of MINCOM produced f o r this thesis does not allow memory to memory 56 i n s t r u c t i o n s i n the intermediate language. Since some minicomputers (such as-the DEC PDP/11) use memory to memory i n s t r u c t i o n s , a saving i n memory space and execution time could be obtained f o r those computers by extending MINCOM. A PLALR(k) parser was o u t l i n e d i n the t h e s i s . The idea was not developed further because of time l i m i t a t i o n s and because LR(1) gram-mars seemed to be adequate f o r real-time t r a n s l a t o r s . Very complex gram-mars could require LR(k) parsers. For these grammars, the look-ahead sets are very large, therefore the parser look-ahead scheme could very w e l l be the only f e a s i b l e approach for minicomputer a p p l i c a t i o n s . An i n t e r e s t i n g comparison could be made between the execution times of a PLALR(k) parser and a parser using stored look-ahead sets. 5 7 APPENDIX I E x a m p l e T r a n s l a t o r Grammar GRAMMAR A N A L Y S I S J M VE * S I OM A N A L Y Z E R V E R S I O N OF D S C 72 TODAY I S 0 3 - 2 1 - 7 3 P R O D U C T I O N S $P 1 <P> : := < S T ATE ME "-IT L I S T > END 2 <ST A T E M E N T _L I S T > . _ ]_: = _<S TA T l : ME NT > '_ _ _• _ 3 ' " ~ | " " <S f AT EH"E NT L I S f > < S T A T E!•'• EN t> < S T A T E M E N T > : := . P I C T <'.)ICT END> 5 I . '•' A S T E R < MASTER ENO> 6 I . S T A T I O N < S T A T I O N EMD> 7 I < I R STA T r v, L; M T L I S T > 8 I . L I N K < L I N K ENO> 9 I . T C O N T R D L <TCC NT P.0 L END> 10 | . T C Q N T R Q L - S E C U E N C E < T C 0 N T R O L _ S E Q U E N C E END> 11 < IF S T A T E M E N T L I S T > : : = < I F S T * R T > <1F L I S T ENO> 12 ~ <TF ST 4R t~> : •• = ~ L I N~< -TYPE ' < I D E ^ f I F 1ER > 13 I C C * ' M A N ) - T Y i > E <1 DENT I F I E R > _ _ <7)j-cT—gSJ7>> fTZ 7PHRASE™i~St> . £Ni5 • 15 <MAST£R tN")> _^:_=__ <M A S T E R STAR T>_ < VFR. S I 0N> < M A_S T H R_ L I S T > . END _ 1 6 < M A S T E R S T A R T > : : = < IOENT I F I E R > R E G I S T E R - B I T <NUMBER> 7 7 <VTRS7ON> : T i — . v5f<sio¥^i-MNfT?TTpr> I R _ <«ASTER L j_ST> : : = _ . <'•! S T A T O _ _ 1 9 " j " ~ < MAS T E R L I ST >"". ~ <K S T A T E > " " " " ~ ~ " 20 <M ST ATE > : := S T A T I O N < I OF NT I F I ER> <MUMBER> L A S T - C O N T R O L . - C O i l IPPE D - I S <NUMBER> ?.l < S T A T I O N EN.)> : : = ' < S T A T I O N > <CR :H IP> < S T A T I C \ P O I N T L I ST> . END 22 I < S T A T I O N > < L I S T > . END ? 3 <ST AT I ON > ' : : = < I Or.NT IF I ER> ? H < S T A T I O N P O I N T J . I S I > : _: = < P O I N T > 2 h " " I <ST AT I OM" P O I N T L I S T > < P 0 I N T > 27 < P Q I N T > : : = < P O I N T S T A R T > < P H K A S C I. I ST> <PDI.NT E NI) > < P U I N T s r \ p r > :.- = <M.JM'}CR> <AI.ARM> 2r-> < A I ARM> ""•':"='' A " 3 0 I ! 58 <r>HRASE L I 5T> : : = <ST^I!C-> : 2 I < P H R A S C L I ST> <STR ING> 3 3 < P D I N T END> : : = < < P H R A S E L 1 S T > > < P H R A S E L I S T > 3 4 < L I S T > : : = N E V E R - O I S A 3 L E < H I J ' 4 0 E R > <N l )MBEP> 3 5 1 < I . I S T > N E V E R - O I S A B L E <NUMBER> <NUMI3ER> 3 6 < I F L I S T £NQ> : : = < I F L I S T > . END 3 7 < I F L I S T > : : = < I F S TAT EME NT> 3 8 1 < I F L I S T > < I F S T A T E KE NT> 3 9 < I F S T M F y i 5 N T > : : = . I F < E X P R E SS I DN> < V E R B > < S P E C I A L 10 L I S T > 4 0 I . I F - N i U < E X°R E SS 1 0 N> <VERB> < S P E C I A L 10 L I S T > A l I SET < I D E N T I F I E R > 4 2 1 SET < I O E N T I F I E R > C H E C K - A F T E R - D E L A Y <NUM8ER> 4 3 < E X P R E S S rCN> : : = <EXPRESSIL1.NI> OR <TERM> 4 4 1 <TERM> 4 5 <TERM> : : = <»R I M 4 ft Y > AND <TERM> 4 6 . 1 < P R I M A R Y > ' 4 7 < P R I M A R Y > : : = <IL) = N T I F I E R > 4 8 1 ( < E X P R E S S I O N > ) 4 9 <VER ' 3 > : : = T H E N - P R I NT 5 0 1 T H E N - S E R V I C E - F A I L - S O - P R I N T 51 1 THE N - - 1 I N 0 H - A L A P . M - S 0 - P R I N T 52 1 T H E N - R E F U S E - A M D - P R I N T 5 3 < S P E C I A L ID L I S T > : : = < S P E C I A L I0> 54 1 < S P E C I A L ID L I S T > < S P E C I A L ID> 5 5 < S P E C I A L io> : : = < S T R I N G > ' 5 6 I % < S T R I N G > 5 7 < L I N K END> : := < L I M K S T A R T > <l. INK L I S T > . FND 5 8 <L I NK S T A R T > : : = <NiJM3ER> I S - T Y P E < I D E N T I F I E R > . 5 9 < L I N K L I S T > < L I N K > 6 0 • I < L I N X L I S T > < L I K K > 61 < L I N K > : : = <T!3 ENT I F IER > I S < I DENT I F IER > <ENDI NG> " ~ " " " " " 6 2 < E N 0 I N G > : : = <NlJ'-l R*ER> <MUMBER> 6 3 1 <NU:HER"> O J M M K E * > E V A L U AT E - A F T EP - D E L AY <NIJM3EP.> 6 4 1 <N . r - , ! H'R> <N!J-'-!(}ER> A B N O R M A L 6 5 1 <N'J'-i3r:R> <N l .J i«FR> NORMAL 6 6 < TF. IJMTRO L ENO> : : = <START> < V E R I F Y > < G P E R A T E > <POINT L I S T > <FND> 6 7 <ST ART> : : = < I 0 E N T J F 1 E R > I S - T Y P E <1 DEN T I E I ER> 6 « <E\'D> : : = ••.••.o 6 9 < O P E R A T E > : : = . Q U E R \ T E <MU'',BER> AT < ! D E N T IT I CR> 7 0 1 . R E L E A S E < N U M B ! T R > AT < I D E N T I F I E k > n 1 . O P E R A T E N U L L 72 1 . R E L E A S E N U L L 7 3 < P O I N T L I S T > : : = < P O I N T IMFD> 7 4 I < P O I N T L I S T > < P O I N T I N F O 7 5 < P O I N T l , N r J > : : = <I DEN T I F I E R > IS < I D E N T I F I E R > <NUMBER> <NUMBER> 7 6 < T C O N T ? O L S E Q U E N C E END> : : = < S E O S T A R T > < V E R I F Y > < I D L I S T > . END 7 7 < S E Q _ S T A R T > : : = < I DENT I F I E P . > 7 8 < V E R I F Y > : : = . V E R I F I C A T I O N <PHRA SE L I S T > 7 9 < ID L I S T > : : = <I D E N T IF I EP-> 8 0 I < ID L I S T > < I D E N T I F I E R > 59 APPENDIX II XPL Grammar GRAMMAR ANALYSIS J 'A VERSICN ANALYZER VERSION OF DEC 72 TODAY IS 01-26-73 P R O D U C T I O N S 1 < PR PGR AN > CSTATEMENT LIST> EOF 2 <ST AT E ME NT LIST> ::= < S T A. T E E N T > 3 I <STATE'•'ENT I. I ST> <STATEMENT> 4 ' <ST AT EME NT> ::= <3ASIC STATE H E N T > J5 I < I F STATE E NT > 6 < B A S I C S T A T E M E N T : : = < A S S I G N M E N T > ; 7 J „ < G P ' n i J P > :  « ~ ~ " ~ " | < P R O C E ' D U R E D E F I N I T I 0 N " >Y~ 9 I < R ? T U R N S T A T E M E N T > ; 1_0 I < C ' , L L S T A T E ' • ' E N T > ; . " l l I <Gn T O STATE'1ENT> "; 12 i < D E C L A R A T I O N S T A T E M E N T ; I — L _ 14 - I < L A R E L D E F I N I T I 0 M > < B A S I C S T A T E M E N O l_5 < I F S T A T E M E N T> : : - < I F C L A U S E > < S T A T F ^ ' E N T > 16 " " I < I F C L A U S E > <f~Ri.l E " A P T > <S TAT l:«ENT> 17 I < L A B E L '.)'.'• ~- I ' . ' I T ! ( . " > < I F S T A T E M E N T S I B < ! F C L A U S ! > ' : : = " I F ' < T X P " R E S " S I O N > " T H E N 19 < TR ! IE ? A " T > : : = < B A S I C S T A T E M E N T > E L S E 20 < G R O U P > : : = < G R O U P F E A D > < E N D I N G > Y F < G R " ' O U " P " " " H " E ' V ) > " :'f=00" ; 2 2 I DO <STEP DEFINITIONS ; 23 I CO < -Mil E Cl AUSE> ;  "2V I 00 <C A SE SELl"CTOR> ; 2 5 j . <GROUP HEAD> < ST ME ME NT> ?:~b<STEPOE F I N"I t I O'T>yr=~"~<"vTp:TA"4T 21 < 1 T E R A T I C N C C N T • " ! ' , ,> T 0 < E X P R E S S I 0N>_ 2 8 " ' ~ " f f n < F"X"P ?. E S S K I N > B Y <EX~5p ~ S S I ().'.'> _ 2 9 <WM I L E C L A U S E > ~ i.YLU L E <rx^R E S S . ! 0N> 30 < C A S E S E L E C T O R > C A S E < C X •> E S S I 0 N > i i <PRnCE>JRE O E F I M I T I CN> : : = < P § O C : ) '"IR'E ! I E A O > <"s"f AT ' E ' - I E ' I T L"1S~T> <vND"lNG>" 3 2 j < P R O C E D U R E H E A O : : = < P R O C F 0 ' . » \ E \ ' A " F > : _ _ _ 33 * ~ I * * < •••.•:•*. :r \.\" ~ > < r v.'-- > "; ' " " " " " 3 4 | O ' K O C C D U R E N A ' - ' E > < P A.-. A !'• f: T R I. I S T > \ 3') |- < DR0 C l i IP f. , N ( V . ! = > < P A R A M E T E R I. 1 S T > < T Y ' > E > ; 60 3 6 <PROCPOUR5 -J A' 1 =5 > : := < L •* B EL OEF INIT[ON> P R O C E D U R E 3 7 <P A R A '-'ET E R L I S T > ::= < P A R A M E T E R IIE\0> < IOFNT I FIER> ) < P A R A '•! ET ER H E A D> : : = (  " 3 9 " I <f»ARAMETFR UEAO> < I O f N T I F (EP > , AO < E N O I ,-IG> _ L L f 't^'P ~ A T " ~ |" "END" < I O E N T I F T E R > A 2 I < LA :5 E L D E F I N I T I O N S < E N D I N G > A 3 <TrA~jEL—D E E I N ! T 10 N > T T S T f O E N T I F T E R T - : : A A < R E T ij R N S T A T E M E N T > ::= _RE T U R N A 5" " ~ " | RE TURN < E X P R : i S $ 10N> A 6 <C A L L ST A T E N EN T> C A L L < V A R I A 8 L E >  A 7 <GO TO S T A T E M E N O ::= <G0 TO> < I D E N T I F I ER> A 3 <G0" f0> GO~fO A 9 I GOTO ~50 <DE C L A R A T 1 0 N~ S T A T EM EN T> i T S O- C LT I F ^ < DTCTATTTT"! 0 M EI.FMFNT> 5 1 • I < O E C L A R A T I O N S T A T E H E N T > , D E C L A R A T I O N EL EM EN T> " 5 2 " "< DE "c LA p. A TTO N ~ E I E M I N7 > TT= < WP"E~O E'C'LXR A T fo N> * R " 5 3 I < I D E N T I F I E R > L I T E R A L L Y < S T R I N G > ~54 <TY!'E D E C L A R A T I O N ^ T T ^ T H ^ T TF- IE R "S^^CYFTC IU IO'N > " f Y P E > : 5 5 I <0 0 0 N O HEA0> <NU'- !BER> ) <TY:-C> 5 6 j < T Y P E D E C L A R A T I 0 M > < I N I T I A L L 1 S T > 5 7 < T Y P E > F I X E D j_3 J C H A R A C T E R 5 9 I LAR'EL " 6 0 I N E S T E D 6 1 _ _ 1 E X T E R N A L _ • ' "b? T <01 T H E A D> < NUMB ' i : R. > T ' " "' ~ '"".". 6 3 I <CHAR ACT ER HEAD> < N U M Q E R > ) _ _ _ _ _ _ _ _ _ _ _ _ _ ^ - p j r — j 6 5 <CMARACT_E:^ HEAO> J_:= CH_AR A C J E R J _ _ 6 6 <R">UMD H E AO > ::= I D E N T I F I E R S P E C I F I C A T I 0N> ( "67 < IDF"NT I E l ER f P E C l F I C A T 10N> FTS <I IV~.\'f I F I ER> 6-3 I < I D E N T ! F I E R L I S T > < I D ENT I F I E P > ) 6 9 <"I D'E N T T ^ TE"R" I T ST > "fTS ( 7 0 I <1D E N T I F 1 E R L ! S T > < [ 0 E N T [ F I E R > , "Ti < 1 N T T I \L L I s T > rr^ <TfiIT I A!. H = AO> < CON'S T AN T> ) 72__ < I N I T I AL_HE_\D> : : =_ I N I T I A L ( _ _ ?» " ' " " 1 ~"<iNTTTAT'"n^ """ ------ - — 7 A < A S S I G N M E N T > :: = <VARIAP,I.E> < R E 0 L A C E > < E xup. E S S I C N> 61 75 76 7 7 <RFPL.\CE> : : = <LEFT PART> : :--I <L EFT P\RT> < A S S I G N ,M F M T > <VAR[ABLE> , 1-i 7 9 a o 81 <EX PRESS I'N> <LOGICAL FACTOP> <EXPRESS !0N> | <l.n6ICAI. F A C T 0 P. > <LOGICAL FACTOR> = O.OCICAL SECOMf)ARY> I <LOGICAL FAC'miO I <t.OGICAL SECOMDARY> 8 2 8 3 <L0GIC4l Si":CCNDARY> ::= <L0GICAL PR I MAR. Y> I -. <LOGICAL PR I M AR Y > 8't . <LOGICAL PR IM ARY > <ST RI MG EXPRESSION> 85 . . | <ST P. IMG EXPRESSION <RE I. AT I O N <STK INC. E X P P F SSI O N 8 6 8 7 8 8 <R EIA T 10N> I! V A 8 9 9 0 91 I ~> ~ | -i < 1 -• > 9?. 93 1 < = 1 > = ' ' 94 95 < S T R I N G EXPRESSION ::= < AR I TH'-IET I C EXPRESSION ' 1 < STRING EXPRESS I0N> II <AR I T l-l NET I C EXPRESSIONS 9 6 9 7 9 8 < AR ITH IE TIC EXPRESSION <T E R M > i < ART Tli"'ET I C EXPRESSION; + < T E R '•! > 1 • <ARITHMETIC EXPRE SSICN> - <TERM> 99 100 | + < r •; '•' > I - <TER'-'> 101 10 2 10 3 <TERM> : : = I ___ . I . < PP. I M AR Y> < T £ R N * < P •'< 1 M A R Y > <TERN / <PRIMARY> 104 I <TERM> MOQ <PR 1MARY> 10 5 <PRIMARY> : := <COMSTANT> 10 6 107 I <VARIABLE> 1 ( <EXPR£SSION> ) * 'ion*" 109 <CONSTANT> ::= <STRI\G> 1 <NIJMBER> 110 111 <VARI A a L F.> <I 0 E NTI F IC K > 1 < S'JB SCR I PT HE AO < EX PRC SS I O N ) 11.2 113 <SOBSCRIPT H E A D > - < i V \ T ; ^ ' : . ! > i I < Sua SCRIPT HE A0> <EXPPESSK>>!> , CPU T [ vrj ijs-0 W \S 0. 9_1 SECONDS. TOTAL T I'' £ I S 0 .9 T 5~F 0 0 Nf) S. " 62 REFERENCES 1. MacDonald Dettwiler and Associates, "B.C. Telephone System B u i l d e r " , User's Manual, October 4, 1969. 2. Herbert E. Pike, J r . , "Process Control Software", Proc. IEEE, V o l . 58, pp. 87-97, January 1970. 3. W.M. McKeeman, -J.J. Horning, D.B. Wortman, A 'Compiler Generator, P r e n t i c e - H a l l Inc., 19 70. 4. Peter Naur, "Revised Report on the Algorithmic Language A l g o l 60", Communications of the ACM, Vol. 6, pp. 1-17, January 1963. 5. D.E. Knuth, "On the Trans l a t i o n of Languages from L e f t to Right", Information and Control, V o l . 8, pp. 607-639, 1965. 6. F r a n k l i n L. DeRemer, "Simple LR(k) Grammars", Communications of the ACM, Vol. 14, pp. 453-460, July 1971. 7. C.J. Shaw, "JOVIAL - A Programming Language for Real-time Command Systems", Annual Review i n Automatic Programming, 1963. 8. C. Gordon B e l l , A l l e n Newell, Computer Structures: Readings and  Examples, McGraw-Hill, 1971, p. 9. 9. J. Madderom, "User's Manual - A Translator Writing System f o r M i n i -computers", UBC, 1973. 10. Jean-Pierre Levy, "Automatic Correction of Syntax Errors i n Program-ming Languages", Ph.D. Thesis, C o r n e l l U n i v e r s i t y , December 1971. 11. A.J. Pieke, "Design and Implementation of an Int e r a c t i v e Graphics Language for Computer Aided Design", Masters Thesis, U.B.C, June 1973. 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0101550/manifest

Comment

Related Items