Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Construction of LR(k) parsers with application to Algol 68 Ramer, David Robert 1973

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.

Item Metadata

Download

Media
831-UBC_1973_A6_7 R34_4.pdf [ 6.91MB ]
Metadata
JSON: 831-1.0052015.json
JSON-LD: 831-1.0052015-ld.json
RDF/XML (Pretty): 831-1.0052015-rdf.xml
RDF/JSON: 831-1.0052015-rdf.json
Turtle: 831-1.0052015-turtle.txt
N-Triples: 831-1.0052015-rdf-ntriples.txt
Original Record: 831-1.0052015-source.json
Full Text
831-1.0052015-fulltext.txt
Citation
831-1.0052015.ris

Full Text

r CONSTRUCTION OF LR (k) PARSERS WITH APPLICATION TO ALGOL 68 by DAVID ROBERT RAMER B.Sc., U n i v e r s i t y of B r i t i s h Columbia, 1971 A t h e s i s submitted i n p a r t i a l f u l f i l l m e n t of tohe requirements f o r the degree of Master of Science i n the Department of Computer Science we accept t h i s t h e s i s as conforming to the r e q u i r e d standard THE UNIVERSITY OF BRITISH COLUMBIA August, 1973 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of Computer Science The University of British Columbia Vancouver 8, Canada Date A 1 1 g,i St 9 , 1973 i i ABSTRACT The purpose of t h i s t h e s i s i s to r e p o r t on the study, implementation arad use of the LR (k) p a r s i n g technique with e r r o r r e c o v e r y as a p p l i e d to the computer programming language ALGOL 68. The LR (k) p a r s i n g technique i s a powerful method of automatic c o n s t r u c t i o n of p a r s e r s f o r context f r e e grammars. The methodology provided by De Remer i s implemented f o r a l l c l a s s e s of LR (k) grammars. The p r a c t i c a l implementation of the t r a n s l a t o r l i m i t s k to 15. The n e c e s s i t y of having e r r o r recovery f o r any p a r s i n g technique cannot be o v e r s t a t e d . An e r r o r r e c o v e r y technique i s provided and demonstrated f o r ALGOL 68. T h i s technique uses the p a r s i n g t r a n s l a t o r to a i d i n the d e c i s i o n process to determine the t r a n s f o r m a t i o n from a s t r i n g of symbols not i n the language, to a s t r i n g of symbols i n the language. The LR (k) p a r s i n g technique and the e r r o r r e c o v e r y technique are a p p l i e d to ALGOL 68 and prove t o be p r a c t i c a l techniques i n the c o n s t r u c t i o n of compilers f o r computer programming languages. i i i TABLE OF CONTENTS INTRODUCTION ..... 1 CHAPTER 1 - THEORY AND IMPLEMENTATION .4 1.1 N o t a t i o n and D e f i n i t i o n s .4 1.2 LR (k) Grammars 7 1.3 C h a r a c t e r i s t i c F i n i t e S t a t e Machines ..............9 1.4 LR (0) Grammars and DPDA's .........................13 1.5 SLR (k) Grammars and Lookahead .....................21 1.6 LALR (k) Grammars 42 1.7 LR (k) Grammars and S t a t e - s p l i t t i n g ................46 1.8 Empty P r o d u c t i o n s ................................. 52 CHAPTER 2 - PARSER GENERATOR IMPLEMENTATION ............... 55 2.1 The Par s e r ........................................55 2.2 Further O p t i m i z a t i o n s 58 2.3 Input of CFG *...60 2.4 Output .<> .........6 3 CHAPTER 3 - ERROR RECOVERY 64 3.1 Some Ideas 64 3.2 E r r o r Recovery Algorithm .......................... 3.3 R e s u l t s Obtained .....75 CHAPTER 4 - ALGOL 68 IMPLEMENTATION .......................83 4.1 The Grammar .......................................83 4.2 The Compiler ............86 CONCLUSION ,....,...,90 BIBLIOGRAPHY . . ... .,,...,...,.,......,91 APPENDIX ., .......92 i v LIST OF FIGURES F i g u r e s 1.1 Example of a CFSM ........................12 1.2 CFSM f o r LR{0) grammar 18 1.3 DPDA f o r LR(0) grammar ..................19 1.4 B a s i s Sets of CFSM f o r SLR(1) grammar ........... 26 1.5 Symbol T r a n s i t i o n s of f i g u r e 1.4 ................ 28 1.6 Nonterminal T r a n s i t i o n L i s t s of f i g u r e 1.4 ......29 1.7 Lookahead symbol s e t f o r s t a t e 14 ............... 30 1.8 Symbol t a b l e entry f o r s t a t e 14 of DPDA .........30 1.9 Lookahead (k=1) symbol s e t f o r s t a t e 24 .........31 1.10 Lookahead (k=2) symbol s e t f o r s t a t e 24 .........31 1.11 DPDA f o r SLR{2) grammar ..32 1.12 Bas i s Sets of CFSM f o r LALR (2) grammar ..........36 1.13 Symbol T r a n s i t i o n s o f f i g u r e 1.12 ............... 39 1.14 Nonterminal T r a n s i t i o n L i s t s o f f i g u r e 1.12 .....40 1.15 Lookahead symbol s e t f o r s t a t e 19 . . . . . . . . » . 4 1 1.16 Lookahead symbol s e t f o r s t a t e 21 ...............41 1.17 Lookahead symbol s e t f o r s t a t e 24 ...............41 1.18 S t a t e Source L i s t s of f i g u r e 1.12 ...............44 1.19 Lookahead symbol s e t f o r s t a t e 19 ...............45 1.20 Lookahead symbol s e t f o r s t a t e 21 ...............45 1.21 Lookahead symbol s e t f o r s t a t e 24 ...............45 1.22 CFSM of LR(1) grammar ,,.,,48 1.23 S t a t e Source L i s t s of f i g u r e 1.22 .........49 1.24 Lookahead (k=1) symbol s e t f o r s t a t e 6 ..........49 1.25 Lookahead (k=2) symbol s e t f o r s t a t e 6 ..........49 1.26 M o d i f i e d CFSM of LR{1) grammar .................. 50 1.27 S t a t e Source L i s t s of f i g u r e 1.26 ...,,,..,,....,51 1.28 Lookahead symbol s e t f o r s t a t e 6 ................52 1.29 Lookahead symbol s e t f o r s t a t e 18 ...............52 1.30 CFSM f o r SLR(1) grammar , 53 1.31 DPDA of f i g u r e 1.30 54 3.1 F1 and F2 f o r ALGOL 68 implementation ....,.,..,,71 3.2 E v a l u a t i o n r e s u l t s of 1st s y n t a c t i c e r r o r .,...,.77 3.3 E v a l u a t i o n r e s u l t s of 2nd s y n t a c t i c e r r o r .......79 3.4 E v a l u a t i o n r e s u l t s of 3rd s y n t a c t i c e r r o r .....,,79 ACKNOWLEDGEMENT I would l i k e to thank my s u p e r v i s o r . Dr. J . E. L. Peck, f o r h i s a s s i s t a n c e i n t h i s r e s e a r c h . I am a l s o g r a t e f u l to the NRC f o r t h e i r f i n a n c i a l support f o r t h i s p r o j e c t . 1 INTRODUCTION In w r i t i n g c o m p i l e r s f o r computer programming languages, the w r i t e r i s f a c e d with the task of c o n s t r u c t i n g a computer program which w i l l parse any given sentence i n the language. At the same time i t must determine the semantics of t h a t sentence i n order t h a t i t may be executed. To f a c i l i t a t e t h i s , we may c o n s t r u c t a p a r s e r generator, i . e . , a computer program which, when given a c o n t e x t f r e e grammar (CFG) d e s c r i b i n g the language, generates a program and/or t a b l e s to perform a s y n t a c t i c parse. The type of context f r e e languages d i s c u s s e d here w i l l be r e s t r i c t e d to the c l a s s o f LR(k) languages d e s c r i b e d by LR(k) co n t e x t f r e e grammars. The method of performing a s y n t a c t i c parse of a sentence i n an LR (k) language i s known as an LR(k) p a r s i n g method (t e c h n i q u e ) . The p a r s i n g method used i s c l a s s e d as a bottom-up method as opposed to a top-down method. The time to parse a sentence i n the language i s l i n e a r l y bounded, i . e . , the time to parse any sentence i s d i r e c t l y p r o p o r t i o n a l to the l e n g t h of the sentence. Knuth [ K J f i r s t f o r m a l l y i n t r o d u c e d the n o t i o n of LR(k) grammars and De Remer [D] l a t e r presented a p r a c t i c a l c o n s t r u c t i o n technique of t r a n s l a t o r s f o r LR (k) grammars. The work r e p o r t e d h e r e i n i s based on t h a t done by De Remer. De Remer prov i d e s us with the methodology f o r the c o n s t r u c t i o n of t r a n s l a t o r s f o r LR (k) grammars with proof of 2 c o r r e c t n e s s of the methodology and demonstrates i t s use by a p p l i c a t i o n . De Remer c l a s s i f i e s the h i e r a r c h i a l complexity of LR(k) grammars and p r o v i d e s t h i s h i e r a r c h y with names. From s i m p l e s t t o the most complex they are LR (0) , SLR (k) or simple LR (k) , LALR(k) or lookahead LR(k) and LR(k) . As the complexity of a grammar i s d i s c o v e r e d , so too i s the complexity of the computation and c o n s t r u c t i o n of the t r a n s l a t o r i n c r e a s e d . De Remer shows us a method by which the t r a n s l a t o r under c o n s t r u c t i o n i s adapted to the complexity of the grammar. However, the method terminates i n f a i l u r e i f the grammar i s found not to be LR(k), e.g., the grammar i s ambiguous. In our a p p l i c a t i o n of De Remer's methodology, the c l a s s of grammars i s extended to i n c l u d e those with empty pr o d u c t i o n s . De Remer pr o v i d e s a model of the complete t r a n s l a t o r as a t a b l e - d r i v e n i n t e r p r e t e r though we have not made use of a l l the f e a t u r e s of t h i s model and r e p l a c e d them with some of our own. One m o d i f i c a t i o n a l l o w s an i n c r e a s e i n speed where a d e c i s i o n i s to be made on one symbol which i s a member of a l a r g e s e t of symbols. The r e s u l t a n t t r a n s l a t o r model i s enhanced with a powerful e r r o r recovery technique provided by the author. E r r o r r e c o v e r y i s necessary when the t r a n s l a t o r encounters a s t r i n g of symbols not i n the language. I t uses the t r a n s l a t o r and weight t a b l e s provided by the w r i t e r t o transform a s t r i n g of symbols not i n the language i n t o a s t r i n g of symbols i n the language. T h e o r e t i c a l l y , there i s no l i m i t to the s i z e of k f o r an LR(k) grammar. However, the p r a c t i c a l l i m i t f o r k i n the 3 t r a n s l a t o r model i s 15 though t h i s l i m i t may be i n c r e a s e d by m o d i f i c a t i o n of the t r a n s l a t o r . In Chapter 1, the theory and implementation of the parse r generator i s d i s c u s s e d . The complexity of the parser generator i n c r e a s e s as the types o f LR (k) grammars are d i s c u s s e d from the s i m p l e s t t o the more complex. The b a s i c model of the parser i s presented with examples. In Chapter 2, the p r a c t i c a l use of the pa r s e r generator i s d i s c u s s e d and the p a r s i n g a l g o r i t h m i s presented as a t a b l e lookup i n t e r p r e t a t i o n of the t a b l e s generated by the parse r generator. Chapter 3 i s a d i s c u s s i o n of the e r r o r recovery technique with examples taken from the U n i v e r s i t y of B r i t i s h Columbia ALGOL 68 compiler implementation. Chapter 4 c o n t a i n s a d e s c r i p t i o n of the U n i v e r s i t y of B r i t i s h Columbia ALGOL 68 compiler and how the use of the LR (k) p a r s i n g technique i s r e f l e c t e d i n the design and s t r u c t u r e of the compiler. The LALR(3) grammar given i n the Appendix i s d i s c u s s e d i n r e l a t i o n to the ALGOL 68 Report [R] and the LR (k) p a r s i n g technique. 4 CHAPTER 1 THEORY AND IMPLEMENTATION In t h i s chapter we review n o t a t i o n and d e f i n i t i o n s , d e s c r i b e the n o t i o n of an LR (k) grammar, generate a CFSM ( c h a r a c t e r i s t i c f i n i t e s t a t e machine) and d i s c u s s the t r a n s f o r m a t i o n s t h a t need be a p p l i e d to the CFSM as the complexity of the grammar i s d i s c o v e r e d . 1.1 Notation and D e f i n i t i o n s I t i s assumed the reader i s f a m i l i a r with the p r o p e r t i e s of symbols, s t r i n g s of symbols, languages, f i n i t e s t a t e machines (FSM) and d e t e r m i n i s t i c pushdown automata (DPDA). I f the reader wishes to f a m i l i a r i z e h i m s e l f with these n o t i o n s he may r e f e r to Hopcroft and Ullman [H S O ] , A c o n t e x t - f r e e grammar (CFG) i s a quadruple (N,T,P,S) where N i s a f i n i t e s e t of nonterminal symbols, T i s a f i n i t e s e t of t e r m i n a l symbols, P i s a f i n i t e s e t of p r o d u c t i o n s and S i s the s t a r t symbol, a member of H. A production i s w r i t t e n A -> w where A i s i n N and w i s a s t r i n g i n V* where V = N U T. The s e t V* denotes the s e t of a l l s t r i n g s i n V, i n c l u d i n g the empty s t r i n g . Without l o s s o f g e n e r a l i t y the p r o d u c t i o n s are a r b i t r a r i l y 5 numbered from 1 to s and the f i r s t p r o d u c t i o n i s of the form S -> \ S* 4 where S» i s a s u b o r d i n a t e s t a r t i n g symbol and S, and occur nowhere e l s e i n P. In the f o l l o w i n g , l a t i n c a p i t a l s denote nonterminals, lower case L a t i n l e t t e r s (a-d) and d i g i t s denote t e r m i n a l s , lower case L a t i n l e t t e r s (u-z) and d i g i t s denote s t r i n g s and lower case L a t i n l e t t e r e denotes the empty s t r i n g . In the examples, however the author has taken the l i b e r t y to denote both t e r m i n a l s and nonterminals by L a t i n c a p i t a l s , lower case L a t i n l e t t e r s and d i g i t s , where i t i s c l e a r from c o n t e x t which are t e r m i n a l s and which are nonterminals. Given a p r o d u c t i o n A -> w, an immediate d e r i v a t i o n of a s t r i n g Z = u w v from Z* = u A v i s w r i t t e n Z * -> Z. A d e r i v a t i o n , w r i t t e n Z' ->* Z, i s the t r a n s i t i v e completion of an immediate d e r i v a t i o n where t h e r e e x i s t s t r i n g s vO, v1, ... , vn such t h a t Z« = vO -> v1 -> ... -> vn = Z f o r n >= 0. The c a n o n i c a l d e r i v a t i o n i s the r i g h t d e r i v a t i o n , w r i t t e n Z* ->*R Z, i n which f o r i = 1,2, ... n each v i has an immediate d e r i v a t i o n from v i - 1 v i a a p p l i c a t i o n of a p r o d u c t i o n to the rightmost nonterminal i n v i - 1 . The language L (G) generated by the grammar G i s d e f i n e d as { w | w i s i n T* and S ->* w }. A s e n t e n t i a l form i s a s t r i n g w of t e r m i n a l s and nonterminals such that S ->* w. . G i s unambiguous i f and only i f each s e n t e n t i a l form has a unique c a n o n i c a l d e r i v a t i o n . A c a n o n i c a l form i s a r i g h t s e n t e n t i a l form which i s any s t r i n g c a n o n i c a l l y d e r i v a b l e from S. 6 The parse of a s t r i n g of symbols i s the r e c o r d of how the s t r i n g was d e r i v e d . The c a n o n i c a l parse of a s e n t e n t i a l form i s the r e v e r s e of the sequence of productions used i n a c a n o n i c a l d e r i v a t i o n of the s e n t e n t i a l form, De Remer d e f i n e s c h a r a c t e r i s t i c s t r i n g s as f o l l o w s : d e f i n e a s p e c i a l s e t of symbols {#1,#2, ... ,#s} not i n V such t h a t #1 i s a s s o c i a t e d with p r o d u c t i o n 1, #2 with 2, e t c . L e t the p'th pr o d u c t i o n be A -> w, and l e t Z' = u A v and Z = u w v be c a n o n i c a l forms such t h a t t h e r e e x i s t s a c a n o n i c a l d e r i v a t i o n S ->*R Z' ->R Z. Then uwfp i s d e f i n e d to be a c h a r a c t e r i s t i c s t r i n g of Z. The st a c k s t r i n g of uwfp i s aw, and v i s c a l l e d an in p u t s t r i n g of Z. A c h a r a c t e r i s t i c s t r i n g o f Z "... i n d i c a t e s t h a t there e x i s t s a c a n o n i c a l d e r i v a t i o n of Z i n which i t i s immediately preceded by another form Z* which can be formed as f o l l o w s : remove from the end of the stack s t r i n g uw the s u b s t r i n g w which matches the r i g h t p a r t of pro d u c t i o n p, r e p l a c e w with the l e f t p a r t of Z, and concatenate the r e s u l t with the i n p u t s t r i n g v. »• He then r e f e r t o w -> Z as a r e d u c t i o n . The c a n o n i c a l p a r s e r s t a r t s with w i n L(G) and i t e r a t i v e l y determines a c h a r a c t e r i s t i c s t r i n g of the c u r r e n t c a n o n i c a l form, outputs the p r o d u c t i o n i n d i c a t e d by the l a s t symbol of t h a t c h a r a c t e r i s t i c s t r i n g , makes the corresponding r e d u c t i o n , and steops when the new c a n o n i c a l form i s Z=S. 7 1.2 LR (k) Grammars The parse o f a s t r i n g of symbols i s performed by a pa r s e r . I t i s convenient' at t h i s time to have a model d e s c r i p t i o n of the parser t h a t we are attempting to c o n s t r u c t . The model i s of a DPDA which c o n s i s t s of an i n t e r p r e t e r , a pushdown stack and a se t of t a b l e s from which the i n t e r p r e t e r performs t a b l e lookup. The DPDA i s c o n s t r u c t e d from a c h a r a c t e r i s t i c f i n i t e s t a t e machine (CFSM) which i s a machine which t r a n s l a t e s from the context f r e e language G to another language, e.g., the emission of a sequence of pr o d u c t i o n numbers which denote the parse of a s t r i n g of symbols i n the language. The CFSM i s c o n s t r u c t e d by s t r i n g manipulation of the CFG. De Remer shows t h a t the s e t of c h a r a c t e r i s t i c s t r i n g s of a CFG i s a r e g u l a r language and t h a t we can c o n s t r u c t a FSM f o r t h a t language. The FSM i s then used to determine c h a r a c t e r i s t i c s t r i n g s of c a n o n i c a l forms. Knuth [ K ] d e f i n e s a grammar to be LR (k) i f and only i f the c a n o n i c a l d e r i v a t i o n o f a s e n t e n t i a l form i s always uniquely determined by the s t r i n g to i t s l e f t (the c h a r a c t e r i s t i c s t r i n g ) and the k t e r m i n a l c h a r a c t e r s t o i t s r i g h t . I f each p a r s i n g d e c i s i o n r e q u i r e d k symbol look ahead, the s i z e of the parser (tables) would be immense f o r moderate LR (k) grammars f o r k g r e a t e r than 1. F o r t u n a t e l y , f o r reasonable grammars, l e s s than k symbol look ahead i s r e q u i r e d f o r most parser d e c i s i o n s , though the f u l l k symbol look ahead i s r e q u i r e d f o r o t h e r s . Thus, c o n s i d e r a b l e s a v i n g s i n storage can be gained by t a k i n g 8 advantage of t h i s f a c t . For our p a r s e r model, the k symbol lookahead w i l l be computed as a sequence of one or more lookahead s t a t e s i n the DPDA. Each lookahead s t a t e w i l l be executed i n t u r n l o o k i n g one symbol f u r t h e r down the i n p u t s t r i n g each time. Each lookahead s t a t e w i l l use a s e t of symbol t r a n s i t i o n s c a l l e d a lookahead symbol s e t . At most k lookahead s t a t e s w i l l be executed, but k lookahead s t a t e s need not always be executed, e.g., only look ahead as f a r as i t i s necessary t o determine the context. The complexity of a grammar may be c l a s s i f i e d i n t o one of the f o l l o w i n g : a) LR (0) , (b) SLR (k) or simple LR (k), <c) LALR(k) or lookahead LR (k), and LR (k). De Remer makes a f u r t h e r d i s t i n c t i o n here by d e s c r i b i n g a s u b c l a s s LmR(k) where m stands f o r the amount o f l e f t context t o be c o n s i d e r e d . However, t h i s c l a s s may be c o n s i d e r e d a member of the g e n e r a l LR (k) c l a s s . The o r d e r i n g a l s o r e p r e s e n t s the degree of d i f f i c u l t y (from e a s i e s t to most* d i f f i c u l t ) to generate a parser f o r that grammar. The most i n t e r e s t i n g and most f r e q u e n t l y used grammars are those which are e i t h e r SLR (k) or LALR (k). Grammars which are LR (k) have a higher c o s t i n space which may be too high a p r i c e to pay over an SLR (k) or LALR (k) grammar f o r the same language. The nature of each of the above grammars may be s t a t e d as f o l l o w s : a) LR (0) grammars r e q u i r e no lookahead, b) SLR(k) grammars r e q u i r e k symbol lookahead, the computation of the lookahead symbol s e t s i s q u i t e easy, c) LALR (k) grammars r e q u i r e 9 k symbol lookahead, the computation of the lookahead symbol s e t s i s c omplicated (ionly the l e f t c o n t e x t may be considered) , and d) LR(k) grammars are l i k e LALR (k) grammars with the f u r t h e r c o m p l i c a t i o n t h a t the number of s t a t e s i n the parser must be i n c r e a s e d to p r o v i d e f u r t h e r p a r t i t i o n i n g of l e f t c o ntext (De Remer r e f e r s to t h i s process as s t a t e - s p l i t t i n g ) . 1.3 C h a r a c t e r i s t i c F i n i t e S t a t e Machine (CFSM) De Remer d e f i n e s a CFSM of a CF grammar G as "a reduced, d e t e r m i n i s t i c FSM which r e c o g n i z e s the s e t of c h a r a c t e r i s t i c s t r i n g s of G". The c o n s t r u c t i o n of the CFSM i s b a s i c to the four major c a t e g o r i e s of grammars (LR(0), SLR (k) , LALR (k) and LR (k) ) . Each s t a t e of the CFSM i s d e s c r i b e d by a c o n f i g u r a t i o n s e t which i s a set of ( p o s s i b l y empty) c o n f i g u r a t i o n s . The s t a t e s of the CFSM are l i n k e d together by t r a n s i t i o n s under symbols w i t h i n t h e i r c o n f i g u r a t i o n s e t s . A c o n f i g u r a t i o n i s a prod u c t i o n i n P with a s p e c i a l marker (a dot) i n i t s r i g h t p a r t . The i n i t i a l c o n f i g u r a t i o n s e t (synonymous with the s t a r t i n g s t a t e of the CFSM) i s {S ->. |- S* -J} and has the name 1. Each c o n f i g u r a t i o n s e t i s the union of a b a s i s s e t and a c l o s u r e s e t . The b a s i s s e t i s {A -> w u.w* | t h i s s et i s the successor of a c o n f i g u r a t i o n s e t c o n t a i n i n g {A -> w.u w*} where A -> w u w• i s i n P) ( i . e . , there i s a t r a n s i t i o n under V from s t a t e 1 to s t a t e 2 and there are no other t r a n s i t i o n s under a J- i n t o s t a t e 2 nor are there any t r a n s i t i o n s from s t a t e 1 under a (- to a s t a t e 10 other than s t a t e 2 ) . The b a s i s s e t s form a p a r t i t i o n i n g over the CFSH as there w i l l be no two b a s i s s e t s the same (each b a s i s s e t i s unique and the CFSH i s , t h e r e f o r e , reduced). The c l o s u r e set i s {A ->.w | & -> w i s i n P and there e x i s t s a c o n f i g u r a t i o n with a marker before the A i n e i t h e r the b a s i s s e t or the c l o s u r e s e t } . I f the b a s i s s e t does not c o n t a i n any c o n f i g u r a t i o n s with the dot preceding a nonterminal then the c l o s u r e s e t i s empty. Each c o n f i g u r a t i o n s e t (other than the empty c o n f i g u r a t i o n set) has one or more successor c o n f i g u r a t i o n s e t s , In any c o n f i g u r a t i o n s e t , c o n s i d e r those c o n f i g u r a t i o n s where the dot precedes the same symbol. These c o n f i g u r a t i o n s w i l l have a unique t r a n s i t i o n under t h a t symbol to another c o n f i g u r a t i o n s e t i n which the b a s i s s e t c o n s i s t s of those c o n f i g u r a t i o n s with the dot moved one symbol to the r i g h t (e.g., the succe s s o r to S -> .J- S 1 | i s S -> J-.S* -J). I f the dot f o l l o w s a l l the symbols on the r i g h t hand s i d e then the su c c e s s o r i s the empty s e t under the t r a n s i t i o n o f the # symbol f o r t h a t p r o d u c t i o n . The f symbol t r a n s i t i o n i s to a s t a t e which determines what to do next once the r e d u c t i o n i s complete. In the a c t u a l c o n s t r u c t i o n of the CFSM, the s t a t e t r a n s i t i o n w i l l be l e f t out u n t i l such time as the CFSM i s converted to a DPDA. The s t a t e s of the CFSM can be c l a s s i f i e d as follows:; a) a read s t a t e i s a s t a t e with t r a n s i t i o n s only under symbols i n V, b) a reduce s t a t e i s a s t a t e with a t r a n s i t i o n under one # symbol only, and c) an inadequate s t a t e i s a s t a t e with more 11 than one t r a n s i t i o n under # symbols or a s t a t e with a t l e a s t one t r a n s i t i o n under a # symbol and one or more t r a n s i t i o n s under t e r m i n a l symbols. The followilng grammar generates the CFSM i n F i g u r e 1.1: (1) S -> A E B (2) E -> T (3) E -> E + T (4) T -> P (5) T -> T * P (6) P -> I (7) P -> OPEN E CLOSE. Note t h a t s t a t e s 1, 2, 1; 5, 6, 8, 9 and 11 are read s t a t e s , s t a t e s 3, 10, 12 and 14 are reduce s t a t e s , and s t a t e s 7 and 13 are inadequate s t a t e s . 12 I T \ STATE| V-CONFIGURATION SET _ , — : — | — _ — ;. 1 | S -> mh E B 2 I P -> -I P -> .OPEN E CLOSE | S -> A. E B | E -> . E + T | T -> .P | E "> . T I T "> . T * P 3 I P -> I. 1 I P- > . I J P -> .OPEN E CLOSE | P -> OPEN .E CLOSE | E ~> . E + T | T -> .P j E -> .T I T -> .T * P 5 I E -> E.+ T I S -> A E.B 6 I T -> P.. 7 I T -> T. * P I E -> T. 8 | E -> E. + T I P "> OPEN E.CLOSE 9 I P -> . I | P -> .OPEN E CLOSE ( T -> . P J E -> E + .T I T -> . T * P 10 | S -> A E B. 11 | P -> -I< P -> .OPEN E CLOSE T "> T *.P 1 2 | P -> OPEN E CLOSE. 1 3 | T -> T. * P I E -> E + T. 14 | T "> T * P. | SYMBOL A I OPEN E P T #6 I OPEN E P T + B #4 * #2 + CLOSE I OPEN P T #1 I OPEN P #7 * # 3 #5 STATE 3 4 5 6 7 3 4 8 6 7 9 10 11 9 12 3 4 6 1 3 3 4 14 11 F i g u r e 1 . 1 13 1.4 LR(O) Grammars and DPDA's The CFSM f o r an LR(0) grammar does not c o n t a i n any inadequate s t a t e s ; thus, a l l the i n f o r m a t i o n r e g u i r e d to parse a s t r i n g of symbols i n the language i s present i n the CFSM. The m o d i f i c a t i o n of the CFSM to a d e t e r m i n i s t i c push down automaton (DPDA) i s the same f o r a l l the c l a s s e s of LR (k) grammars. The DPDA i s an i n t e r p r e t e r , a pushdown stack and a s e t of t a b l e s from which the i n t e r p r e t e r performs t a b l e lookup. The i n t e r p r e t e r i s given as an a l g o r i t h m i n Chapter 2. I t i s the c o n s t r u c t i o n of the t a b l e s t h a t we are concerned with here. The t a b l e s must r e p r e s e n t the FSM and, t h e r e f o r e , r e p r e s e n t the s t a t e s and the symbol t r a n s i t i o n s between those s t a t e s . I t i s q u i t e n a t u r a l , t h e r e f o r e , t o pl a c e the s t a t e s i n one t a b l e (the s t a t e table) and the symbol t r a n s i t i o n s i n another t a b l e (the symbol t a b l e ) . The number of s t a t e s w i t h i n the DPDA w i l l be l a r g e r than w i t h i n the CFSM as s t a t e s are c o n s t r u c t e d to perform s p e c i a l a c t i o n s . For example, the reduce s t a t e w i t h i n the CFSM i s s p l i t i n t o two s t a t e s i n the DPDA, a pop s t a t e and a lookback s t a t e and f o r a p a r t i c u l a r p r o d u c t i o n the lookback s t a t e may or may not be generated. The pushdown stack r e p r e s e n t s a f i n i t e memory of the parse of a s t r i n g o f symbols. When we read a symbol, we place the s t a t e name of the s t a t e t h a t read the symbol on the top of the pushdown st a c k . When we perform a r e d u c t i o n (by a pop s t a t e ) , some of the s t a t e names on the top of the pushdown stack may be popped o f f (the number of names popped i s equal to one l e s s than 1tt the number of symbols on the r i g h t hand s i d e of the pr o d u c t i o n reduced). The pro d u c t i o n number i s output and the process of lookback i s then performed. Lookback i s the process by which the l e f t c o ntext i s considered i n order to determine what to do next. However, we do not need t o parse the l e f t c o n t e x t again, as the pushdown stack c o n t a i n s the h i s t o r y o f the parse t o t h i s p o i n t . T h i s r e d u c t i o n mechanism d i f f e r s from the c a n o n i c a l parse i n that we do not p l a c e the nonterminal (on the l e f t hand s i d e of the prod u c t i o n being reduced) a t the head of the i n p u t s t r i n g and read i t . Observe the c o n f i g u r a t i o n s e t f o r s t a t e 2 of f i g u r e 1 . 1 . In t h i s s t a t e t h e r e i s a t r a n s i t i o n under " I " to s t a t e 3 (e.g., " I " i s read i n s t a t e 2 and 2 i s placed on the top of the s t a c k ) . In s t a t e 3, pr o d u c t i o n number 6 i s output but no s t a t e names have been popped from the stack. The s t a t e name on the top of the stack ( s t a t e 2) i s the s t a t e i n which the f i r s t symbol ("I") on the r i g h t hand s i d e of the pr o d u c t i o n (P -> I) was read. I t i s a l s o the s t a t e i n which there i s a nonterminal t r a n s i t i o n (under; "P") f o r the nonterminal on the l e f t hand s i d e of the p r o d u c t i o n being reduced (production number 6 ) . , we f o l l o w the r e s u l t i n g nonterminal t r a n s i t i o n and continue i n t h a t s t a t e ( s t a t e 6) . Thus, from the above d i s c u s s i o n , i t i s apparent that the s t a t e s of the CFSM each have an e q u i v a l e n t s e t of one or more s t a t e s w i t h i n the DPDA. The s t a t e s w i t h i n the DPDA are c l a s s i f i e d i n t o types as are the s t a t e s w i t h i n the CFSM. The 15 CFSM c o n t a i n s read, reduce and inadequate s t a t e s which have e q u i v a l e n t s e t s of s t a t e s {read}, {pop and lookback or stop} and {lookahead, pop and/or read}, r e s p e c t i v e l y , w i t h i n the DPDA. We w i l l now c o n s i d e r these types of s t a t e s w i t h i n the DPDA and the f u n c t i o n they are t o perform. The f i r s t type i s the read s t a t e which has two forms: the f i r s t form takes the symbol a t the head of the s t r i n g and does a s e q u e n t i a l search through the symbol t a b l e f o r a match, whereas, the second form uses the same symbol as an index i n t o the symbol t a b l e f o r a match. In the f i r s t form, a match c o n s i s t s of a p a i r (the symbol and the s t a t e name t r a n s i t i o n ) and i n the second form, the match c o n s i s t s o n l y o f the s t a t e name t r a n s i t i o n . The two types of read s t a t e s allow us to economize on the amount o f space used i n the symbol t a b l e and to improve the speed of the parser. In both cases, the symbol t a b l e e n t r i e s are t e r m i n a l symbols and/or s t a t e name t r a n s i t i o n s of those t e r m i n a l symbols i n the c o n f i g u r a t i o n s e t f o r t h a t s t a t e . The nonterminal symbols and t h e i r s t a t e name t r a n s i t i o n s do not appear w i t h i n the read s t a t e t a b l e e n t r i e s . The read indexed s t a t e r e q u i r e s an o r d e r i n g be performed on the t e r m i n a l symbols (the present implementation performs an a l p h a b e t i c a l o r d e r i n g ) . The read s t a t e c o n s i s t s of 3 p i e c e s of i n f o r m a t i o n : a) read ( s e q u e n t i a l or indexed) i n s t r u c t i o n code, b) number of symbols th a t may be read by t h i s s t a t e , and c) an O f f s e t i n t o the symbol t a b l e where the t a b l e e n t r i e s s t a r t . The second type of s t a t e i s the pop s t a t e . The pop s t a t e 16 c o n s i s t s of three p i e c e s of i n f o r m a t i o n : a) pop i n s t r u c t i o n code, b) the number of s t a t e names to be popped, and c) an o f f s e t i n t o the symbol t a b l e where a p a i r denotes i ) the pr o d u c t i o n number to be output and i i ) the s t a t e t r a n s i t i o n to be f o l l o w e d a f t e r the r e d u c t i o n has been performed. The s t a t e t r a n s i t i o n may be to any one of a stop s t a t e , a read s t a t e , a pop s t a t e , a lookback s t a t e , a push s t a t e or a lookahead s t a t e . To decide what the s t a t e t r a n s i t i o n i s to be i s e q u i v a l e n t to d e c i d i n g whether or not a lookback s t a t e i s necessary f o r a l l those p r o d u c t i o n s which have the same nonterminal on the l e f t hand s i d e . To understand what i s i n v o l v e d , i t i s convenient to c o n s t r u c t a t a b l e f o r each nonterminal. Each t a b l e w i l l c o n s i s t of two columns of e n t r i e s : the f i r s t column i s f o r the s t a t e names where the t r a n s i t i o n o r i g i n a t e s and the second column i s f o r the s t a t e names where the matching t r a n s i t i o n s terminate. On o b s e r v a t i o n of these t a b l e s , we f i n d t h a t each t a b l e has one of the f o l l o w i n g c h a r a c t e r i s t i c s : i ) the t a b l e i s empty (only f o r the g o a l symbol S ) , i i ) a l l t r a n s i t i o n s terminate i n the same s t a t e (column 2 e n t r i e s are a l l the same), and i i i ) the t r a n s i t i o n s terminate i n more than one s t a t e and at l e a s t one s t a t e i n column 2 has as many as or more t r a n s i t i o n s to i t than any o t h e r s t a t e i n column 2. In the f i r s t two cases i ) and i i ) above, a lookback s t a t e need not be c r e a t e d as i n i ) the s t a t e t r a n s i t i o n i s to the stop s t a t e and i n i i ) the s t a t e t r a n s i t i o n i s to the one and onl y 17 s t a t e appearing i n the second column. In i i i ) , a lookback s t a t e must be c r e a t e d and s t a t e t r a n s i t i o n s generated to i t from a l l pop s t a t e s f o r productions with t h a t nonterminal on the l e f t hand s i d e . The lookback s t a t e c o n s i s t s of two pi e c e s of i n f o r m a t i o n : a) the lookback i n s t r u c t i o n code and an o f f s e t i n t o the symbol t a b l e where the t a b l e e n t r i e s c o n s t r u c t e d above are pl a c e d . The s i z e o f the t a b l e e n t r i e s may be reduced by the f o l l o w i n g technique: 1) f i n d the s t a t e with the most t r a n s i t i o n s i n t o i t , 2) remove from the t a b l e a l l t r a n s i t i o n s i n t o t h a t s t a t e , and 3) p l a c e the s t a t e name i n the second column a t the end of the t a b l e with a 0 pl a c e d i n column 1 adj a c e n t . The lookback s t a t e i s executed by t a k i n g the s t a t e name on the top of the pushdown stack and s e a r c h i n g the t a b l e e n t r i e s f o r a, match i n the f i r s t column or u n t i l the 0 i s reached; the adjacent e n t r y c o n t a i n s the s t a t e t r a n s i t i o n to be taken. Once the lookback a l g o r i t h m has been executed f o r each nonterminal, the nonterminal t r a n s i t i o n s may be d e l e t e d from the CFSM. The stop s t a t e i s onl y entered a f t e r p r o d u c t i o n number 1 (S -> J- S* i) i s output and simply terminates the parse. The f o l l o w i n g L'R-(O) grammar has the CFSM i n f i g u r e 1.2 and the s t a t e and symbol t a b l e s i n F i g u r e 1.3: (1) S -> START E STOP (2) E -> A AA (3) E -> B BB (4) AA -> C AA 18 (5) AA -> D (6) BB -> C BB (7) BB -> D. The t e r m i n a l s have been assigned the f o l l o w i n g o r d e r i n g : ( 1 ) A, ( 2 ) B, ( 3 ) C, (4) D, (5) START and (6) STOP. — i i 1 STATE| CONFIGURATION SET j SYMBOL 1 i STATE | i 1 | S -> .START E STOP | START T i 2 \ 2 E -> .& AA I A 3 1 E -> .B BB | B | * 1 S -> S T A B L E STOP J E 1 5 1 3 AA -> .C AA I c s 6 | AA -> . D t D j 7 J E -> A.AA | AA 8 1 4 BB -> .C BB I c J 9 1 BB -> .D I D 1 1 0 | E -> B. BB ) BB I 11 J 5 S -> START E.STOP I STOP 1 1 2 | 6 AA -> .C AA I c J 6 I AA -> .D I D 1 7 I AA -> C.AA | AA 1 1 3 | 7 AA -> D. I #5 i — I 8 E -> A AA. | # 2 i | 9 BB -> ,C BB I c J 9 I BB -> .D I D j 10 | BB -> C.BB | BB i 14 | 1 0 BB -> D. I #7 i | 11 E -> B BB. | #3 ! | 1 2 S -> START E STOP. f #1 1 I 13 AA -> C AA. ] #4 ! I 14 BB -> C BB. I #6 j — j _ _ j - _ F i g u r e 1 . 2 19 STATE TABLE SYMBOL TABLE TYPE " I T 1 | NUMBER]OFFSET | |—. : +-4— -j • 5 1 IREAD | 1 | 1 2 |READ INDEXED | 2 ! 2 3 ) READ INDEXED | 2 I 6 4 |READ INDEXED J 2 I 10 5 (READ | 1 I 14 6 |READ INDEXED | 2 I 6 7 J POP | 0 | 19 8 | POP | 1 I 20 9 |READ INDEXED | 2 I 10 10 | POP | 0 I 21 11 J POP | 1 ! 22 12 | POP | 3 I 23 13 | POP | 1 | 24 14 | POP | 1 I 25 15 |LOOKBACK | 0 I 15 16 |LOOKBACK | o I 17 17 |STOP J 0 I 0 J | SYMBOL|STATE f y— + 1 1 |START | 2 2 |0 | 3 3 I 4 1 0 4 10 | 0 5 1 o | 0 6 10 ] 0 7 |0 1 6 8 17 | 0 9 1 o J 0 10 10 1 0 11 10 | 9 12 1 10 | 0 13 |0 | 0 14 |STOP | 12 15 16 I 13 16 |0 | 8 17 |9 i 14 18 ]0 | 11 19 15 1 15 20 12 | 5 21 17 1 16 22 13 1 5 23 11 I 17 24 |4 I 15 25 16 | 16 i 1— F i g u r e 1.3 From the CFSM i n f i g u r e 1.2 we observe that s t a t e s 1, 2, 3, 4, 5, 6 and 9 are read s t a t e s . There are e q u i v a l e n t read s t a t e s i n the DPDA i n f i g u r e 1.3. . S t a t e 1 i s a s e q u e n t i a l read s t a t e which reads one symbol s t a r t i n g from o f f s e t 1 of the symbol t a b l e . The symbol t a b l e entry c o n s i s t s of "START" and a s t a t e name t r a n s i t i o n to s t a t e 2. I f a match i s not found then the i n p u t s t r i n g i s not a s t r i n g i n the language. S t a t e 2 i s a read indexed s t a t e which reads one of a p o s s i b l e 2 symbols s t a r t i n g from o f f s e t 2 of the symbol t a b l e . In the symbol t a b l e , the e n t r i e s f o r the read indexed s t a t e s have been compressed i n t o adjacent columns. For s t a t e 2, the symbol t a b l e e n t r i e s are 20 s t a t e names 3, 4, 0, 0, 0 and 0, These s t a t e names are accessed by the t e r m i n a l symbol at the head of the i n p u t s t r i n g using i t s e q u i v a l e n t numeric valu e , e.g., "A" has a numeric value o f 1 and accesses the s t a t e name 3. I f a 0 i s accessed then the s t r i n g i s not a s t r i n g i n the language. S t a t e s 7, 8, 10, 11, 12, 13 and 14 are reduce s t a t e s i n the CFSM and i n the DPDA they have e q u i v a l e n t s e t s of s t a t e s {pop and lookback ( s t a t e 15) }, {pop}, {pop and lookback ( s t a t e 16) }, {P°P}# (P°P and s t o p ( s t a t e 17)}/ {pop and lookback (state 15)} and {pop and l o o k b a c k ( s t a t e 16)}, r e s p e c t i v e l y . Thus, three s t a t e s have been added t o the DPDA, two lookback s t a t e s and a stop s t a t e . The nonterminals i n the grammar are S, E, AA and BB. Observation of the CFSM shows t h a t there are no t r a n s i t i o n s under the nonterminal S and/ t h e r e f o r e , the s t a t e t r a n s i t i o n i n the DPDA from s t a t e 12 i s to a stop s t a t e ( s t a t e 17). Now c o n s t r u c t a t a b l e as d e s c r i b e d above f o r the nonterminal "E". i " — T H T | COLUMN 1 | COLUMN 2 | 1 2 t~5 \ L . J j _ . J I t i s c l e a r from t h i s t a b l e that s t a t e s w i t h i n the DPDA which output p r o d u c t i o n numbers 2 and 3 w i l l have s t a t e t r a n s i t i o n s to s t a t e 5 ( i . e . , no lookback i s n e c e s s a r y ) . However, the t a b l e f o r the nonterminal "AA" i n d i c a t e s that s p e c i a l a t t e n t i o n must be made f o r s t a t e s which output p r o d u c t i o n numbers 4 and 5. 21 i • — T T | COLUMN 1 | COLUMN 2 | | 3 | 8 | | 6 | 13 | I : i L i J I f a 3 i s on the top of the pushdown stack, then the t r a n s i t i o n i s to s t a t e 8 and i f a 6 i s on the top of the pushdown stack, the t r a n s i t i o n i s to s t a t e 13. A lookback s t a t e i s generated with symbol t a b l e e n t r i e s s t a r t i n g a t o f f s e t 15. S i m i l a r l y , the lookback s t a t e 16 i s generated f o r s t a t e s which output production numbers 6 and 7 ( i . e . , the nonterminal BB i s on the l e f t hand side) .. 1.5 SLR(k) Grammars and Lookahead In SLR (K) , LALR (K) and LR (K) grammars (k > 0) , one or more s t a t e s w i t h i n the CFSM are inadequate. The inadequate s t a t e s w i t h i n the CFSM w i l l each be transformed i n t o a lookahead s t a t e i n the DPDA. The lookahead s t a t e w i l l have s t a t e t r a n s i t i o n s to any of a new lookahead s t a t e ; a pop s t a t e or a read s t a t e . To perform the symbol lookahead,- we must f i r s t compute the complete s e t of t e r m i n a l symbols which may occur at the head of the in p u t s t r i n g and a s s o c i a t e with each of these symbols the a c t i o n which i s to take plac e ( i . e . , read the symbol or perform a r e d u c t i o n ) . Observation of the s e t of symbols may show a c l a s h of a c t i o n s under the same t e r m i n a l symbol; i n t h i s case, a s t a t e t r a n s i t i o n to a new lookahead s t a t e i s c o n s t r u c t e d i n which t h i s new lookahead s t a t e w i l l look one symbol f u r t h e r down the s t r i n g under a new s e t of t e r m i n a l symbols. Thus, the inadequate s t a t e 22 may become a l a r g e t r e e of lookahead s t a t e s with a p o s s i b l y high branching f a c t o r . SLR (k) grammars r e q u i r e the l e a s t amount of work i n computing the lookahead s e t s . To compute a lookahead s e t , we c o n s t r u c t a t a b l e which c o n s i s t s of 3 columns: the f i r s t column i s f o r the t e r m i n a l symbols, the second i s f o r a s t a t e name and the t h i r d i s f o r the a c t i o n (read or reduce pr o d u c t i o n "number"). E n t r i e s w i l l now be made i n t o the t a b l e using the f o l l o w i n g a l g o r i t h m : Step 1 : f o r each t e r m i n a l symbol i n the inadequate s t a t e under c o n s i d e r a t i o n p l a c e thfe symbol i n column 1 , the s t a t e t r a n s i t i o n under t h a t t e r m i n a l symbol i n column 2 and read a c t i o n i n column 3. Step 2: f o r ieach of the r e d u c t i o n s i n the inadequate s t a t e do step 3. Step 3: compute a l i s t of s t a t e names of a l l those s t a t e s which have a t r a n s i t i o n i n t o them under the nonterminal on the l e f t hand s i d e of the pr o d u c t i o n being reduced (these l i s t s are more e a s i l y computed durin g the c o n s t r u c t i o n o f the CFSH). For each one of these s t a t e s perform s t e p 4. Step 4:s i f t h i s s t a t e has been v i s i t e d before under the same a c t i o n ( r e d u c t i o n of pr o d u c t i o n "number") then r e t u r n ; otherwise, mark t h i s s t a t e "as having been v i s i t e d under the r e d u c t i o n of pr o d u c t i o n "number" and f o r each t e r m i n a l symbol i n t h i s s t a t e perform step 5 and f o r each r e d u c t i o n perform step 6. 23 Step 5: p l a c e the t e r m i n a l symbol i n column 1, the s t a t e t r a n s i t i o n under t h a t t e r m i n a l i n column 2 and the a c t i o n (reduce p r o d u c t i o n "number") i n column 3. Step 6: c o n s t r u c t a l i s t as i n step 3 f o r the nonterminal on the l e f t hand s i d e of the p r o d u c t i o n being reduced. For each of these s t a t e s perform step 4. The r e s u l t i n g t a b l e must now be s o r t e d : Step 7: s o r t the t a b l e by column 1 such that a l l e n t r i e s under the same t e r m i n a l occur i n c o n s e c u t i v e order. T h i s p a r t i t i o n s the t a b l e i n t o as many s e t s as there are d i f f e r e n t symbols i n column 1 of the t a b l e . Step 8: f o r each s e t , s o r t by column 2 and then column: 3 such that the s t a t e names occur i n i n c r e a s i n g order. T h i s p a r t i t i o n s the s e t i n t o s u b s e t s . I f there are d u p l i c a t e e n t r i e s i n the subsets then only one need be r e t a i n e d . From the t a b l e c o n s t r u c t e d above, we may now determine a) what the lookahead symbol s e t i s , b) whether or not f u r t h e r lookahead computation i s r e q u i r e d , and p o s s i b l y c) whether or not the grammar i s SLR (k). Each of a ) , b) and c) above are r e s o l v e d as f o l l o w s : Step 9: f o r each s e t of e n t r i e s under a t e r m i n a l symbol do step 10. Step 10: p l a c e the symbol i n the symbol t a b l e with the s t a t e t r a n s i t i o n to one of a) a read s t a t e i f the a c t i o n i s read f o r a l l e n t r i e s i n column 3, b) a reduce s t a t e f o r p r o d u c t i o n "number" i f a l l e n t r i e s i n column 3 are 24 reduce p r o d u c t i o n "number" where "number" i s the same, and otherwise do s t e p 11. Step 11: f o r those e n t r i e s i n column 2, i f t h e r e i s more than one entry with the same s t a t e name and the a c t i o n s i n column 3 d i f f e r , then the grammar i s not SLR (k) and the a l g o r i t h m f o r SLR (k) lookahead f a i l s . However, i f t h e r e are no such c l a s h e s but t h e r e are a t l e a s t two d i f f e r e n t ^ c t i o n s i n column 3 under t h i s t e r m i n a l symbol, then f u r t h e r lookahead computation must be performed. A new lookahead s t a t e w i l l be c r e a t e d to perform lookahead one symbol f u r t h e r down the s t r i n g and a t r a n s i t i o n t o i t under that t e r m i n a l symbol i s placed i n the symbol t a b l e . To compute the lookahead symbol s e t , the above a l g o r i t h m s (step 3 to step 11) must be executed using column 2 e n t r i e s as those s t a t e s i n which computation commences and the column 3 e n t r i e s as the a s s o c i a t e d a c t i o n . I f the SLR (k) a l g o r i t h m was s u c c e s s f u l f o r a l l the inadequate s t a t e s then the CFSH may now be converted to a DPDA. Many LALR (k) and LR (k) grammars have CFSM's with inadequate s t a t e s whose lookahead symbol s e t s may be computed using the SLR (k) a l g o r i t h m , but the s t a t e which has a c l a s h must use a more r e f i n e d lookahead computation technique. The f o l l o w i n g examples demonstrate the a c t i o n of the SLR(k) symbol lookahead s e t computation. The f i r s t example i s of an SLR (2) grammar with the f o l l o w i n g p r o d u c t i o n s : (1) PROGRAM -> START CLAUSE STOP 25 2) CLAUSE -> OPEN SERIES CLOSE 3) SERIES -> DECLLIST GOON UNITSERIES 4) DECLLIST -> DECL 5) DECLLIST -> DECLLIST COMMADECL 6) DECL -> DECLARER IDENLIST 7) DECLARER -> REAL 8) DECLARER -> INT 9) DECLARER -> OPEN UNIT CLOSE DECLARER 10) DECLARER -> PROC DECLARER 11) IDENLIST -> IDEN 12) IDENLIST -> IDENLIST COMMA IDEN 13) UNITSERIES -> UNIT 14) UNITSERIES -> UNITSERIES GOON UNIT 15) UNIT -> ASSIGNATION 16) UNIT -> FORMULA 17) UNIT -> PRIMARY 18) ASSIGNATION -> IDEN BECOMES UNIT 19) FORMULA -> PRIMARY OP PRIMARY 20) FORMULA -> FORMULA OP PRIMARY 21) PRIMARY -> IDEN 22) PRIMARY -> PRIMARY CLAUSE 23) PRIMARY -> CLAUSE. T h i s grammar c o n t a i n s s y n t a c t i c c o n s t r u c t i o n s used i n some computer programming languages. The b a s i s s e t s of the CFSM are given i n f i g u r e 1.4 and the t e r m i n a l and nonterminal symbol t r a n s i t i o n s are giv e n i n f i g u r e 1.5. S t a t e s 14, 17, 18, 24, 33, 37 and 38 are inadequate. 26 | STATE| I • + BASIS SET 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 PROGRAM -> .START CLAUSE STOP PROGRAM -> START.CLAUSE STOP CLAUSE -> OPEN.SERIES CLOSE PROGRAM -> START CLAUSE.STOP DECLARER -> INT. DECLARER -> OPEN.UNIT CLOSE DECLARER DECLARER -> PROC.DECLARER DECLARER -> REAL. DECLLIST -> DECL. DECLLIST -> DECLLIST.COMMA DECL SERIES -> DECLLIST.GOON UNITSERIES DECL -> DECLARER.IDENLIST CLAUSE -> OPEN SERIES.CLOSE PROGRAM -> START CLAUSE STOP. ASSIGNATION -> IDEN.BECOMES UNIT PRIMARY -> IDEN. UNIT -> ASSIGNATION. PRIMARY -> CLAUSE. FORMULA -> FORMULA.OP PRIMARY UNIT -> FORMULA. FORMULA -> PRIMARY.OP PRIMARY PRIMARY -> PRIMARY.CLAUSE UNIT -> PRIMARY. DECLARER -> OPEN UNIT.CLOSE DECLARER DECLARER -> PROC DECLARER. DECLLIST -> DECLLIST COMMA.DECL SERIES -> DECLLIST GOON.UNITSERIES F i g u r e 1.4 continued on next page 1 T 1 K " 1  | STATE J BASIS SET 1 23 | IDENLIST -> IDEN. | 24 | IDENLIST -> IDENLIST.COMMA IDEN | | DECL -> DECLARER IDENLIST. | 25 | CLAUSE -> OPEN SERIES CLOSE. | 26 | ASSIGNATION -> IDEN BECOMES.UNIT | 27 J FORMULA -> FORMULA OP.PRIMARY | 28 | FORMULA -> PRIMARY OP.PRIMARY | 29 | PRIMARY -> PRIMARY CLAUSE. | 30 | DECLARER -> OPEN UNIT CLOSE.DECLARER | 31 | DECLLIST -> DECLLIST COMMA DECL. | 32 | UNITSERIES -> UNIT. | 33 | UNITSERIES -> UNITSERIES.GOON UNIT | | SERIES -> DECLLIST GOON UNITSERIES. | 34 | IDENLIST -> IDENLIST COMMA.IDEN I 35 | ASSIGNATION -> IDEN BECOMES UNIT. | 36 | PRIMARY -> IDEN. | 37 | PRIMARY -> PRIMARY.CLAUSE \ | FORMULA -> FORMULA OP PRIMARY. I 38 | PRIMARY -> PRIMARY.CLAUSE I | FORMULA -> PRIMARY OP PRIMARY. | 39 | DECLARER -> OPEN UNIT CLOSE DECLARER | 40 | UNITSERIES -> UNITSERIES GOON.UNIT | 41 | IDENLIST -> IDENLIST COMMA IDEN. | 42 | UNITSERIES -> UNITSERIES GOON UNIT. F i g u r e 1.4 28 [ S T A T E I SYMBOL T R A N S I T I O N S ] 1 1 | START £2) | | 2 | OPEN (3) C L A U S E (4) , | | 2 | I N T (5) OPEN (6) PROC (7) R E A L (8) D E C L (9) D E C L L I S T (10) j | I D E C L A R E R ( 1 1 ) S E R I E S (12) | | 4 j S T O P (13) | J 5 l #8 | | 6 | I D E N (14.) OPEN (3) A S S I G N A T I O N (15) C L A U S E (16) | J | FORMULA (17) P R I MARX{18) U N I T ( 1 9 ) | | 7 | I N T (5) OPEN (6) PROC (7) R E A L (8) D E C L A R E R (20) | J 8 | #7 | I 9 | #4 | | 10 | COMMA ( 2 1 ) GOON (22) I | 11 | I D E N (23) I D E N L I S T ( 2 4 ) I | 12 | C L O S E (2 5 ) | | 13 | #1 | | 14 | BECOMES (26) #21 | | 15 | #15 1 | 16 J #23 ! I 17 j OP (27) #16 | | 18 \ OP (28) OPEN (3) C L A U S E (29) #17 I j 19 | C L O S E (;30) | | 20 | #10 I | 21 j I N T (5) OPEN (6) PROC (7) R E A L (8) D E C L ( 3 1 ) | | | D E C L A R E R (11) | | 22 | I D E N (14) OPEN (3) A S S I G N A T I O N (15) C L A U S E (16) | | | F O R M U L A ( 1 7 ) P R I M A R Y ( 1 8 ) U N I T (32) U N I T S E R I E S ( 3 3 ) | | 23 | #11 | | 24 | COMMA (34) #6 J l 25 | #2 I | 26 | I D E N (14) OPEN (3) A S S I G N A T I O N (15) C L A U S E (16) | | | FORMULA (17) PRIMARY (18) U N I T (35) | \ 27 | I D E N ( 3 6 ) O P E N ( 3 ) C L A U S E ( 1 6 ) P R I M A R Y ( 3 7 ) | \ 28 J I D E N (36) OPEN (3) C L A U S E (16) PRIMARY (38) I | 29 J #22 | J 30 | I N T (5) OPEN (6) PROC (7) R E A L (8) D E C L A R E R (39) | | 31 | #5 | | 32 | #13 | | 33 | GOON (40) #3 | J 34 | I D E N (41) | J 35 | #18 | | 36 | #21 I | 37 | OPEN (3) C L A U S E (29) #20 | | 38 | OPEN (3) C L A U S E ( 2 9 ) #19 | | 39 | #9 | | 40 | I D E N (14) OPEN (3) A S S I G N A T I O N (15) C L A U S E (16) | | | FORMULA (17) PRIMARY (18) U N I T (42) | | 41 J #12 | | 42 | #14 | F i g u r e 1.5 29 From the CFSM we o b t a i n the l i s t s of nonterminal t r a n s i t i o n s . The l i s t s are given i n t a b u l a r form i n f i g u r e 1.6. NONTERMINAL HAS TRANSITIONS TO STATES ASSIGNATION CLAUSE DECL DECLLIST DECLARER FORMULA IDENLIST PRIMARY, PROGRAM SERIES UNIT UNITSERIES 15 4 16 29 9 31 10 11 20 39 17 24 18 37 38 12 19 32 35 42 33 I F i g u r e 1.6 Using the above a l g o r i t h m and the nonterminal t r a n s i t i o n s from f i g u r e 1.6, we can compute the lookahead symbol s e t s . Consider the inadequate s t a t e 14 and c o n s t r u c t the lookahead symbol s e t as f o l l o w s : p l a c e "BECOMES" i n column 1, s t a t e name 26 i n column 2 and read i n column 3 (a t a b l e e n t r y w i l l be s t a t e d as {BECOMES, 26, READ}). Now, f o r p r o d u c t i o n number 21, c o n s i d e r the n o n t e r m i n a l ("PRIMARY") t r a n s i t i o n s 18, 37 and 38. From s t a t e 18, we add {OP, 28, REDUCE 21}, {OPEN, 3, REDUCE 21} and reduce p r o d u c t i o n number 17. Here we c o n s i d e r the nonterminal ("UNIT") t r a n s i t i o n s 19, 32, 35 and 42. State 19 y i e l d s {CLOSE, 30, REDUCE 21}. We continue i n t h i s manner and o b t a i n the t a b l e i n f i g u r e 1.7 with 3 e n t r i e s f o r {OPEN, 3, REDUCE 21}, of which 2 are removed. We would have had other d u p l i c a t e e n t r i e s i f i t were not f o r f i r s t checking i n step 4 to see i f we have a l r e a d y v i s i t e d a s t a t e under a c t i o n "REDUCE 21". 30 | SYMBOL | NEXT | ACTION BECOMES | 26 | READ CLOSE J 25 | REDOCE 21 CLOSE | 30 | REDUCE 21 GOON | 40 | REDUCE 21 OP | 27 J REDUCE 21 OP j 28 J REDUCE 21 OPEN | 3 | REDUCE 21 Figur e 1.7 lookahead symbol s e t f o r s t a t e 14 The m u l t i p l e e n t r i e s f o r "CLOSE" and "OP" do not cause any d i f f i c u l t y as there are no c l a s h e s i n the a c t i o n column. The r e s u l t i n g symbol t a b l e e n t r i e s w i l l appear as i n f i g u r e 1.8. S i m i l a r symbol s e t s are computed f o r inadequate s t a t e s 17, 18, 33, 37 and 38. However, inadequate s t a t e 24 r e q u i r e d f u r t h e r lookahead symbol s e t computation. The f i r s t s e t computed i s given i n f i g u r e 1.9. The comma symbol does not gi v e s u f f i c i e n t i n f o r m a t i o n to determine the d i r e c t i o n of the parse. Using the "NEXT" column e n t r i e s , the second lookahead symbol s e t i s computed. As we see from f i g u r e 1.10, 2 symbol lookahead i s s u f f i c i e n t to determine the d i r e c t i o n of the parse. | SYMBOL | STATE TRANSITION | BECOMES J TO A READ STATE WHICH READS BECOMES I CLOSE | TO A REDUCE STATE WHICH REDUCES 21 | GOON | " | OP | " | OPEN | " i ; i i : : . Fi g u r e 1.8 symbol t a b l e e n t r y f o r s t a t e 14 Despite the f a c t t h a t the grammar i s SLR (2), we can see tha t the amount of symbol lookahead need not always be as gr e a t as k f o r an SLR (=k) grammar (where k i s g r e a t e r than 1). The complete DPDA i s giv e n i n f i g u r e 1.11. [ SYMBOL MNEXT ]~~ACTION ] | COMMA | 21 | REDUCE 6 | | COMMA | 34 | READ | | GOON | 22 | REDUCE 6 | Fig u r e 1.9 lookahead (k=1) symbol s e t f o r s t a t e 24 [ SYMBOL | NEXT | ACTION '| | IDEN | 41 I READ | | 1ST | 5 | REDUCE 6 | | OPEN | 6 J REDUCE 6 | | PROC | 7 | REDUCE 6 | | REAL | 8 | REDUCE 6 | i ; \ . i _ i L — : i F i g u r e 1.10 lookahead (k=2) symbol s e t f o r s t a t e 24 32 STATE TABLE SYMBOL TABLE — - , — —T———" 1 1 | TYPE | NUMBER jOFFSET | l J : J 1 SYMBOL|STATE | 1 | BEAD T | 1 1 • 1 1 30 | 1 r T | BECOMES 44 | 2 ] HEAD | 1 1 31 | 2 |CLOSE | 36 | 3 | BEAD | 4 I 32 | 3 |GOON | 36 | 4 {BEAD | 1 I 36 | 4 10P | 36 j 5 | POP | 0 I 70 | 5 |0PEN | 36 | 6 | BEAD I 2 I 37 | 6 JCLOSE | 57 | 7 | READ I 4 I 32 | 7 JGOON J 57 | 8 | POP I o I 71 | 8 JOP | 45 | 9 | POP I o I 72 | 9 \CLOSE J 58 | 10 | READ i 2 I 39 | 10 |GOON ] 58 | 11 | READ I 1 I 41 | 11 |OP | 46 I 12 | READ I 1 I 42 I 12 |OPEN } 46 | 13 | POP I 3 I 73 | 13 |COMMA | 43 1 14 |LOOKAHEAD 1 | 5 I 1 I 14 |GOON | 59 | 15 | POP i o 1 74 j 15 |IDEN | 47 | 16 | POP I o 1 75 J 16 J INT | 59 J 17 |LOOKAHEAD 1 I 3 1 6 | 17 JOPEN | 59 | 18 |LOOKAHEAD 1 I 4 1 9 | 18 JPROC | 59 | 19 | READ | 1 1 ^7 | 19 |REAL 1 59 | 20 | POP I 1 1 78 | 20 JCLOSE j 60 i 21 | READ 1 4 1 32 | 21 \GOON | 48 | 22 | READ 1 2 1 37 | 22 |CLOSE I 61 | 23 | POP 1 o 1 79 | 23 |GOON J 61 \ 24 |LOOKAHEAD 1 1 2 1 13 } 24 |OP | 61 I 25 | POP 1 2 I 81 | 25 |OPEN | 49 | 26 | READ 1 2 I 37 | 26 |CLOSE | 62 f 27 IREAD 1 2 I "9 I 27 |GOON | 62 | 28 IREAD 1 2 I 49 | 28 |OP J 62 | 29 | POP | 1 I 82 | 29 J OPEN | 50 | 30 | READ I 4 I 32 f 30 |START J 2 I 31 | POP I 2 I 83 | 31 | OPEN | 3 | 32 | POP I o I 84 | 32 | INT f 5 I 33 | LOOKAHEAD 1 I 2 I 20 | 33 | OPEN \ 6 I 34 | READ J 1 I 52 | 34 |PROC | 7 I 35 | POP I 2 I 86 | 35 | REAL | 8 I 36 | POP I o | 87 | 36 |STOP | 13 | 37 |LOOKAHEAD 1 | 4 | 22 } 37 |IDEN J 14 | 38 |LOOKAHEAD 1 I 4 I 26 | 38 |OPEN | 3 I 39 | POP I 3 I 90 | 39 |COMMA j 21 I 40 J READ I 2 | 37 | 40 |GOON J 22 | 41 | POP | 2 I 91 | 41 JIDEN | 23 | 42 | POP ! 2 I 92 | 42 |CLOSE | 25 | 43 ILOOKAHEAD 2 I 5 I 15 | 43 |BECOMES 26 | 44 | READ J 1 I 43 | 44 |OP | 27 J 45 | READ I 1 I 44 | 45 |OP 1 28 | 46 | READ I 2 I 45 | 46 |OPEN | 3 I 47 (READ I 1 I 48 | 47 | CLOSE J 30 | 48 | READ | 1 I 51 | 48 |COMMA 1 34 | i i 1 F i g u r e 1.11 continued on next page 33 I * 1 T — ~ 1 1 TYPE | NUMBER|OFFSET | 49 | READ j 1 | 31 | 50 | READ I 1 I 31 I 51 |LOOKBACK | 0 | 53 | 52 1 LOOKBACK t 0 | 58 | 53 |LOOKBACK I o | 60 | 54 1 LOOKBACK I 0 | 63 | 55 | STOP I 0 | 0 I 56 | LOOKBACK I 0 | 66 | 57 |POP I 0 | 76 | 58 JPOP I o i 77 | 59 |POP I 1 I 80 | 60 |POP I 2 l 85 | 61 |POP I 2 | 88 | 62 1 POP I 2 I 89 J L I,, — — i —i — i — i - — II- — — L. ... -X-. L_- ' j J SYMBOL|STATE| h ~ + 1 49 | IDEN 36 50 |OPEN 3 51 | GOON 40 52 | IDEN 41 53 12 4 54 |46 | 29 55 |49 29 56 |50 29 57 |0 16 58 121 I 31 59 I o 9 60 |7 20 61 | 30 39 62 |0 11 63 | 27 37 64 128 1 38 65 |0 18 66 |22 32 67 126 35 68 |40 42 69 JO | 19 70 I 8 53 71 |7 I 53 72 1 *» 10 73 1 1 55 74 |15 56 75 | 23 | 54 76 I 16 56 77 I 17 56 78 I 10 53 79 I 11 24 80 (6 52 81 |2 51 82 | 22 54 83 |5 10 84 I 13 33 85 13 12 86 |18 15 87 121 54 88 | 20 17 89 | 19 17 90 19 53 91 1 12 24 92 1 14 33 Fi g u r e 1.11 34 The second example i s an LALR (2) grammar. We are using i t here to demonstrate the d e t e c t i o n of f a i l u r e of the SLR(k) a l g o r i t h m . The p r o d u c t i o n s of the grammar are as f o l l o w s : 1) PROGRAM -> START CLAUSE STOP 2) CLAUSE -> OPEN SERIES CLOSE 3) SERIES -> DECLLIST GOON UNITSERIES 4) DECLLIST -> DECL ' 5) DECLLIST -> DECLLIST COMMADECL 6) DECL -> DECLARER IDENLIST 7) DECLARER -> REAL 8) DECLARER -> INT 9) DECLARER -> OPEN UNIT CLOSE DECLARER 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 DECLARER -> PROC DECLARER IDENLIST -> IDEN IDENLIST -> IDENLIST COMMA IDEN UNITSERIES -> UNIT UNITSERIES -> UNITSERIES GOON UNIT UNIT -> ASSIGNATION UNIT -> FORMULA UNIT -> PRIMARY ASSIGNATION -> IDEN BECOMES UNIT FORMULA-> PRI01FORMULA FORMULA—> PRI02FORMULA FORMULA-> MONADICFORMULA PRI01FORMULA~> PRI010PERAND PRI010P PRI020PERAND PRI010PERAND-> PRI020PERAND PRI010PERAND-> PRIO1FORMULA 35 (25) PRI02F0RMULA-> PRI020PERAND PRI020P MONADICOPERAND (26) PRI020PERAND:-> PR 102 FOR MO LA (27) PRI020PERAND-> MONADICOPERAND (28) MONADICOPERAND-> PRIMARY (29) MONADICOPER*AND-> MONADICFORMDLA (30) MONADICFORM0LA-> MONADICOP MONADICOPERAND (31) PRIMARY -> IDEN (32) PRIMARY -> PRIMARY CLAUSE (33) PRIMARY -> CLAUSE. The b a s i s s e t s o f the CFSM are giv e n i n f i g u r e 1. 12, the symbol t r a n s i t i o n s are given i n f i g u r e 1.13 and the nonterminal t r a n s i t i o n s are given i n f i g u r e 1.14. S t a t e s 14, 19, 21, 22, 24, 25, 29, 37, 45 and 48 are inadequate. D e s p i t e the s i m i l a r i t y between t h i s grammar and the grammar i n the p r e v i o u s example, t h i s grammar i s not SLR (k). The SLR (k) a l g o r i t h m was s u c c e s s f u l f o r s t a t e s 14, 22, 25, 29, 37, 45 and 48, but f a i l e d f o r s t a t e s 19, 21 and 24. The lookahead symbol s e t s computed f o r s t a t e s 19, 21 and 24 are given i n f i g u r e s 1.15, 1.16 and 1.17, r e s p e c t i v e l y . 36 i ~ " i *•*— - - — — — — ^ |STATE| BASIS SET | PROGRAM -> .START CLAUSE STOP 1 2 3 4 5 6 7 8 9 10 11 12 13 14' 15 16 17 18 19 20 21 22 PROGRAM -> START.CLAUSE STOP CLAUSE -> OPEN.SERIES CLOSE PROGRAM -> START CLAUSE.STOP DECLARER -> INT. DECLARER -> OPEN.UNIT CLOSE DECLARER DECLARER -> PROC.DECLARER DECLARER -> REAL. DECLLIST -> DECL. DECL -> DECLARER.IDENLIST DECLLIST -> DECLLIST.COMMA DECL SERIES -> DECLLIST.GOON UNITSERIES CLAUSE -> OPEN SERIES.CLOSE PROGRAM -> START CLAUSE STOP. ASSIGNATION -> IDEN.BECOMES UNIT PRIMARY -> IDEN. MONADICFORMULA -> MONADICOP.MONADICOPERAND UNIT -> ASSIGNATION. PRIMARY -> CLAUSE. UNIT -> FORMULA. FORMULA -> MONADICFORMULA. MONADICOPERAND -> MONADICFORMULA. PRI020PERAND -> MONADICOPERAND. PRIMARY -> PRIMARY.CLAUSE UNIT -> PRIMARY. MONADICOPERAND -> PRIMARY. FORMULA -> PRI01FORMULA. PRI010PERAND -> PRI01FORMULA. F i g u r e 1.12 continued on next page STATET ' B A S I S S E T ~ ] 23 | PRIOIFOffMULA -> PR 101 OPERAND. PR 1 0 1 0 P P R I 0 2 0 P E R A N D | 24 | FORMULA -> P R I 0 2 F 0 R M U L A . \ | P R 1 0 2 O P E R A N D -> PR102FORMULA. j 25 | P R I 0 2 F 0 R M U L A -> P R I 0 2 0 P E R A N D . P R I 0 2 0 P MONADICOPERAND 1 | P R I 0 1 0 P E R A N D -> P R I 0 2 0 P E R A N D . I 26 | D E C L A R E R -> OPEN U N I T . C L O S E D E C L A R E R | 27 | DECLARER -> PROC D E C L A R E R . | 28 | I D E N L I S T -> I D E N . | 29 | I D E N L I S T -> IDENLIST.COMMA I D E N | | D E C L -> D E C L A R E R I D E N L I S T . | 30 | D E C L L I S T -> D E C L L I S T COMMA.DECL J 31 | S E R I E S -> D E C L L I S T G O O N . U N I T S E R I E S | 32 | C L A U S E -> OPEN S E R I E S C L O S E . | 33 | A S S I G N A T I O N -> I D E N B E C O M E S . U N I T I 34 | PRIMARY -> I D E N . - f 35 | MONADICOPERAND -> MONADICFORMULA. | 36 | MONADICFORMULA -> MONADICOP MONADICOPERAND., | 37 | PRIMARY -> P R I M A R Y . C L A U S E | | MONADICQPERAND -> PRIMARY. > | 3 8 | PRIMARY -> PRIMARY C L A U S E . | 39 | P R I 0 1 F O R M U L A -> P R I 0 1 0 P E R A N D P R I O 1 O P . P R I 0 2 0 P E R A N D | 40 | PRI02F0B?MULA -> P R I 0 2 0 P E R A N D P R I 0 2 0 P . MONADICOPERAND 1 41 | D E C L A R E R -> OPEN UNIT C L O S E . D E C L A R E R | 42 | I D E N L I S T -> I D E N L I S T COMMA.IDEN | 43 | D E C L L I S T -> D E C L L I S T COMMA D E C L . | 44 | U N I T S E R I E S -> U N I T . | J : 4 , I F i g u r e 1.12 continued on next page 38 [ S T A T E ! B A S I S S E T ~ ] 1 45 | UNITSERIES -> UNITSERIES.GOON UNIT I | | SERIES -> DECLLIST GOON UNITSERIES. | | 46 | ASSIGNATION -> IDEN BECOMES UNIT. | | 47 j PRI020PERAND -> PRI02FORMULA. | | 48 | PRI02FORMULA -> PRI020PERAND.PRI020P MONADICOPERAND J | | PRIOlFORMDLA -> PRI01OPERAND PRI010P PRI020PERAND. | l l ' 1 | 49 | PRI02FOR>MULA -> PRI020PERAND PRI020P MONADICOPERAND. | | 50 | DECLARER -> OPEN UNIT CLOSE DECLARER. | ] 51 | IDENLIST -> IDENLIST COMMA IDEN, | | 52 | UNITSERIES -> UNITSERIES GOON.UNIT | | 53 | UNITSERIES -> UNITSERIES GOON UNIT. | F i g u r e 1.12 39 I ' T~ |STATE| SYMBOL TRANSITIONS 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 REAL (8) DECL (9) DECLARER (10) START (2) OPEN (3) CLAUSE (4) INT (5) OPEN (6) PROC (7) DECLLIST (11) SERIES (12) STOP(13) #8 IDEN(14) MONADICOP (15) OPEN (3) ASSIGNATION(16) CLAUSE(17) FORMULA (18) MONADICFORMULA (19) MONADICOPERAND(20) PRIMARY (21) PRIO 1 FORMULA (22) PRI010PERAND(23) PRI02FORMULA(24) PRI020PERAND (25) UNIT (26i) INT (5) OPEN (6) PROC (7) REAL (8) DECLARER (27) #7 #4 IDEN (28) IDENLIST(29) COMMA (30) GOON (31) CLOSE (32) #1 BECOMES (33) #31 IDEN (34i) MONADICOP (15) OPEN (3) CLAUSE (17) MONADICFORMULA(35) MONADICOPERAND (36) PRIMARY (37) #15 #33 #16^ #21 #29 #27 OPEN (3) #19 #24 PRI010P (39) #20 #26 PRI020P(40) CLOSE (41) #10 #11 COMMA (42) #6 INT (5) OPEN (6) PROC (7) REAL (8) DECL (43) DECLARER (10) IDEN(14s) MONADICOP (15) OPEN (3) ASSIGNATION (16) CLAUSE (17) FORMULA (18) MONADICFORMULA (19) MONADICOPERAND (20) PRIMARY (21) PRIO 1FORMULA(22) PRI010PERAND(23) PRI02FORMULA (24) PRI020PERAND (25) UNIT (44) UNITSERIES (45) #2 IDEN (14) MONADICOP (15) OPEN (3) ASSIGNATION(16) CLAUSE (17) FORMULA (18) MONADICFORMULA (19) MONADICOPERAND (20) PRIMARY (21) PRI01FORMULA (22) PRI010PERAND(23) PRI02FORMULA (24) PRI020PERAND (25) UNIT (46) #31 CLAUSE(38) #17 #28 #23 F i g u r e 1.13 continued on next page 40 | : T | STATE| J ~ -+ 35 36 37 38 39 40 SYMBOL TRANSITIONS 41 42 43 44 45 46 47 48 49 50 51 52 53 CLAUSE (38) #28 CLAUSE(17) PRIMARY (37) #29 #30 OPEN (3) #32 IDEN (34) MONADICOP(15) OPEN (3) MONADICFORMULA(35) MONADICOPERAND(20) PRI02FORMULA(47) PRI02OPERAND(48) IDEN (34) MONADICOP (15) OPEN (3) CLAUSE (17) MONADICFORMULA (35) MONADICOPERAND (49) PRIMARY(37) INT (5) OPEN (6) PROC (7) REAL (8) DECLARER (50) IDEN (51) #5 #13 GOON (52) #3 #18 #26 PRIO2OP(40) #22 #25 #9 #12 IDEN (14) MONADICOP (15) OPEN(3) ASSIGNATION(16) CLAUSE(17) FORMULA(18) MONADICFORMULA (19) MONADICOPERAND(20) PRIMARY (21) PRI01FORMULA(22) PRIO10PERAND(23) PRI02FORMULA (24) PRI020PERAND (25) UNIT (53) #14 F i g u r e 1.13 NONTERMINAL HAS TRANSITIONS TO STATES j. ; , = + _ ASSIGNATION I 16 CLAUSE t 4 17 38 DECL | 9 43 DECLARER I 10 27 5 DECLLIST I 11 FORMULA 1 18 IDENLIST | 29 MONADICFORMULA J 19 35 MONADICOPERAND | 20 36 4 PRIMARY I 21 37 PR101FORMULA | 22 PRIO10PERAND I 23 PRI02FORMULA | 24 47 PRI020PERAND I 25 48 PROGRAM j SERIES | 12 UNIT | 26 44 4 UNITSERIES t 45 53 i ; J F i g u r e 1.14 41 | SYMBOL | NEXT J ACTION J CLOSE | 32 | REDUCE 21 | CLOSE | 32 | REDUCE 29 | CLOSE | 41 | REDUCE 21 | CLOSE I 41 J REDUCE 29 | GOON | 52 | REDUCE 21 | GOON I 52 | REDUCE 29 | PRI010P | 39 J REDUCE 29 I PRI020P I 40 1 REDUCE 29 F i g u r e 1.15 lookahead symbol s e t f o r s t a I I e 19 |" SYMBOL j NEXT -T-I ACTION t 1 —i 1 r | CLOSE I 32 REDUCE 17 | CLOSE | 32 | REDUCE 28 1 | CLOSE I «M j REDUCE 17 1 l CLOSE I 41 | REDUCE 28 1 | GOON I 52 j REDUCE 17 1 | GOON I 52 | REDUCE 28 1 | OPEN | 3 | READ I } PRI010P | 39 | REDUCE 28 1 | PRI020P | 40 | REDUCE 28 1 -X- i F i g u r e 1.16 lookahead symbol s e t f o r s t a t e 21 I T | SYMBOL | NEXT CLOSE 1 32 CLOSE | 32 CLOSE j 41 CLOSE | 41 GOON | 52 GOON | 52 PRI010B | 39 PRI020P | 40 Fig u r e 1.17 lookahead symbol s e t f o r s t a t e | ACTION +-REDUCE 20 REDUCE 26 REDUCE 20 REDUCE 26 REDUCE 20 REDUCE 26 REDUCE 26 REDUCE 26 1 i I I i 24 From f i g u r e s 1.15, .1.16 and 1.17 we can observe a number of c l a s h e s i n each lookahead symbol s e t . A c l a s h occurs when, f o r the same t e r m i n a l symbol, there are two d i f f e r e n t a c t i o n s i n column 3 and, i n column 2, the next s t a t e s to commence computation of the next lookahead symbol s e t s are the same. I t 42 i s by the o b s e r v a t i o n of such c l a s h e s as these t h a t the SLR(k) a l g o r i t h m i s prevented from going i n t o an i n f i n i t e loop s e a r c h i n g f o r lookahead symbol s e t s of a grammar which i s not SLR (k) . 1.6 LALR (k) Grammars I f an inadequate s t a t e f a i l s t o be SLR(k), then we may attempt LALR(k) lookahead. The computation of the lookahead symbol s e t f o r an inadequate s t a t e i s s i m i l a r t o the SLR (k) a l g o r i t h m with the e x c e p t i o n t h a t we must c o n s i d e r the l e f t c ontext to determine which nonterminal t r a n s i t i o n s to c o n s i d e r f o r each of the r e d u c t i o n s wi t h i n the inadequate s t a t e s and r e d u c t i o n s encountered while t o u r i n g other s t a t e s . The ,CFSM i s modified such t h a t the s t a t e t r a n s i t i o n s may be f o l l o w e d i n both d i r e c t i o n s and such that we can "mark" a s t a t e when we v i s i t i t . The marking of a s t a t e i s e q u i v a l e n t to marking a path through the CFSM. The path or paths marked out w i l l only i n c l u d e the l e f t context and, thus, p a r t i t i o n the CFSM i n t o two s e t s of s t a t e s ; those t h a t we are i n t e r e s t e d i n and those we are not i n t e r e s t e d i n . Now, c o n s i d e r an inadequate s t a t e . The t e r m i n a l symbols w i t h i n the inadequate s t a t e are t r e a t e d the same as i n the SLR (k) a l g o r i t h m . However, f o r each r e d u c t i o n w i t h i n the inadequate s t a t e , we must f o l l o w the p r o d u c t i o n back through the CFSM to those s t a t e s i n the CFSM where the f i r s t symbol on the r i g h t hand s i d e of the p r o d u c t i o n i s read. As we f o l l o w the U3 production back, we "mark" each of the s t a t e s v i s i t e d as we go. For each s t a t e i n t h i s s e t , we then f o l l o w the nonterminal t r a n s i t i o n under the nonterminal symbol on the l e f t hand s i d e of the p r o d u c t i o n . These s t a t e t r a n s i t i o n s l e a d us to a s e t of s t a t e s which should be a subset of the s t a t e s c o n s i d e r e d i n the SLR(k) a l g o r i t h m . I f i t i s not, then the grammar i s not LALR (k) . For each s t a t e w i t h i n the subset we t r e a t t e r m i n a l symbols as i n the SLR(k) a l g o r i t h m , but the r e d u c t i o n s w i t h i n those s t a t e s must be tr a c e d back through the CFSM using the f o l l o w i n g r u l e s I f any s t a t e with a s t a t e t r a n s i t i o n l e a d i n g i n t o the c u r r e n t s t a t e has been "marked", then f o l l o w only those s t a t e t r a n s i t i o n s on which the s t a t e s have been marked; otherwise, f o l l o w and mark a l l s t a t e s with s t a t e t r a n s i t i o n s l e a d i n g i n t o the c u r r e n t s t a t e . The t a b l e c o n s t r u c t e d i s then t r e a t e d the same as i n s t e p s 7 through 11 with r e d u c t i o n s t r e a t e d as above. I f a c l a s h i s now found i n ste p 11, then the grammar i s not LALR (k). I f the LALR(k) a l g o r i t h m was s u c c e s s f u l f o r a l l the inadequate s t a t e s (where the SLR (k) a l g o r i t h m f a i l e d ) then the CFSM may now be converted t o a DPOA. The f o l l o w i n g example uses the same grammar as i n the prev i o u s example.. The CFSM i s the same, but a l i t t l e more i n f o r m a t i o n about the CFSM i s r e q u i r e d . We r e q u i r e f o r each s t a t e a l i s t of s t a t e s with t r a n s i t i o n s i n t o i t . Such i s given i n f i g u r e 1.18. 44 | STATEJ IS ENTERED FROM STATES | STATE| i t IS ENTERE 1 r 1 1 I - i +-1 27 | 10 ] 2 | 1 1 28 1 10 | 3 ] 2 6 15 21 31 33 37 1 29 | 10 | | 39 40 52 1 30 J 11 I 4 I 2 1 31 | 11 j 5 J 3 7 30 41 I 32 J 12 I 6 | 3 7 30 41 I 33 | 14 ] 7 | 3 7 30 41 I 34 | 15 39 40 I 8 | 3 7 30 41 I 35 | 15 39 40 I 9 I 3 I 36 | 15 I 10 | 3 30 I 37 J 15 39 40 j 11 I 3 I 38 | 21 37 | 12 J 3 I 39 | 23 I 13 | 4 I 40 | 25 48 i 14 I 6 31 33 52 I 41 | 26 i 15 | 6 15 31 33 39 40 52 I 42 | 29 1 16 | 6 31 33 52 I 43 | 30 I 17 | 6 15 31 33 39 40 52 I 44 | 31 1 18 I 6 31 33 52 I 45 | 31 I 19 I 6 31 33 52 I 46 | 33 | 20 1 6 31 33 39 52 I 47 | 39 1 21 | 6 31 33 52 I 48 | 39 I 22 I 6 31 33 52 I 49 | 40 1 23 | 6 31 33 52 I 50 | 41 1 24 \ 6 31 33 52 I 51 | 42 1 25 I 6 31 33 52 J 52 j 45 1 26 | 6 I 53 I 52 F i g u r e 1.18 Osing the LALR (k) a l g o r i t h m , we can c o n s t r u c t the lookahead symbol s e t f o r s t a t e 19 as f o l l o w s : from s t a t e 19 we w i l l f o l l o w p r o d u c t i o n numbers 21 and 29 back through the CFSM. Prod u c t i o n number 21 has one symbol on fehe r i g h t hand s i d e so we w i l l go back through the CFSM only to those s t a t e s which l e a d i n t o s t a t e 19. They are 6, 31, 33 and 52. For each of these s t a t e s we mark them and f o l l o w the nonterminal ("FORMULA") t r a n s i t i o n t o s t a t e s 18, 18, 18 and 18, r e s p e c t i v e l y . In s t a t e 18, we f o l l o w p r o d u c t i o n number 16 back through the CFSM only to those s t a t e s which have been marked. S t a t e 18 i s entered from s t a t e s 6, 31, 33 and 52. Under the nonterminal "UNIT", we f o l l o w the t r a n s i t i o n s to s t a t e s 26, 44; 46 and 53, r e s p e c t i v e l y . S t a t e 26 y i e l d s {CLOSE, 41, REDUCE 21}. P r o d u c t i o n number 13 i s reduced i n s t a t e 44 and le a d s to s t a t e 45 which y i e l d s {GOON, 52, REDUCE 21} and the r e d u c t i o n of pr o d u c t i o n number 3., We continue i n t h i s v e i n u n t i l we get the t a b l e as i n f i g u r e 1.19. The LALR(k) al g o r i t h m i s repeated f o r s t a t e s 21 and 24, y i e l d i n g the lookahead symbol s e t s i n f i g u r e s 1.20 and 1.21, r e s p e c t i v e l y . [ SYMBOL ^~ T~NEXT ~\ ACTION ] | CLOSE | 32 | REDUCE 21 | | CLOSE I 41 | REDUCE 21 | \ GOON | 52 | REDUCE 21 | | PRI010P | 39 | REDUCE 29 \ I PRI020R | 40 | REDUCE 29 | Fi g u r e 1.19 lookahead symbol s e t f o r s t a t e 19 i " • '• 1 T " 1 | SYMBOL | NEXT | ACTION J | CLOSE | 32 | REDUCE 17 | | CLOSE | 41 | REDUCE 17 | | GOON | 52 1 REDUCE 17 | | OPEN J 3 | READ | l PRI010P | 39 | REDUCE 28 | j PRI020P | 40 | REDUCE 28 | Fi g u r e 1.20 lookahead symbol s e t f o r s t a t e 21 I — H 1 1 • T J SYMBOL | NEXT J ACTION | | CLOSE | 32 j REDUCE 20 | } CLOSE | 41 | REDUCE 20 | | GOON | 52 | REDUCE 20 | | PRI010P | 39 | REDUCE 26 | 1 PRI020P | 40 | REDUCE 26 | I \ X I ; J F i g u r e 1.21 lookahead symbol s e t f o r s t a t e 24 46 1.7 LR (k) Grammars and S t a t e - s p l i t t i n g The f a i l u r e o f the LALR (k) a l g o r i t h m i n d i c a t e s t h a t the grammar i s not LR(k) or t h a t the reduced CFSM does not permit the l e f t context to be unambiguous. I f we can p a r t i t i o n o f f some of the l e f t c o n t e x t , then perhaps c l a s h e s w i t h i n the lookahead symbol s e t s w i l l d isappear. Within the CFSM there w i l l be a number o f s t a t e t r a n s i t i o n s (one or more) l e a d i n g i n t o an inadequate s t a t e (for which the LALR(k) lookahead a l g o r i t h m f a i l e d ) . I f there i s only one s t a t e t r a n s i t i o n (loops are ignored) l e a d i n g i n t o the inadequate s t a t e , then we must f o l l o w t h a t s t a t e t r a n s i t i o n back through the CFSM, as f a r as i t i s necessary, u n t i l we f i n d a s t a t e with more than one s t a t e t r a n s i t i o n l e a d i n g i n t o i t (loops w i t h i n these s t a t e s are i g n o r e d ) . I f none can be found then the grammar i s not LR (k). For example, c o n s i d e r a CFSM with s t a t e s {a, b, c, d, ..-} with s t a t e t r a n s i t i o n s from a to c, b to c, c t o d, d to c, oth e r s from d to s t a t e s other than c and with no other s t a t e t r a n s i t i o n s l e a d i n g to c and d. Now l e t d re p r e s e n t an inadequate state.* We f o l l o w the s t a t e t r a n s i t i o n s l e a d i n g i n t o d back through the CFSM ( i g n o r i n g l o c a l loops) u n t i l we f i n d a s t a t e with more than one s t a t e t r a n s i t i o n l e a d i n g i n t o i t (state c ) . We w i l l make a copy of s t a t e s c and d ( s t a t e s c* and d f , r e s p e c t i v e l y , with s t a t e t r a n s i t i o n s from c' to d« and d» to C and from d* to other s t a t e s as i n d, but th e r e w i l l be no s t a t e t r a n s i t i o n s from c to d 1 , c f to d, d» to c or d to c » ) . A s t a t e t r a n s i t i o n l e a d i n g i n t o s t a t e c* from s t a t e b i s c o n s t r u c t e d and the s t a t e t r a n s i t i o n from b t o c i s removed. Now there i s onl y one s t a t e t r a n s i t i o n l e a d i n g to c from a ( i g n o r i n g the loop from d) . S i m i l a r l y , f o r the s t a t e s c* and d» there i s only one s t a t e t r a n s i t i o n l e a d i n g to c» from b ( i g n o r i n g d » ) . St a t e d» r e p r e s e n t s a new inadequate s t a t e . The r e s u l t o f t h i s a c t i o n i s a) the CFSM i s no lon g e r reduced, and b) the s i z e of the CFSM has i n c r e a s e d . De Remer termed t h i s a c t i o n " s t a t e - s p l i t t i n g " . Once the s t a t e - s p l i t t i n g has been performed, the LALR(k) a l g o r i t h m may be repeated. I f i t c o n t i n u e s to f a i l , then s t a t e - s p l i t t i n g may be attempted again, u n t i l no f u r t h e r s p l i t t i n g a c t i o n can be performed. I f the LR(k) a l g o r i t h m i s s u c c e s s f u l f o r a l l the inadequate s t a t e s (for which the LALR (k) al g o r i t h m f a i l e d ) then the CFSM may now be converted to a DPDA. The f o l l o w i n g LR(1) grammar has i t s CFSM and source t r a n s i t i o n s i n f i g u r e s 1.22 and 1.23, r e s p e c t i v e l y : (1) S -> START EE STOP (2) EE -> A AA D (3) EE -> A BB C (4) EE -> B AA C (5) EE -> B BB D (6) AA -> E AA (7) AA -> E (8) BB -> E BB (9) BB -> E. ]STATE| CONFIGURATION SET | SYMBOL | STATE | -+ -+— 1 1 S - > . .START EE 2 EE -> .A AA D EE -> • A BB C EE -> .B A A C EE -> . B BB D S --> START.EE 3 AA -> .E A A A A -> .E BB ~> .E BB BB -> . E EE -> A. A A D EE -> A. BB C AA -> . E A A AA -> . E BB -> .E BB BB -> . E EE -> B. AA C EE -> B. BB D 5 S --> START EE. 6 AA -> . E AA AA -> • E BB -> .E BB BB -> . E AA -> E. AA BB -> E. BB AA -> E. | BB -> E. 7 I EE -> A AA.D 8 EE -> A BB.C 9 | EE -> B AA.C 10 | EE -> B BB.D 11 I s --> START EE 12 | AA -> E AA. , 13 | BB -> E BB. 14 I EE -> A AA D. START A B EE E AA BB AA BB STOP E AA BB #7 #9 D C C D #1 #6 #8 #2 2 3 4 5 6 7 8 9 10 11 6 12 13 14 15 16 17 « . F i g u r e 1.22 continued on next page 49 |STATE| \ - — + -1 15 | EE I 16 | EE I 17 | EE i i ~ , — T -| STATE| IS 1 1 i I 2 i 1 i 3 | 2 I 4 | 2 I 5 j 2 I 6 | 3 4 | 7 | 3 I 8 | 3 I 9 | 4 CONFIGURATION SET -> B BB D. STATE | SYMBOL | #3 I | #4 F i g u r e 1.22 - J — i — - — | | 10 I 4 I 11 I 5 I 12 | 6 I 13 I 6 1 14 | 7 I 15 I 8 ] 16 | 9 1 17 I 10 L . _ i 1 F i g u r e 1.23 St a t e 6 i s inadequate. The LALR (k) a l g o r i t h m f a i l s with the lookahead symbol s e t i n f i g u r e 1.25 which was preceded by the lookahead symbol s e t i n f i g u r e 1.24 under "C". [ SYM I C I C I D I D I E F i g u r BOL I NEXT I 15 I 16 I 14 I 17 I 6 REDUCE 9 REDUCE 7 REDUCE 7 REDUCE 9 READ e 1.24 lookahead (k=1) symbol s e t f o r s t a t e 6 ACTION | SYMBOL | NEXT | ACTION | | STOP 1 1 1 | REDUCE 7 | | STOP | 11 | REDUCE 9 | F i g u r e 1.25 lookahead (k=2) symbol s e t f o r s t a t e 6 S t a t e - s p l i t t i n g i s attempted. The source t o s t a t e 6 i s 50 s p l i t up such t h a t only s t a t e s 3 and 6 l e a d i n t o 6 and s t a t e 4 has a d i f f e r e n t t r a n s i t i o n under "E" to s t a t e 18, a new s t a t e . The modified CFSM i s i n f i g u r e 1.26 and the modified source t r a n s i t i o n s are i n f i g u r e 1.27. S t a t e 18 i s inadequate. Using the LALR(k) a l g o r i t h m , we get the lookahead symbol s e t s f o r s t a t e s 6 and 18 i n f i g u r e s 1.28 and 1.29, r e s p e c t i v e l y . [ S T A T E T " CONFIGURATION SET I SYMBOL T~STATE ~|* 1 — — T # . . . . . - — — r 1 1 I S - > . START EE STOP | START 1 2 ! 2 | EE -> .A AA D 1 A 1 3 I | EE -> .A BB C | j | EE -> .B AA C 1 B ! 4 I | EE -> .B BB D I I I S -•> START.EE STOP | EE 1 5 1 3 | AA -> .E AA \ E 1 6 | | AA -> . E j j BB -> . E BB j j BB -> • E j | | | EE -> A. A A D | AA 1 7 1 1 EE -> A. BB C | BB 1 8 i 4 | AA -> .E A A 1 E 1 18 I | AA -> .E j ] | BB -> . E BB { | | | BB -> . E j | | EE -> B. AA C | AA I 9 J 1 EE -> B.BB D | BB J 10 I 5 | S --> START EE.STOP | STOP I 11 1 6 | AA -> . E AA 1 E I 6 | | A A -> . E | ] | BB -> .E BB | j | | BB -> . E J | | | AA -> E. AA | AA I 12 | | BB -> E. BB | BB 1 13 j | AA -> E. | #7 I 1 1 BB -> E. 1 #9 1 J 7 | EE -> A AA.D ] o 1 14 1 8 | EE -> A BB.C 1 c | 15 i : — x — _ j _ _ x _ F i g u r e 1.26 continued on next page 51 i ;—~x" |STATE| r +-CONFIGURATION SET -—I 9 EE -> B AA.C I c I 16 I a 10 EE -> B BB.D I D I 17 1 1 i 11 S --> START EE STOP. I #1 i - 1 1 I 12 A A -> E AA. i #6 I - 1 1 1 13 BB -> E BB. I #8 I -  1 • 14 EE -> A AA D. J #2 ! 1 1 1 15 EE -> A BB C. | #3 I ~  1 • 16 EE -> B AA C. J #4 1 " 11 i 17 EE -> B BB D. | #5 1 ~ 1 1 • 18 AA -> .E AA I E | 18 1 1 AA -> .:E | j 1 BB -> .E BB , j | 1 BB -> . E J | 1 AA -> E. AA } AA 1 12 1 BB -> E. BB 1 BB I 13 1 A A -> E. 1 #7 I — 1 BB -> E. | #9 I 1 SYMBOL t STATE | . H 4 F i g u r e 1.26 i T i T~' ' — r • 1 |STATE| IS ENTERED FROM STATES |STATE| IS ENTERED FROM STATES | j. i + _ x _ 1 I 1 | \ 10 I 4 I I 2 1 1 I 11 I 5 1 I 3 | 2 ) 12 I 6 1 I 4 I 2 I 13 I 6 1 1 5 I 2 1 14 I 7 1 1 6 | 3 6 I 15 l 8 J 1 7 I 3 I 16 I 9 1 1 8 I 3 I 17 I 10 1 1 9 I 4 I 18 | 4 18 I L ; j i ._ .. . i Fi g u r e 1.27 52 I ' -i T -" : i : —• 1 | STATE | NEXT | ACTION | y 1_ , + -| 4 I C | 15 | REDUCE 9 | I D J 14 | REDUCE 7 | | E | 6 | READ I F i g u r e 1.28 lookahead symbol s e t f o r s t a t e 6 | STATE | NEXT | ACTION *| i * — + - - i — i — ^ l C I 16 | REDUCE 7 | j D | 17 | REDUCE 9 | | E | 18 J READ | F i g u r e 1.29 lookahead symbol s e t f o r s t a t e 18 1.8 Empty P r o d u c t i o n s Sometimes i t i s convenient to have a p r o d u c t i o n or productions of the form A -> e w i t h i n a CFG. However, the occurrence of such productions does not complicate a p p r e c i a b l y the c o n s t r u c t i o n of the p a r s e r . For example, s t a t e s may have nonterminal t r a n s i t i o n s out of them, but those s t a t e s need not have t e r m i n a l t r a n s i t i o n s out of them when the c o n f i g u r a t i o n A ->.e occurs w i t h i n them. To f a c i l i t a t e the pushdown s t a c k a new s t a t e type i s c r e a t e d . T h i s s t a t e type i s c a l l e d a push s t a t e which c o n s i s t s of two p i e c e s of information:! a) the push i n s t r u c t i o n code and b) an o f f s e t i n t o the symbol t a b l e g i v i n g i ) a s t a t e name which i s t o be pushed onto the stack and i i ) a s t a t e t r a n s i t i o n to a reduce s t a t e where the p r o d u c t i o n number i s output. The f o l l o w i n g SLR(1) grammar c o n t a i n s an empty p r o d u c t i o n : (1) S -> A E B 53 (2) E -> C (3) E -> D (4) D -> (5) D -> D W (6) C -> V D. The CFSM i s i n i 1 — _ STATE] CONFIGURATION SET | SYMBOL | STATE J ;_ - + — - — — . .j _ —\— 1 I s -> .A E B ! A ] 2 2 I c -> .V D I v | 3 I E -> .C I c | 4 I E -> .0 I o I 5 I D -> .D W I ] I s -> A.E B i E j 6 I B -> • | #U | — 3 I c -> V.D I D | 7 I D -> .D W | j I D -> • J #4 J — U I E -> C I «2 | -5 I D -> D. W 1 W ] 8 I E -> D. 1 #3 | — 6 I s -> A E.B 1 B | 9 7 I D -> D. W 1 w | 8 I c -> V D. 1 #6 I — 8 I D -> D W. 1 #5 | — 9 I s -> A E B. 1 #1 I — I -X~~— F i g u r e 1.30 54 STATE TABLE SYMBOL TABLE i — i 1 j TYPE | NUMBERJ OFFSET | | SYMBOL|STATE| 1 j READ 1 T~* | 1 | 8 t 1 i B — i 1 1 15 1 2 J LOOKAHEAD 1 I 3 | 1 | 2 IV I 10 | 3 | PUSH I 0 1 16 | 3 ] w | 15 f 4 | REDUCE 1 0 I 17 | 4 IB | 17 | 5 |LOOKAHEAD 1 j 2 | 4 1 5 1 w | 11 | 6 | READ 1 1 1 11 1 6 IB I 18 | 7 JLOOKAHEAD 1 1 2 | 6 1 7 1 w I 12 | 8 |REDUCE 1 1 1 20 | 8 U I 2 | 9 jREDUCE 1 3 | 21 | 9 1 v | 3 I 10 | READ 1 1 ! 9 I 10 | W I 8 | 11 | READ I 1 I 10 | 11 1 B I 9 | 12 | READ I 1 I 10 | 12 13 I 7 J 13 |LOOKBACK i 0 | 12 | 13 JO 1 5 | 14 | STOP I 0 | 0 I 14 1 10 1 16 1 15 ] PUSH I 0 | 14 | 15 14 j 13 | 16 |REDUCE 1 o 1 15 | 16 1 3 | 16 | 17 |REDUCE 1 0 J 18 | 17 12 I 6 | 18 |REDUCE i 1 I 19 I 18 | 3 I 6 | i 19 16 I 4 | 20 15 I 13 | 21 | 1 1 14 l i i i F i g u r e 1.31 55 CHAPTER 2 PARSER GENERATOR IMPLEMENTATION In t h i s chapter, we w i l l d i s c u s s the parser and the p a r s i n g a l g o r i t h m , input of the CFG, f u r t h e r o p t i m i z a t i o n s and output of the LR(k) p a r s e r generator. We have assumed t h a t the l e x i c a l a n a l y s i s of the s t r i n g has been completed before the s y n t a c t i c parse of the s t r i n g s t a r t s and t h a t the output of the l e x i c a l a n a l y s i s i s a sequence of i n t e g e r s . However, f o r simple languages the compiler w r i t e r need only c a l l the l e x i c a l a n a l y s e r each time a symbol i s read. T h i s i s assuming, of course, that t h e r e are always enough symbols w i t h i n the symbol stream to perforim lookahead. 2. 1 The Parser The type of parser i n s t r u c t i o n s ( i . e . , read, pop, etc.) have been d e s c r i b e d p r e v i o u s l y . Here, we w i l l mention them again, only to d e s c r i b e how they appear w i t h i n the IBM System 360 computers. A l l p a r s e r i n s t r u c t i o n s f i t i n t o 4 bytes (a f u l l w o r d ) with the f o l l o w i n g format: B i t s 0-3: i n s t r u c t i o n code. B i t s 4-7:; number of symbols lookahead; otherwise, zero. B i t s 8-15: number of symbols i f read or lookahead, number of s t a t e names to be popped i f reduce; otherwise, z e r o . 56 B i t s 16-31: o f f s e t i n t o symbol t a b l e f o r a l l i n s t r u c t i o n s except stop which i s zero. The symbol t a b l e has d i f f e r e n t types of e n t r i e s f o r each of the d i f f e r e n t types of i n s t r u c t i o n s : &) read and lookahead the number of symbols denotes the number of f u l l w o r d e n t r i e s . Each f u l l w o r d entry c o n t a i n s a halfword p a i r , a symbol and a s t a t e name. B) read indexed and lookahead indexed the symbol t a b l e e n t r i e s are a l i s t of halfwords the leng t h being the number of t e r m i n a l symbols w i t h i n the grammar p l u s 1. Each halfword i s 0 or a s t a t e name. C) reduce or pop the symbol t a b l e e n t r y i s two halfwords, the pro d u c t i o n number and a s t a t e name. D) lookback the symbol t a b l e e n t r i e s form a l i s t of halfword p a i r s of s t a t e names; the s t a t e name to be compared with the top of the stack and the s t a t e name denoting the t r a n s i t i o n t o be taken. In the l a s t e n t r y , the f i r s t halfword i s 0. E) push the t a b l e entry c o n s i s t s of a halfword p a i r of s t a t e names; the s t a t e name to be pushed onto the stack and the s t a t e name denoting the t r a n s i t i o n to be taken. The p a r s i n g a l g o r i t h m i s s t a t e d as f o l l o w s : 57 Step 1: s e t s t a t e to 1, the stack empty and c u r r l e x to the head of the lexeme stream. Step 2: s e t type to b i t s 0-3, num t o b i t s 8-15 and o f f s e t to b i t s 16-31 of s t a t e t a b l e [ s t a t e ]. Go t o step (2 + t y p e ) . Step 3 (read): s e arch symbol t a b l e , s t a r t i n g at s y m t a b l e [ o f f s e t ] , f o r a match to lexeme at c u r r l e x . I f noQe found then syntax e r r o r ; otherwise, push s t a t e onto top of stack and s e t s t a t e to s t a t e name adjacent the symbol entry i n the symbol t a b l e . Set c u r r l e x t o c u r r l e x + 1 . Go to step 2. Step 4 (read indexed); use the lexeme a t c u r r l e x as an index i n t o the symbol t a b l e s t a r t i n g a t s y m t a b l e [ o f f s e t ]. I f the e n t r y i s 0 then syntax e r r o r ; otherwise, push s t a t e onto the st a c k and s e t s t a t e to the s t a t e name. Set c u r r l e x to c u r r l e x + 1 . Go to step 2. Step 5 (pop) : pop num s t a t e s o f f the s t a c k . Output the pro d u c t i o n number ( f i r s t halfword of symtable[ o f f s e t ] ) and s e t s t a t e t o s t a t e name (second halfword of symtable[ of f se t ]) . Go to step 2. Step 6 (lookahead) : search symbol t a b l e , s t a r t i n g a t s y m t a b l e [ o f f s e t ], f o r a match t o the lexeme at ( c u r r l e x - 1 + ( b i t s 4-7 of s t a t e t a b l e [ s t a t e ] ) ) . I f none i s found then syntax e r r o r ; otherwise, s e t s t a t e to s t a t e name adjacent the symbol e n t r y i n the symbol t a b l e . Go to step 2. Step 7 (lookahead indexed): use the lexeme a t ( c u r r l e x - 1 • ( b i t s 4-7 of s t a t e t a b l e [ s t a t e ])) as an index i n t o the 58 symbol t a b l e s t a r t i n g a t symtable[ of f s e t ]. I f the entry i s 0 then syntax e r r o r ; otherwise, s e t s t a t e to the s t a t e name. Go t o ste p 2. Step 8 (lookback): search the symbol t a b l e , s t a r t i n g at s y m t a b l e [ o f f s e t ], f o r a match ( f i r s t halfword) to the top entry i n the st a c k or u n t i l 0 i s found and s e t s t a t e to adjacent s t a t e name (second h a l f w o r d ) . Go to step 2. Step 9 ( s t o p ) : do j u s t t h a t : s t o p . Step 10 (push): push the s t a t e name ( f i r s t halfword) at s y m t a b l e [ o f f s e t ] onto the stack and s e t s t a t e to adjacent s t a t e name. Go to step 2. I f the a l g o r i t h m terminates i n step 9, then the s t r i n g i s a sentence i n the language; otherwise; i f the parse was i n t e r r u p t e d by a syntax e r r o r , then the parse may be terminated or e r r o r recovery may be attempted (see CHAPTER 3 ) . 2.2 F u r t h e r O p t i m i z a t i o n s From the above d i s c u s s i o n , an obvious o p t i m i z a t i o n which may be performed i s the m u l t i p l i c a t i o n of a l l s t a t e names by four (assuming t h a t we s t a r t e d with s t a t e names such as s t a t e 1, s t a t e 2, e t c . ) . T h i s speeds up the in d e x i n g i n t o the s t a t u t a b l e . The compiler w r i t e r i s given c o n t r o l o f the d e c i s i o n as to when to use a read or readindexed (lookahead or lookahead indexed) i n s t r u c t i o n . The upper l i m i t (number o f symbols i n the 59 set) f o r a s e q u e n t i a l read or lookahead i s read i n and a symbol set with equal or more symbols w i l l be accessed v i a the index mechanism. A d d i t i o n a l l y , two read or read indexed s t a t e s may use the same symbol t a b l e e n t r i e s i f the number of t e r m i n a l symbols and the t e r m i n a l symbol t r a n s i t i o n s are the same, thus saving i n space. I t i s not n e c e s s a r i l y an o p t i m i z a t i o n but a convenience t o be able to c o n t r o l the type of lookahead a l g o r i t h m used. Two i n t e g e r s are i n p u t . The f i r s t denotes the a l g o r i t h m to commence c o n s t r u c t i o n of lookahead and the second denotes the al g o r i t h m such that i f i t f a i l s , go no f u r t h e r . The i n t e g r a l values are a s s o c i a t e d as f o l l o w s : 0-LR (0), 1-SLR (k) , 2-LALR (k) and 3-LR (k). For example, i f we are only i n t e r e s t e d i n SLR (k) or LALR (K) grammars then we can minimize the amount of work r e q u i r e d to e i t h e r c o n s t r u c t the parse r or terminate i n f a i l u r e (e.g., the i n t e g r a l p a i r 2 2 denote " s t a r t with LALR (k) lookahead but do not attempt s t a t e - s p l i t t i n g i f i t f a i l s " ) . There i s only one parse r i n s t r u c t i o n i n which the speed of the p a r s e r may be impeded. The lookback s t a t e does a s e q u e n t i a l search through the symbol t a b l e and other than the o p t i m i z a t i o n d e s c r i b e d e a r l i e r , i t i s d i f f i c u l t t o i n c r e a s e the speed o f the lookback process without i n c r e a s i n g the complexity and/or s i z e of the p a r s e r . Further o p t i m i z a t i o n s on p a r s e r s f o r LR(1) grammars may be performed as d e s c r i b e d i n [ A ] . However, i t was f e l t t h a t i t was d e s i r a b l e to keep the schema g e n e r a l and, t h e r e f o r e , d i d not 6 0 apply these o p t i m i z a t i o n s . De Remer performs an o p t i m i z a t i o n which causes i r r e l e v a n t r e d u c t i o n s to be bypassed. I have not implemented t h i s f e a t u r e though i t i s not too d i f f i c u l t to perform. The compiler w r i t e r should have the o p t i o n of whether or not i t i s a p p l i e d . 2.3 Input of the CFG The format of the CFG i n p u t t o the LR (k) p a r s e r generator must adhere to the f o l l o w i n g r u l e s : A) the CFG i s represented as a sequence of p r o d u c t i o n s , the f i r s t p r o d u c t i o n i s of the form S->f-S»4 where S, fe and i are used no where e l s e w i t h i n the grammar. The symbol r e p r e s e n t a t i o n s of S r |- and -| f o l l o w the same r u l e s as a p p l i e d to other nonterminals and t e r m i n a l s (see D) below) . B) Each p r o d u c t i o n or sequence of p r o d u c t i o n s with the same nonterminal on the l e f t hand s i d e may have the format: nonterminal symbol, c o l o n , sequence of a l t e r n a t i v e s separated by semicolons, p o i n t . C) a nonempty a l t e r n a t i v e i s a sequence of t e r m i n a l and nonterminal symbols separated by commas. A nonempty a l t e r n a t i v e may not i n c l u d e the empty symbol; where as, an empty a l t e r n a t i v e i s the r e p r e s e n t a t i o n f o r an empty p r o d u c t i o n . D) t e r m i n a l and nonterminal symbols may be any sequence of upper case l e t t e r s , lower case l e t t e r s , d i g i t s and 61 any o t h e r c h a r a c t e r not i n c l u d i n g ":", ";'*, " " ) " , " (", "<" or ">". Nonterminals are d i s t i n g u i s h e d from t e r m i n a l s by t h e i r occurrence on the l e f t hand s i d e of a production. E) comments which are not p a r t of the r e p r e s e n t a t i o n of a symbol must be bracketed by (" and " ) " or and ">". The b r a c k e t s and ever.y c h a r a c t e r between them are d e l e t e d from the r e p r e s e n t a t i o n of a symbol. Comments may occur anywhere. F) a blank " " i s r e t a i n e d as a c h a r a c t e r of a symbol i f and only i f i t occurs w i t h i n a sequence of nonblank c h a r a c t e r s which are c h a r a c t e r s w i t h i n a nonterminal or t e r m i n a l symbol. A sequence of 2 or more blank c h a r a c t e r s are reduced to 1 and the above r u l e i s a p p l i e d . G) t h e r e i s no r e p r e s e n t a t i o n of the empty symbol. The f o l l o w i n g grammar i s an example of the format a c c e p t a b l e as i n p u t to the parser generator: (001) program (001) : s t a r t , clause(002) , stop . (002) clause(001,002,018,019): open , se r i e s ( 0 0 3 ) , c l o s e . (003) s e r i e s (002,003) : d e c l l i s t (004) , go on , u n i t series(022) • (004) d e c l list(003,004,005,005): d e c l (006) ; (005) d e c l l i s t (004) , comma , d e c l (006) . ! (006) d e c l (004,005,006) : d e c l a r e r (007) , iden l i s t (020) . 62 (007) declarer(006,007,008^009,009,010,010): r e a l ; (008) i n t ; (009) open , unit(011) , c l o s e , declarer(007) ; (010) proc , d e c l a r e r (007) . (011) unit(009,011,012,013,014,022,023) : assignation(014) ; (012) formula (015) ; (013) N primary(017) . (014) a s s i g n a t i o n ( 0 1 1 , 0 1 4 ) : iden , becomes , u n i t (011) . (015) formula(012,015,016,016) : primary (017) , op , primary (017) ; (016) formula (015) , op , primary (017) . (017) primary(013,015,015,016,017,018,018,019): i d e n ; (018) primary (017) , c l a u s e (002) ; (019) c l a u s e (002) . (020) iden list(006,020,021,021): iden ; (021) i d e n l i s t (020) , comma , i d e n . (022) u n i t s e r i e s (003,022,023,023): unit(011) ; (023) u n i t s e r i e s (022) , go on , u n i t (011) . Each p r o d u c t i o n i s preceded by a comment c o n s i s t i n g of the pr o d u c t i o n number. The nonterminal on the l e f t hand s i d e i s f o l l o w e d by a comment which c o n t a i n s the c r o s s - r e f e r e n c i n g of that nonterminal w i t h i n the grammar. On the r i g h t hand s i d e , each nonterminal i s f o l l o w e d by a comment which c o n t a i n s the pr o d u c t i o n number of the f i r s t p r o d u c t i o n with t h a t nonterminal 63 on the l e f t hand s i d e . Once the grammar has been read i n the symbols are ordered a l p h a b e t i c a l l y . The t e r m i n a l symbols are ordered from 1 t o n where n i s the number of t e r m i n a l symbols. The nonterminal symbols are ordered from n+1 to m where m-n i s the number of nonterminal symbols. The # symbols are then ordered from m+1 to p where p-m i s the number of pro d u c t i o n s i n the grammar. P r o v i s i o n c o u l d be made to allow the compiler w r i t e r the o p t i o n of s p e c i f y i n g h i s own o r d e r i n g of the t e r m i n a l symbols, e.g., i f he has a l r e a d y w r i t t e n code using a p a r t i c u l a r o r d e r i n g , then t h i s would save him from r e w r i t i n g t h a t code or having t o t r a n s l a t e to the new o r d e r i n g . 2.4 Output The output of the LR(k) pa r s e r generator, assuming i t was s u c c e s s f u l , i s t h r e e a r r a y s of i n t e g e r s and s h o r t i n t e g e r s i n PL360 [W]. The f i r s t a r r a y i s the s t a t e t a b l e , the second i s the symbol t a b l e and the t h i r d i s a s p e c i a l t a b l e f o r e r r o r r e c o v e r y (see CHAPTER 3 ) . I f the grammar i s not LR(k) then the parse r generator w i l l p r i n t the s t a t e names of those s t a t e s i n which the lookahead symbol s e t computation f a i l e d and the l a s t lookahead symbol s e t that was computed f o r each one ( i . e . , the one on which the lookahead a l g o r i t h m f a i l e d ) . 64 CHAPTER 3 ERROR RECOVERY In s e c t i o n 2.1, we found two p l a c e s w i t h i n the parser i n which syntax e r r o r s were d i s c o v e r e d ; w i t h i n a read or read indexed s t a t e and w i t h i n a lookahead or lookahead indexed s t a t e . One o f the f i r s t problems which comes to l i g h t i s that i t seems f a i r l y reasonable to attempt r e c o v e r y when the e r r o r was d e t e c t e d at the head of the s t r i n g , but what i f the errJor was dete c t e d on a symbol an a r b i t r a r y d i s t a n c e down from the head of the s t r i n g as i n the lookahead s t a t e ? I t t u r n s out t h a t t h i s i s r e a l l y no problem at a l l as w i l l be shown. 3.1 Some Ideas F i r s t of a l l , l e t us c a l l the symbol on which the e r r o r was det e c t e d , the "symbol i n e r r o r " (though i t i s not n e c e s s a r i l y the symbol t h a t i s wrong). The symbol i n e r r o r and the symbols adjacent to i t may have some t r a n s f o r m a t i o n s performed on them. For example, a sequence of symbols may be i n s e r t e d before the symbol in, e r r o r , a sequence of symbols ( s t a r t i n g with the symbol i n e r r o r ) may be d e l e t e d , the symbol i n e r r o r may be r e p l a c e d by another symbol. We c o u l d l e a v e the symbol stream alone and make adjustments to the pushdown s t a c k . For example, adding s t a t e names would be e q u i v a l e n t to i n s e r t i n g phrases and popping s t a t e 65 names would be e q u i v a l e n t to d e l e t i n g phrases. Ne i t h e r a c t i o n i s d e s i r a b l e s i n c e the c o n t i n u a t i o n of semantics becomes very d i f f i c u l t . We do allow s t a t e names t o be popped, but we cannot adequately push names onto the pushdown stack as we have removed the nonterminal symbol t r a n s i t i o n s from the DPDA. The above a c t i o n s may a l s o be combined to g e t h e r t o form more complex a c t i o n s . There are fou r b a s i c e r r o r recovery a c t i o n s (provided by the parser implemented) which may be performed: a) d e l e t e a sequence of 1 to 5 symbols s t a r t i n g with the symbol i n e r r o r , b) pop 1 to 5 s t a t e names o f f the pushdown s t a c k , c) i n s e r t a symbol before the symbol i n e r r o r , and d) r e p l a c e the symbol i n e r r o r with another symbol. For a c t i o n s c) and d ) , the symbols are o b t a i n e d from the symbol t a b l e f o r that s t a t e i n which the e r r o r was de t e c t e d . A c t i o n s c) and d) are repeated f o r each symbol w i t h i n the read or lookahead s t a t e . A c t i o n a) cannot d e l e t e the l a s t symbol i n the s t r i n g . A c t i o n b) r e q u i r e s a s p e c i a l t a b l e (budlt by the p a r s e r generator) i n which each en t r y i n the t a b l e i s a s t a t e name. When a s t a t e name i s popped from the stac k , that s t a t e name i s used to index the t a b l e to ob t a i n the s t a t e name i n which the parse w i l l be r e s t a r t e d . T h i s i s due to the f a c t t h at there may be a sequence of lookahead s t a t e s l e a d i n g i n t o the read s t a t e whose name i s on the s t a c k ; we want to r e s t a r t i n the i n i t i a l lookahead s t a t e , not the read s t a t e , The upper l i m i t of 5 placed on the d e l e t e and pop a c t i o n s 66 are j u s t i f i e d as f o l l o w s : i f t h e r e i s no l i m i t on the number of symbols which may be d e l e t e d , then i t i s q u i t e p o s s i b l e to d e l e t e the remainder of the program. Such an a c t i o n i s d e t r i m e n t a l to the secondary r e s p o n s i b i l i t y of a compiler implementation, to show the programmer as many as p o s s i b l e or a l l of the e r r o r s w i t h i n h i s program. k l i m i t of 5 symbols appears, from per s o n a l e x p e r i e n c e , to be a s a t i s f a c t o r y l i m i t and the experience d i d not suggest t h a t 4 or 6 would be any b e t t e r . For the pop a c t i o n , the above argument a p p l i e s e q u a l l y w e l l . I f s t a t e names (and the a s s o c i a t e d symbols and phrases on a semantics stack) are popped, then an a r b i t r a r y amount of program which has a l r e a d y been parsed i s thrown away and no semantic parse c o u l d p r o p e r l y take plac e on t h a t p a r t of the program. We c u r r e n t l y allow the f o l l o w i n g s e t s of a c t i o n s : { d e l e t e 1 to 5 symbols}, { i n s e r t 1 to 10 symbols}, { r e p l a c e 1 symbol by another}, {pop 1 to 5 s t a t e names}, { i n s e r t 1 to 10 symbols and d e l e t e 1 to 5 symbols} and { i n s e r t 1 to 10 symbols and r e p l a c e 1 symbol by another}. We, do not allow a pop a c t i o n to be combined with another a c t i o n as the pop a c t i o n i s r a t h e r messy to d e a l with i n the f i r s t p l a c e . I t c o u l d be done, however, but we f e e l t h a t the b e n e f i t s are not a l l t h a t c l e a r and t h a t i t i s r a t h e r disadvantageous to pop s t a t e names. Define the "exposed" symbol as the symbol a) which immediately f o l l o w s the sequence of symbols d e l e t e d , b) which i s the symbol i n e r r o r i f the a c t i o n i s pop, c) which i s the symbol 67 I 1 j u s t i n s e r t e d or d) which i s the symbol r e p l a c i n g the symbol i n e r r o r . For each e r r o r recovery a c t i o n t h a t may be attempted, t h e r e i s u s u a l l y a t l e a s t one a c t i o n which i s b e t t e r than a l l the other a c t i o n s . The problem i s now to find- out which one. One way i s to attempt 1 one a c t i o n and r e s t a r t the p a r s e r , observe how s u c c e s s f u l i t was and repeat the process f o r a l l the other a c t i o n s . The best a c t i o n (most s u c c e s s f u l ) can then be used as the e r r o r r e c o v e r y a c t i o n . There are a number of problems, however. The f i r s t problem i s t h a t most com p i l e r s ( l i k e our own) perform semantic a c t i o n s d u r i n g a s y n t a c t i c parse and do not want these semantic a c t i o n s to go on d u r i n g the " e v a l u a t i o n " parse., Secondly, the d i f f e r e n t types of a c t i o n s cause the s t a c k and i n p u t s t r i n g t o be modified s i g n i f i c a n t l y . We would have to make c o p i e s of them such t h a t they can be r e s t o r e d to the s t a t e they were i n when the syntax e r r o r was d i s c o v e r e d i n order t h a t each e r r o r r e c o v e r y a c t i o n can be p r o p e r l y t e s t e d . To a l l e v i a t e these two problems, we can c o n s t r u c t a parser which performs the " e v a l u a t i o n " parse. We c a l l i t a "symbolic" p a r s e r . The "symbolic" p a r s e r performs a "symbolic" parse, e.g., the e r r o r recovery a c t i o n s are performed upon a "symbolic" i n p u t s t r i n g and a "symbolic" pushdown stack and the "symbolic" p a r s e r parses the "symbolic" i n p u t s t r i n g . The "symbolic" i n p u t s t r i n g and the "symbolic" pushdown s t a c k are merely c o p i e s of the a c t u a l i n p u t s t r i n g and pushdown stack, r e s p e c t i v e l y . The r e s u l t i s that we may measure the e f f e c t i v e n e s s of an a c t i o n 68 without performing semantic a c t i o n s and modifying the a c t u a l i n p u t s t r i n g and pushdown s t a c k . The symbolic p a r s e r w i l l a l s o have the p r o p e r t y that i t w i l l o n l y parse u n t i l another syntax e r r o r occurs or u n t i l a s p e c i f i e d l i m i t of symbols i n the "symbolic" i n p u t s t r i n g have been read, which ever comes f i r s t . The main reasons are t h a t other syntax e r r o r s may cause bad e r r o r recovery a c t i o n s to look good and v i c e versa, and t h a t the number of a c t i o n s which are attempted s y m b o l i c a l l y during each e r r o r r e c o v e r y attempt i s expensive i n terms of compute time. An e r r o r recovery a c t i o n i s meaningless u n l e s s we can e v a l u a t e i t s e f f e c t i v e n e s s i n r e l a t i o n to other e r r o r r e c o v e r y a c t i o n s . The e v a l u a t i o n of an a c t i o n may be based on any one of a number of v a r i a b l e s . In t h i s implementation, we have used a) the number of symbols read, b) the number of r e d u c t i o n s performed, and c) a weight f a c t o r f o r each a c t i o n and/or symbol. On f i r s t c o n s i d e r i n g s y n t a c t i c e r r o r recovery, i t seemed q u i t e n a t u r a l t o decide the " b e s t " e r r o r recovery a c t i o n was t h a t a c t i o n f o r which the symbolic parser read the most symbols. A f t e r a s h o r t p e r i o d of time, we came to understand t h a t t h i s was wrong, t h a t the " b e s t " a c t i o n cannot be evaluated with such a coarse measure because the " b e s t " a c t i o n may not allow as many symbols to be read d u r i n g the "symbolic" parse as another a c t i o n . T h i s i s due i n p a r t to the f a c t t h a t the " b e s t " a c t i o n c o r r e c t e d the c u r r e n t s y n t a c t i c e r r o r , but d i d not c o r r e c t a s y n t a c t i c e r r o r f u r t h e r down the input s t r i n g . However, the other a c t i o n modified the i n p u t s t r i n g or pushdown s t a c k such 69 that the s y n t a c t i c e r r o r f u r t h e r down the i n p u t s t r i n g disappeared. a d d i t i o n a l l y , the s y n t a c t i c and semantic meaning of the r e s u l t i n g program d i d not resemble what the programmer had intended. ' The e v a l u a t i o n o f an a c t i o n should c o n s i d e r the s y n t a c t i c s t r u c t u r e of the program. I t i s unreasonable to have to " a n a l y z e " the phrase s t r u c t u r e s c o n s t r u c t e d to t h i s p o i n t and the phrase s t r u c t u r e s which may be c o n s t r u c t e d and e v a l u a t e an e r r o r recovery a c t i o n based on these phrase s t r u c t u r e s . Thus, we should use i n f o r m a t i o n which i s a v a i l a b l e from the "symbolic" parse. A measure can be made of the number of r e d u c t i o n s performed duri n g a "symbolic" parse. The e v a l u a t i o n i s then the number of r e d u c t i o n s added to the number of symbols read. We c o u l d a l s o c o n s i d e r the number of s t a t e names popped f o r each r e d u c t i o n , but i t does not appear too obvious what the b e n n e f i t s , i f any, are. The e v a l u a t i o n measure thereby produces more emphasis on the s y n t a c t i c s t r u c t u r e , but i t has the bad s i d e e f f e c t of emphasizing a c t i o n s which cause a cascade of r e d u c t i o n s . We can use weights to counter the bad s i d e e f f e c t s . The weights are a f u n c t i o n of the symbol i n s e r t e d , d e l e t e d or r e p l a c e d or a f u n c t i o n of the pop a c t i o n . In a computer programming language, we a t t a c h s y n t a c t i c and semantic import to the symbols i n the language. T h i s i s merely a ranking and we can a s s i g n weights to the symbols i n the language ac c o r d i n g to t h e i r r a n k i n g . For example, i n an ALGOL 68 program, we may have 70 a c h o i c e of two a c t i o n s : i n s e r t a tag symbol or i n s e r t a s k i p symbol. From knowledge of the language, we know that i t i s s a f e r to i n s e r t a s k i p symbol than to i n s e r t a tag symbol. T h e r e f o r e , the s k i p symbol w i l l have a higher weight than the tag symbol. We apply the weights to the symbols i n the i n p u t stream as follows:, i f the symbol i s i n s e r t e d then the weight measure i s the weight value f o r t h a t symbol, i f a sequence of symbols are d e l e t e d then the weight measure i s the sum of the weight values of the symbols and i f a symbol i s r e p l a c e d by another then the weight measure i s the sum of the weights of the two symbols. We can g i v e a constant weight measure f o r the pop a c t i o n . There i s one major flaw i n the above procedure. I t i s t h a t the same emphasis i s placed on a symbol when i t i s being i n s e r t e d as w e l l as when i t i s being d e l e t e d . T h i s has a bad e f f e c t on the e r r o r recovery a c t i o n s chosen. In some s y n t a c t i c a l l y i n c o r r e c t ALGOL 68 programs, i t was found t h a t the e r r o r r e c o v e r y a c t i o n s chosen were d e l e t e a c t i o n s i n which a great deal of semantic i n f o r m a t i o n was l o s t . An a n a l y s i s of the i n d i v i d u a l e r r o r r e c o v e r y a c t i o n s i n d i c a t e d t h a t too p o s i t i v e a weight value was being assigned t o the symbols being d e l e t e d . To remedy t h i s , we . use two weight t a b l e s . The f i r s t weight t a b l e F1 i s used f o r symbols being d e l e t e d or r e p l a c e d . The second weight t a b l e F2 i s used f o r symbols being i n s e r t e d or used t o r e p l a c e another symbol. In summary* the weight f a c t o r f o r the pop a c t i o n i s -10. 71 A l l other weight f a c t o r s are sums of values of f u n c t i o n s F1 and/or F2. F u n c t i o n s F1 and F2 used f o r the ALGOL 68 implementation are given i n f i g u r e 3.1. The weight f a c t o r f o r a d e l e t e a c t i o n iis the sum of the valu e s of F1 given the symbols to be d e l e t e d . The weight f a c t o r f o r an i n s e r t a c t i o n i s the value of F2 given the symbol t o be i n s e r t e d . The weight f a c t o r f o r a r e p l a c e a c t i o n i s the sum of F1 given the symbol to be r e p l a c e d and F2 given the symbol r e p l a c i n g the symbol i n e r r o r . The two weight f u n c t i o n s are s u p p l i e d by the compiler w r i t e r and may be adusted t o o b t a i n the best r e s u l t s r e g a r d i n g the semantic and s y n t a c t i c import of the symbols. | NOM | L _ J SYMBOL — r -I i _ F1 | _L F2 r i T — T 1 1 I AGAIN | -10 j -10 1 2 | AT | -1 I 0 1 3 | BECOMES | -1 I 1 1 4 1 BEGIN | -10 | -10 ! 5 | BITS DENOTATION -1 I 0 1 6 ] BITS j -1 I 0 | 7 | BOOL | -1 I 4 1 8 | BUS | -10 i -JO 1 9 1 BY | -5 | 0 1 10 | BYTES J -1 I 0 ( 1 1 | CASE j -10 J -10 I 12 | CHAR | -1 | 0 1 13 | CLOSE -10 | -10 | 14 | COLON | -1 I 4 I 15 | COMMA j -1 I 6 J 16 | COMPL | -1 I 0 i 17 | COMPLETER J -5 J -5 I 18 | DECLARER OPEN j -10 | -10 1 19 I DO | -5 | 1 I 20 | EITHER -1 I 0 I 21 | ELSE | -10 | -10 I 22 | ELSF j -10 | -10 I 23 | EMPTY } -1 I 0 I 24 | END | -10 | -10 J 25 | EQUALS -1 1 1 I 26 | ESAC | -10 | -10 I 27 | FALSE | -1 I 0 L X , . ; 1 . _ X J F i g u r e 3.1 continued on next page 1 NU>M I SYMBOL r- 28 — | — — — — — — • 1 FI 29 | FILE 30 | FLEX 31 | FOR 32 | FORMAT BEGIN 33 | FORMAT END 34 | FORMAT 35 | FROM 36 | GO ON 37 | GO TO 38 | HEAP 39 | IF 40 | IN 41 | INTEGRAL DENOTATION 42 \ INTEGRAL 43 | IS NOT 44 1 i s 45 | LETTER A 46 | LETTER B 47 j LETTER C 48 | LETTER D 49 | LETTER E 50 | LETTER F 51 | LETTER 6. 52 j LETTER I 53 | LETTER K 54 | LETTER L 55 j LETTER N 56 | LETTER P 57 | LETTER R 58 | LETTER S 59 | LETTER T 60 1 LETTER O 61 I LETTER V 62 J LETTER X 63 1 LETTER. Y 64 | LETTER Z 65 | LOCAL 66 | LONG BYTES 67 | LONG COMPL 68 | LONG LONG BYTES 69 | LONG LONG LONG BYTES 70 | LONG LONG LONG LONG BYTES 71 | LONG REAL DENOTATION 72 | LONG REAL 73 | MINOS 74 | MODE INDICATION 75 | MODE 76 I NIL F1 10 IO 10 10 10 10 -5 -1 F2 | -10 ^ 0 0 0 -10 -10 0 0 2 0 0 -10 -10 0 0 0 0 0 0 0 0 0 -10 -10 0 0 0 -10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 F i g u r e 3.1 continued on next page I NUM | I- +-SYMBOL -T-J F1 7 7 | OF I - 1 0 7 8 | OPEN | - 1 0 - 1 0 7 9 | OPERATION I - 5 0 8 0 \ OPERATOR I - 1 0 8 1 | OUSE | - 1 0 - 1 0 8 2 | OUT I - 1 0 | - 1 0 8 3 | PAR | - 1 0 - 1 0 8 4 | PER | - 1 0 - 1 0 8 5 | PLUS I - 1 0 8 6 | POINT I - 1 0 8 7 | PRIORITY I - 5 0 8 8 | PRIORITY 1 OPERATOR I ~ 1 - 1 6 8 9 | PRIORITY, 2 OPERATOR I 1 - 1 6 9 0 | PRIORITY 3 OPERATOR 1 ~ 1 | - 1 6 9 1 | PRIORITY 4 OPERATOR I -1 - 1 6 9 2 | PRIORITY 5 OPERATOR I ~ 1 - 1 6 9 3 } PRIORITY, 6 OPERATOR I -1 - 1 6 9 4 | PRIORITY 7 OPERATOR I "* 1 - 1 6 9 5 | PRIORITY. 8 OPERATOR I -1 - 1 6 9 6 | PRIORITY 9 OPERATOR j - 1 - 1 6 9 7 J PROC I -1 0 9 8 | REAL DENOTATION I _ 1 0 9 9 | REAL I -1 0 1 0 0 | REF I ~ 1 0 1 0 1 J REP | - 1 0 - 1 0 1 0 2 | REPLICATE ALIGNMENT | - 1 0 - 1 0 1 0 3 | REPLICATE LITERAL | - 1 0 - 1 0 1 0 4 | SEMA I - 1 0 1 0 5 \ SEIAL OPEN | - 1 0 - 1 0 1 0 6 | SHORT BITS DENOTATION I - 1 0 1 0 7 | SHORT BITS I - 1 0 1 0 8 J SHORT BYTES I - 1 0 1 0 9 | SHORT INTEGRAL DENOTATION I - 1 0 1 1 0 | SHORT INTEGRAL I - 1 0 1 1 1 | SKIP I -1 1 0 1 1 2 | START | - 1 0 - 1 0 1 1 3 J STOP I ^ 5 0 - 5 0 1 1 4 | STRING DENOTATION j - 1 0 1 1 5 f STRING J - 1 0 1 1 6 | STRUCT I - 5 0 1 1 7 | SUB | - 1 0 - 1 0 1 1 8 | TAG I " 7 5 1 1 9 | THELSE | - 1 0 - 1 0 1 2 0 | THEN | - 1 0 - 1 0 1 2 1 | TO I -1 0 1 2 2 | TRUE I -1 0 1 2 3 | UNION I - 5 0 1 2 4 | VOID I ~ 1 0 1 2 5 | WHILE | - 1 0 - 1 0 F2 F i g u r e 3.1 74 ' 3.2 E r r o r Recovery Algorithm The e r r o r recovery mechanism i s based on the p r i n c i p l e t h a t the parser can p r o v i d e f a r more e f f e c t i v e e r r o r recovery f o r a l l e r r o r s t h a t may a r i s e than an ad hoc e r r o r r e c o v e r y mechanism which i s hand coded f o r the "most l i k e l y " e r r o r s that w i l l occur. We c o n s t r u c t a parse r which i s a copy of the p r e v i o u s l y d e s c r i b e d p a r s e r ; but i t i s so modified that the new parser does not a f f e c t , i n any manner, the s t a t u s of the o r i g i n a l parser (or the semantics c o n s t r u c t e d so f a r ) during a "symbolic" parse. The a l g o r i t h m i s as f o l l o w s : Step 1: make a symbolic copy of the symbol s t r i n g . Step 2: make a symbolic copy of the pushdown s t a c k . Step 3: attempt an e r r o r recovery a c t i o n by m o d i f i c a t i o n of the symbolic s t r i n g or the symbolic s t a c k , Step 4: parse the symbolic s t r i n g using the symbolic s t a c k (parse, at most, up to (5 • number i n s e r t e d ) symbols or u n t i l a syntax e r r o r i s encountered), Step 5: e v a l u a t e the a c t i o n by summing the number of symbols s y m b o l i c a l l y read, the number of r e d u c t i o n s s y m b o l i c a l l y performed and, the weight f a c t o r a s s o c i a t e d with that a c t i o n ( f u n c t i o n s F1 and F2). I f the symbolic parse d i d not read the exposed symbol then the r e s u l t of the e v a l u a t i o n i s -10; otherwise, i f the symbolic parse read the symbol a f t e r the exposed symbol and 5 or more symbols have a l r e a d y been i n s e r t e d then add the number of symbols read to the e v a l u a t i o n r e s u l t again. 75 T h i s a l g o r i t h m i s repeated f o r each a c t i o n and a t a b l e of e v a l u a t i o n s i s c o n s t r u c t e d . The t a b l e i s searched f o r the best a c t i o n (highest numerical value) and that a c t i o n i s e f f e c t e d upon the symbol s t r i n g or pushdown stack. The order of search i s to c o n s i d e r i n s e r t i o n , replacement, d e l e t i o n and pop a c t i o n s i n t h a t order. I f two a c t i o n s have the same e v a l u a t i o n r e s u l t then the f i r s t one i s chosen. For the grammar given i n the APPENDIX, i t was found d e s i r a b l e t o not allow the same symbol to be i n s e r t e d more than once i n the same i n s e r t i o n sequence u n l e s s that symbol i s the only c h o i c e . The main reason i s simply to deter i n f i n i t e i n s e r t i o n sequences. Step 6: i f the a c t i o n was the d e l e t i o n of one or more symbols, the popping of one or more s t a t e s , or the replacement of the symbol i n e r r o r by another symbol, then the a c t u a l parse i s resumed; otherwise, go to step 7. Step 7: perform a symbolic parse of the modified symbol s t r i n g . I f the symbol i n e r r o r i s read then resume the a c t u a l parse; otherwise, go to st e p 8., Step 8: i f more than 10 symbols have been i n s e r t e d then stop p a r s i n g and r e p o r t f a i l u r e ; otherwise go to step 9. Step 9: the symbol i n e r r o r remains the symbol i n e r r o r . Repeat steps 1 t o 6 attempting d e l e t i o n , i n s e r t i o n and replacement, but not stack popping. 3.3 R e s u l t s Obtained The above a l g o r i t h m was a p p l i e d to an LALR (3) parser f o r ALGOL 68. A f t e r some experimenting with the weights, i t was 76 found t h a t f a i r l y reasonable e r r o r recovery was obtained. Some examples f o l l o w : There are t h r e e e r r o r s i n the f o l l o w i n g program: begin i n t b; r e a d ( b ) ; £roc g = (proc void p ) v o i d : ( { b <= 0 | p |: b > 5 | b -:= 6 #; i s miss i n g * q(p) | b •:= 1; g(££oc void #: i s missing* 12) ) ; 12: #unit i s missing* ) ; 9(££22 v o i d : 11); 11: skip_ end. E r r o r r e c o v e r y i s attempted. The e v a l u a t i o n r e s u l t s f o r the f i r s t e r r o r are given i n f i g u r e 3.2 (column two e n t r i e s f o r i n s e r t and r e p l a c e a c t i o n s have a matching name i n f i g u r e 3.1). 77 | ACTION | SYMBOL (S) | READ I 1 REDUCED | WEIGHT I t RESULT r - -| INSERT 1 1 I 5 —-J— _ _ j 2 4 | - 1 0 - + — 19 | INSERT I 2 | 0 J 14 | 0 | - 1 0 J INSERT I 3 | 5 J 1 9 | 1 j 2 5 \ INSERT I 8 | 0 1 14 | - 1 0 ) - 1 0 | INSERT I 9 | 0 J 14 | 0 j - 1 0 | INSERT I 1 3 | 1 1 2 2 | - 1 0 | 13 | INSERT I 14 I 0 J 14 | 4 | - 1 0 J INSERT I 1 5 | 0 1 14 | 6 | - 1 0 | INSERT I 17 I 2 1 1 6 1 - 5 1 3 | INSERT I 19 | 0 1 1 7 | 1 | - 1 0 | INSERT I 2 1 | 0 1 1 7 J - 1 0 | - 1 0 | INSERT I 2 2 | 0 j 17 | - 1 0 - 1 0 | INSERT I 2 4 | 0 1 1 7 | - 1 0 j - 1 0 | INSERT I 2 6 | 0 J 17 | - 1 0 j - 1 0 j INSERT I 2 8 | 0 1 17 | - 1 0 | - 1 0 J INSERT I 3 6 | 5 J 2 1 | 2 | 2 8 J INSERT I 4 0 | 0 J 17 | - 1 0 j - 1 0 | INSERT I 4 3 | 5 1 19 | 0 | 2 4 J INSERT I 4 4 | 5 1 1 9 | 0 j 2 4 | INSERT I 7 8 | 5 J 7 I - 1 0 2 | INSERT I 8 1 | 0 1 14 | - 1 0 | - 1 0 | INSERT I 8 2 | 0 1 14 | - 1 0 - 1 0 | INSERT I 8 4 | 0 1 17 | - 1 0 j - 1 0 | INSERT I 8 8 | 5 1 19 | - 1 6 | 8 I INSERT I 8 9 | 5 1 1 7 | - 1 6 j 6 | INSERT I 9 0 | 5 j 16 | - 1 6 | 5 | INSERT I 9 1 | 5 1 1 5 | - 1 6 | 4 | INSERT I 9 2 | 5 J 14 | - 1 6 3 | INSERT I 9 3 | 5 j 13 | - 1 6 | 2 | INSERT I 9 4 ) 5 1 12 | - 1 6 | 1 J INSERT I 9 5 | 5 I 11 I - 1 6 I 0 1 INSERT I 9 6 | 5 1 1 0 J - 1 6 j - 1 | INSERT I 1 0 1 | 0 J 1 7 | - 1 0 j - 1 0 | INSERT I 1 1 7 | 5 1 7 I - 1 0 | 2 | INSERT I 1 1 9 | 5 ) 2 4 | - 1 0 | 1 9 J INSERT I 1 2 0 J 0 1 17 | - 1 0 | - 1 0 1 INSERT I 1 2 1 | 0 J 14 j 0 - 1 0 | INSERT I 1 2 5 | 0 1 14 | - 1 0 | - 1 0 | REPLACE I 1 I 5 J 3 1 I - 1 7 19 | REPLACE l 2 | 0 1 14 | - 7 | - 1 0 J REPLACE I 3 | 5 1 2 8 | - 6 | 2 7 | REPLACE I 8 | 0 J 14 | - 1 7 j - 1 0 | REPLACE I 9 | 0 J 14 | - 7 j - 1 0 | REPLACE 1 1 3 | 5 J 3 7 | - 1 7 j 2 5 | REPLACE I 14 I 0 1 14 | - 3 | - 1 0 J REPLACE I 1 5 \ 0 1 14 | - 1 | - 1 0 | REPLACE I 1 7 | 1 I 1 6 | - 1 2 | 5 1 REPLACE I 19 | 0 1 1 7 | - 6 j - 1 0 | REPLACE I 2 1 | 0 1 17 | - 1 7 I - 1 0 i i i A. 1 _j L .. _ __ _ F i g u r e 3.2 continued on next page 78 ACTION | SYMBOL (S) | READ J REDUCED | WEIGHT | RESULT i .j :. ^ i _ - | — — - j — — — . __ REPLACE | 2 2 | 0 I 1 7 ! - 1 7 | - 1 0 REPLACE 1 2 4 | 0 I 1 7 | - 1 7 } - 1 0 REPLACE 1 2 6 | 0 I 17 | - 1 7 | - 1 0 REPLACE | 2 8 | 0 I 17 l - 1 7 | - 1 0 REPLACE 1 3 6 j 5 I 2 8 | - 5 | 2 8 REPLACE 1 4 0 | 0 1 17 | - 1 7 \ - 1 0 REPLACE 1 4 3 | 5 1 2 7 | - 7 | 2 5 REPLACE 1 4 4 | 5 i 2 7 | - 7 | 2 5 REPLACE | 7 8 | 4 1 1 0 | - 1 7 | - 3 REPLACE 1 8 1 | 0 | 14 | - 1 7 | - 1 0 REPLACE I 8 2 | 0 | 14 | - 1 7 l - 1 0 REPLACE I 8 4 { 0 | 17 | - 1 7 | - 1 0 REPLACE | 8 8 | 5 I 3 6 | - 2 3 | 18 REPLACE I 8 9 | 5 I 3 5 | - 2 3 | 17 REPLACE I 9 0 | 5 I 3 4 | - 2 3 | 16 REPLACE I 9 1 | 5 | 3 3 | - 2 3 I 1 5 REPLACE I 9 2 I 5 | 3 2 | - 2 3 | 14 REPLACE I 9 3 J 5 I 3 1 | - 2 3 | 13 REPLACE 1 9 4 | 5 1 3 0 | - 2 3 | 1 2 REPLACE 1 9 5 | 5 1 2 9 | - 2 3 | 11 REPLACE 1 9 6 | 5 1 2 8 | - 2 3 | 1 0 REPLACE 1 1 0 1 | 0 I 17 | - 1 7 | - 1 0 REPLACE I 1 1 7 1 4 I 10 | - 1 7 | - 3 REPLACE 1 1 1 9 J 4 I 3 3 I - 1 7 J 2 0 REPLACE I 1 2 0 | 0 | 1 7 | - 1 7 J - 1 0 REPLACE I , 1 2 1 | 0 I 14 | - 7 J - 1 0 REPLACE I 1 2 5 | 0 I 14 | - 1 7 | - 1 0 DELETE I 1 I 5 I 2 4 | - 7 l 2 2 DELETE I 2 I 0 I 0 | - 1 7 | - 1 0 DELETE I 3 | 5 1 4 2 | - 2 4 | 2 3 DELETE I 4 | 5 1 4 5 | - 3 4 | 16 DELETE | 5 | 0 1 0 | - 4 4 | - 1 0 POP I 1 I 5 1 2 5 1 - 1 0 | 2 0 POP I 2 | 0 1 0 | - 1 0 J - 1 0 POP I 3 | 5 1 1 5 | - 1 0 | 10 POP I 4 | 0 I 0 | - 1 0 | - 1 0 POP I 5 I 5 1 1 5 I - 1 0 | 10 i 1 i . — J i — ; i a . i F i g u r e 3.2 The a c t i o n s e l e c t e d i s i n s e r t 36 or i n s e r t a go on symbol (•';") . The e v a l u a t i o n r e s u l t s f o r the second e r r o r are giv e n i n f i g u r e 3.3. The a c t i o n s e l e c t e d i s i n s e r t 14 or i n s e r t a co l o n (":"). The e v a l u a t i o n r e s u l t s f o r the t h i r d e r r o r are given i n f i g u r e 3.4. The a c t i o n s e l e c t e d i s i n s e r t 111 or i n s e r t a s k i p 79 symbol ("sjcijo"). The s y n t a c t i c ^ parse terminated s u c c e s s f u l l y . | ACTION |SYMBOL(S)1 READ | REDUCED | WEIGHT | RESULT j. 1 -|— - + h - +-| INSERT 1 4 | 2 1 8 | -10 | 0 | INSERT 1 11 I 2 1 8 | -10 | 0 I INSERT I I t 1 5 1 27 | 4 I 36 | INSERT 1 39 | 2 ! 8 | -10 | • 0 | INSERT 1 78 j 5 1 19 | -10 | 14 | INSERT 1 83 | 1 I 0 | -10 | -9 J INSERT 1 105 | 5 1 23 | -10 | 18 | REPLACE I 4 | 1 t 0 | -17 | -16 | REPLACE I 11 1 1 1 0 | -17 | -16 | REPLACE 1 14 | 1 1 1 I -3 | -1 J REPLACE 1 39 | 1 I 0 | -17 | -16 | REPLACE 1 78 | 5 I 15 | -17 | 3 | REPLACE I 83 | 1 I 0 | -17 | -16 | REPLACE 1 105 | 1 ! 0 | -17 | -16 | DELETE I 1 I 0 I 0 | -7 | -10 | DELETE I 2 | 0 I 0 | -17 | -10 | DELETE I 3 | 0 I 0 | -27 | -10 | DELETE I 4 | 0 I o t -28 | -10 j DELETE I 5 J 5 I 14 | -35 | -16 I POP | 1 | 5 I 28 | -10 | 23 | POP I 2 | 0 I 0 | -10 | -10 | POP 1 3 | 5 | 28 | -10 | 23 1 POP ! 4 | 5 I 28 | -10 | 23 I POP 1 5 J 0 1 0 | -10 | -10 Fi g u r e 3.3 [ ACTION L — _ — — — — TsYMBOL (S)T READ — i — 1 j REDUCED T WEIGHT T RESU | INSERT j | I 4 | 1 — + -0 _j_ | 1 -10 | -9 | INSERT I 5 | 5 | 24 1 0 | 29 | INSERT 1 6 | 1 | 2 1 0 | 3 | INSERT 1 7 | 1 | 2 1 4 | 7 | INSERT 1 9 | 1 0 1 0 | 1 | INSERT 1 10 | 1 | 2 1 0 | 3 | INSERT | 11 | 1 | 0 1 -10 | -9 | INSERT | 12 | 1 j 2 i 0 | 3 | INSERT 1 16 | 1 | 2 1 0 | 3 | INSERT I 18 | 2 | 2 1 "10 | -6 | INSERT | 19 | 1 | 0 I 1 I 2 | INSERT 1 20 | 1 | 0 I 0 | 1 | INSERT 1 23 | 5 | 24 I 0 | 29 | INSERT 1 27 | 5 | 25 I 0 | 30 | INSERT | 29 | 1 I 2 I 0 | 3 | INSERT 1 30 | 1 | 0 | 0 | 1 | INSERT 1 31 | 1 0 I o | 1 F i g u r e 3.4 continued on next page 80 1 ACTION ISYjMBOL ( r INSERT t 32 INSERT 1 34 INSERT 1 35 INSERT 1 37 INSERT 1 38 INSERT 1 39 INSERT I 41 INSERT | 42 INSERT | 65 INSERT f 66 INSERT I 67 INSERT | 68 INSERT I 69 INSERT | 70 INSERT I 71 INSERT | 72 INSERT | 74 INSERT I 76 INSERT | 78 INSERT | 80 INSERT I 83 INSERT | 88 INSERT | 89 INSERT 1 90 INSERT | 91 INSERT | 92 INSERT J 93 INSERT | 94 INSERT | 95 INSERT | 96 INSERT | 97 INSERT | 98 INSERT | 99 INSERT | 100 INSERT | 101 INSERT | 104 INSERT | 105 INSERT | 106 INSERT | 107 INSERT J 108 INSERT | 109 INSERT | 110 INSERT I 111 INSERT | 114 INSERT 1 115 INSERT | 116 INSERT | 117 INSERT I 118 INSERT | 121 INSERT I 122 READ | REDUCED | WEIGHT 5 1 5 A .-| — _ _ _ _ 0 I "10 I -9 2 I 0 | 3 0 I 0 | 1 0 I 0 | 1 0 I 0 | 1 0 I -10 | -9 24 I 0 | 29 2 I 0 | 3 0 I 0 | 1 2 I 0 | 3 2 I 0 ] 3 2 I 0 | 3 2 I 0 | 3 2 I 0 | 3 24 | 0 | 29 2 t 0 | 3 1 I 0 | 2 23 I o 1 28 8 1 - i o | 3 1 1 0 | 2 0 1 -10 | -9 1 I -16 | -14 1 I -16 | -14 1 I -16 J -14 1 I "16 | -14 1 I - 1 6 | -14 1 1 "16 | -14 1 I -16 j -14 1 I "16 I -14 1 I -16 | -14 0 I o | 1 24 I 0 | 29 2 I 0 J 3 0 1 0 | 1 0 1 "10 | -9 2 I 0 | 3 0 I "10 | -9 24 I 0 | 29 2 I 0 | 3 2 I 0 | 3 24 I 0 | 29 2 \ 0 | 3 23 I 10 | 38 24 I 0 | 29 2 I 0 | 3 0 I 0 | 1 0 I - i o | -9 23 1 5 | 33 0 1 0 | 1 25 1 0 | 30 F i g u r e 3.4 continued on next page 81 j ACTION |SYMBOL (S) | HEAD | REDUCED | WEIGHT | RESULT | " _| 1 j. -| -Lj _-| 1 - j — — INSERT | 123 1 1 I 0 I 0 I 1 INSERT | 124 | 1 | 2 | 0 I 3 INSERT | 125 I 1 I 0 I -10 | -9 REPLACE I 4 I 1 I 0 I -20 | -19 REPLACE | 5 I 5 | 7 I -10 | 2 REPLACE I 6 I 1 I 2 I -10 | -7 REPLACE I v I 1 I 2 I -6 | -3 REPLACE I 9 I 1 I 0 1 -10 | -9 REPLACE I 10 I 1 I 2 1 -10 | -7 REPLACE I 11 I 1 I 0 1 -20 J -19 REPLACE I 12 I 1 I 2 1 -10 j -7 REPLACE I 16 I 1 1 2 I -10 | -7 REPLACE | 18 1 1 I 0 I -20 \ -19 REPLACE I 19 I 1 I 0 1 -9 | -8 REPLACE | 20 I 1 I 0 1 -10 | -9 REPLACE | 23 I 5 | 7 1 -10 | 2 REPLACE | 27 I 5 | 8 1 -10 | 3 REPLACE | 29 1 1 I 2 1 -10 | -7 REPLACE | 30 I 1 I 0 1 -10 | -9 REPLACE I 31 I 1 I 0 1 -10 | -9 REPLACE | 32 I 1 I 0 1 -20 | -19 REPLACE I 34 I 1 I 2 1 -10 | -7 REPLACE I 35 I 1 I 0 1 -10 | -9 REPLACE I 37 l 1 1 0 1 -10 \ -9 REPLACE | 38 1 1 I 0 1 -10 | -9 REPLACE I 39 | 1 | 0 1 -20 | -19 REPLACE 1 41 I 5 | 7 I -10 | 2 REPLACE | 42 I 1 i 2 1 -10 | -7 REPLACE | 65 | 1 | 0 1 -10 | -9 REPLACE | 66 I 1 I 2 1 -10 \ -7 REPLACE I 67 I 1 I 2 1 -10 | -7 REPLACE | 68 | 1 | 2 1 -10 | -7 REPLACE | 69 I 1 I 2 1 -10 | -7 REPLACE | 70 I 1 I 2 1 -10 | -7 REPLACE I 71 1 5 | 7 I -10 | 2 REPLACE | 72 1 1 I 2 i -10 | -7 REPLACE | 74 | 1 | 1 | -10 | -8 REPLACE I 76 I 5 | 6 I -10 | 1 REPLACE | 78 I 1. I 0 I -20 1 -19 REPLACE | 80 | 1 j 1 I -10 1 -8 REPLACE | 83 I 1 I 0 | -20 1 -19 REPLACE j 88 I 1 I 1 | -26 | -24 REPLACE | 89 I 1 I 1 | -26 | -24 REPLACE | 90 I 1 I 1 I -26 | -24 REPLACE 1 91 I 1 I 1 | -26 | -24 REPLACE | 92 I 1 I 1 1 -26 | -24 REPLACE I 93 I 1 I 1 I -26 | -24 REPLACE | 94 I 1 I 1 I -26 | -24 REPLACE I 95 1 I -26 J t -24 F i g u r e 3.4 continued on next page 82 | ACTIOS | SYjMBO'L (S) j READ | REDUCED | WEIGHT \ RESULT REPLACE 1 96 | 1 | 1 -26 | -24 REPLACE I 97 ] 1 | 0 -10 I -9 REPLACE 1 98 J 5 I 7 -10 | 2 REPLACE 1 99 | 1 | 2 -10 | -7 REPLACE \ 100 | 1 | 0 -10 | -9 REPLACE 1 101 | 1 1 0 -20 | -19 REPLACE I 104 I 1 | 2 -10 J -7 REPLACE I 105 | 1 | 0 -20 | -19 REPLACE I 106 | 5 J 7 -10 | 2 REPLACE I 107 | 1 J 2 -10 | -7 REPLACE I 108 ) 1 | 2 -10 | -7 REPLACE I 109 | 5 | 7 -10 | 2 REPLACE I 110 | 1 1 2 -10 J -7 REPLACE I 1 1 1 1 5 1 6 0 I 11 REPLACE | 114 | 5 J 7 -10 I 2 REPLACE I 115 1 1 I 2 -10 | -7 REPLACE 1 116 | 1 | 0 -10 I -9 REPLACE 1 117 | 1 J 0 -20 J -19 REPLACE 1 118 1 5 I 6 -5 | 6 REPLACE 1 121 | 1 | 0 ' -10 I -9 REPLACE I 122 | 5 I 8 -10 j 3 REPLACE I 123 | 1 | 0 -10 ! -9 REPLACE I 124 | 1 | 2 -10 | -7 REPLACE I 125 | 1 | 0 -20 | -19 DELETE I 1 I 0 I 0 -10 I -10 DELETE I 2 | 5 | 6 -11 I 0 DELETE I 3 | 5 | 5 -18 | -8 DELETE I 4 | 5 I 15 -28 | -8 DELETE I 5 | 5 I 27 -29 j 3 POP I 1 I 0 I 0 -10 | -10 POP I 2 | 5 | 11 -10 | 6 POP I 3 | 0 I 0 -10 | -10 POP I 4 | 0 I 0 -10 J -10 POP I 5 | 0 I 0 -10 | -10 L. i i, L. .... '- i _ ^ j . i ,_ „ ,.. . •• J. i -______ F i g u r e 3.4 83 CHAPTER 4 ALGOL 68 IMPLEMENTATION In t h i s chapter, we w i l l d i s c u s s the context f r e e grammar given i n the APPENDIX and the e f f e c t the grammar and the LR (k) parser have had on the design and s t r u c t u r e of the compiler, 4.1 The Grammar In 1968, the ALGOL 68 Report [R] was r e l e a s e d . Since that time, v a r i o u s changes to the language have been proposed. These changes are now being c o n s i d e r e d f o r i n t r o d u c t i o n i n t o the Revised Report on ALGOL 68. The ALGOL 68 implementation group at the U n i v e r s i t y of B r i t i s h Columbia have endevoured to remain i n touch with a l l the proposed changes and, i n p a r t i c u l a r , those which have been u n o f f i c i a l l y and o f f i c i a l l y accepted. The context f r e e grammar given i n the APPENDIX d e s c r i b e s the context f r e e s t r u c t u r e of the proposed language f o r the ALGOL 68 Revised Report. The CFG i s an LALR (3) grammar. The grammar c o n s i s t s of 444 prod u c t i o n s with 125 t e r m i n a l symbols and 153 nonterminal symbols. The parse r t a b l e s were generated i n 80 seconds compute time on an IBM Systems 360/67. The CFSM c o n s i s t s of 719 s t a t e s , of which 128 are inadequate and was generated i n 18 seconds compute time. The lookahead symbol s e t s were computed i n 42 84 seconds (with the lookahead symbol s e t s p r i n t e d out) using the LALR (k) a l g o r i t h m . Four inadequate s t a t e s r e q u i r e d three symbol lookahead, t h i r t y - f o u r r e q u i r e d two symbol lookahead and the r e s t only r e q u i r e d one symbol lookahead. The DPDA was c o n s t r u c t e d i n 3 seconds compute time. On o b s e r v a t i o n of the CFG, we n o t i c e that t h e r e are symbols i n the grammar which are not found i n the ALGOL 68 Report nor w i l l they be found i n the Revised Report. The " s t a r t " and " s t o p " symbols are e q u i v a l e n t to f- and •], r e s p e c t i v e l y , and are i n t r o d u c e d i n t o the i n p u t s t r i n g by the l e x i c a l a n a l y z e r of the compiler. There are no c h a r a c t e r r e p r e s e n t a t i o n s f o r the " s t a r t " and " s t o p " symbols. In production number 14 we f i n d a " s e r i a l open" symbol and i n p r o d u c t i o n number 255 we f i n d a " d e c l a r e r open", symbol, n e i t h e r of which can be found i n the ALGOL 68 Report. These symbols as w e l l as " r e p l i c a t e alignment" and " r e p l i c a t e l i t e r a l " symbols (productions 184 and 120, r e s p e c t i v e l y ) have been i n t r o d u c e d i n t o the grammar to make the grammar LR (k). However, t h e r e are no c h a r a c t e r r e p r e s e n t a t i o n s f o r them; t h a t i s to say, these symbols are " t r a n s f o r m a t i o n s " of other t e r m i n a l symbols. For example, " s e r i a l open" and " d e c l a r e r open" symbols are t r a n s f o r m a t i o n s of the "open" symbol, and " r e p l i c a t e alignment" and " r e p l i c a t e l i t e r a l " symbols are t r a n s f o r m a t i o n s of the " l e t t e r n" symbol. The t r a n s f o r m a t i o n i s a p p l i e d to an open symbol or a l e t t e r n symbol i n order that the context an a r b i t r a r y d i s t a n c e down the i n p u t s t r i n g i s known. For the open symbol, i t i s 85 d e s i r e a b l e to know, beforehand, how to parse the sequence of symbols ". (-a:b)n or "(£§§1 x, y". In the f i r s t case, " (a:b)" may be a s e r i a l c l a u s e i n which "a:" i s a l a b e l and "b" i s a u n i t or " (a:b)" may be f o l l o w e d by " r e a l " , i n which case we have an a c t u a l row of mode d e c l a r a t o r where "a" (a unit) i s the lower bound and "b" (a unit) i s the upper bound. " ( r e a l x, y" may be the s t a r t of a s e r i a l c l a u s e where "y" may be f o l l o w e d by the symbol sequence "; x := 3.1U; read ( y ) " or " ( r e a l x, y" may be the s t a r t of a r o u t i n e t e x t where "y" may be f o l l o w e d by " ) r e a l : x+y". I f i t i s a s e r i a l c l a u s e , then " r e a l " i s an a c t u a l d e c l a r e r ; whereas, i f i t i s the parameters p l a n f o r a r o u t i n e t e x t , then " r e a l " i s a fo r m a l d e c l a r e r . The semantics f o r " a c t u a l " and " f o r m a l " d e c l a r e r s are d i f f e r e n t . Thus, i f we f i n d a go on symbol (";") or a completion symbol ( " e x i t " ) , then the "open" symbol i s changed t o a " s e r i a l open" symbol and i f the matching " c l o s e " symbol i s f o l l o w e d by a row of mode d e c l a r a t o r , then the "open" symbol i s changed to a " d e c l a r e r open" symbol; otherwise, the "open" symbol remains unchanged. The same e x p l a n a t i o n f o l l o w s f o r the " l e t t e r n" symbol. Here i t i s d e s i r a b l e to know whether or not to apply c e r t a i n p r o d u c t i o n r u l e s or t o parse the dynamic r e p l i c a t o r . In order to determine what a c t i o n to take, we must know the context an a r b i t r a r y d i s t a n c e down the i n p u t s t r i n g , e.g., what t e r m i n a l symbol f o l l o w s the dynamic r e p l i c a t o r . I f we f i n d a " s t r i n g d e n o t a t i o n " symbol then we change the " l e t t e r n" symbol t o a " r e p l i c a t e l i t e r a l " symbol and i f we f i n d a " l e t t e r k", " l e t t e r 1", " l e t t e r p", " l e t t e r x" or " l e t t e r y" symbol then we change 86 the " l e t t e r n" symbol to a " r e p l i c a t e alignment" symbol; otherwise, the " l e t t e r n" symbol remains unchanged. 4.2 The Compiler Now t h a t we have a f e e l f o r the grammar, we can c o n s i d e r the s t r u c t u r e of the compiler. In the p r e v i o u s s e c t i o n , i t became apparent t h a t the s y n t a c t i c parse of a sentence i n the language could not commence u n t i l the l e x i c a l a n a l y s i s of the sentence was complete. We wish to mention here t h a t c e r t a i n i n f o r m a t i o n must be obtained from the l e x i c a l a n a l y s i s such that the parser can d i s t i n g u i s h an " o p e r a t o r i n d i c a t i o n " from a "mode i n d i c a t i o n " or a " p r i o r i t y o p erator i n d i c a t i o n " . Since and " o p e r a t o r " or "mode" d e c l a r a t i o n may occur b e f o r e or a f t e r an a p p l i e d occurrence o f the " o p e r a t o r " or "mode", i t r e q u i r e s two passes of the i n p u t s t r i n g to r e c o g n i z e and a d j u s t the t e r m i n a l symbols a c c o r d i n g l y . With t h a t i n mind, we can now d e s c r i b e and j u s t i f y the s t r u c t u r e of the U n i v e r s i t y of B r i t i s h Columbia ALGOL 68 compiler implementation. The compiler c o n s i s t s of f i v e passes where the f i r s t t h r e e passes each scan the i n p u t s t r i n g once and the l a s t two passes tour the r e s u l t a n t parse graph ( c a l l e d a •program t r e e 1 s e e 'Program Tree R e p r e s e n t a t i o n ' i n Chapter 2 of [ T ] ) . The f i v e passes are 1) L e x i c a l Analyzer and C o r r a l Scan, 2) Ambiguity Scan, 3) S y n t a c t i c Parse, 4) Mode Dependent Parse and 5) Code Generation. Each pass 1 w i l l be c a l l e d and executed to completion b e f o r e the next pass i s invoked. 87 Pass 1, the L e x i c a l Analyzer and C o r r a l Scan, performs the i n i t i a l a n a l y s i s of the input s t r i n g , an ALGOL 68 program. The l e x i c a l a n a l y z e r scans the program, c h a r a c t e r by c h a r a c t e r , from l e f t to r i g h t . The c h a r a c t e r s t r i n g s r e s u l t i n g from the a n a l y s i s are then handled a c c o r d i n g to whether they are den o t a t i o n s , tags or b o l d f a c e . A tag or b o l d f a c e may be looked up i n a t a b l e and i f i t i s not t h e r e then i t i s entered i n t o the t a b l e . A d e n o t a t i o n may be converted i n t o i n t e r n a l form (other than a s t r i n g denotation) and placed i n a t a b l e ( i f i t i s not a l r e a d y t h e r e ) . The r e s u l t of t h i s process i s c a l l e d a lexeme which i s an i n t e g e r p a i r c o n s i s t i n g of the r e p r e s e n t a t i o n and a c c e s s i o n numbers. The r e p r e s e n t a t i o n number i s the number which the parser r e c o g n i z e s and the a c c e s s i o n number i n d i c a t e s which member of t h a t r e p r e s e n t a t i o n ( i . e . , which r e a l d e n o t a t i o n ) . The lexeme i s passed t o the C o r r a l Scan which analyses the in p u t s t r i n g f o r ranges and r e c o g n i z e s mode, operator and p r i o r i t y d e c l a r a t i o n s . The C o r r a l Scan c o n s t r u c t s two a r r a y s of halfword i n t e g e r p a i r s . The f i r s t a r r a y i s the lexeme stream c o n s i s t i n g only of the r e p r e s e n t a t i o n and a c c e s s i o n number p a i r s . The second array i s an a u x i l i a r y a r r a y c o n t a i n i n g pragmatic i n f o r m a t i o n about the lexeme stream, e.g., the r e s u l t of the range a n a l y s i s . The mode, operator and p r i o r i t y d e c l a r a t i o n s are entered under the a p p r o p r i a t e b o l d f a c e . We are merely e n t e r i n g the f a c t t h a t t h e r e was a mode, operator or p r i o r i t y d e c l a r a t i o n f o r t h a t i n d i c a t i o n and, i n the case of a 88 p r i o r i t y d e c l a r a t i o n , we a l s o note the p r i o r i t y assigned. The second pass, the Ambiguity Scan, scans the lexeme stream one lexeme at a time from l e f t to r i g h t and uses the a u x i l i a r y stream to c o n s t r u c t the ranges w i t h i n the program. The purpose of the Ambiguity Scan i s to r e s o l v e "open" symbols and " l e t t e r n" symbols as d e s c r i b e d p r e v i o u s l y and to r e c o g n i z e the a p p l i e d occurrence of o p e r a t o r s . I f the o p e r a t o r i s dyadic then i t a s s i g n s i t s p r i o r i t y a c c o r d i n g to programmer d e f i n e d d e c l a r a t i o n s or a c c o r d i n g to the Standard Prelude [ R ] . Pass 3 i s the S y n t a c t i c Parse of the i n p u t s t r i n g . The p a r s e r w i l l parse the i n p u t s t r i n g (the, r e p r e s e n t a t i o n numbers i n the lexeme stream) and output a sequence of p r o d u c t i o n numbers. As each: p r o d u c t i o n number i s output, the semantic r o u t i n e s f o r c o n s t r u c t i n g the program t r e e are c a l l e d with the production number (see ' B u i l d i n g the Program T r e e ' i n Chapter 3 of [ T ] ) . At the end of the s y n t a c t i c parse, we have a program t r e e which r e p r e s e n t s the s y n t a c t i c a l s t r u c t u r e of the program. Before Pass 3 t e r m i n a t e s , the mode t a b l e i s " e g u i v a l e n c e d " [R ]. Pass 4 i s - the Mode Dependent Parse. We have performed a s y n t a c t i c parse independent of mode and now we must parse the program t r e e dependent on the modes of tags and o p e r a t o r s . The a p p l i e d occurrence of o p e r a t o r s and tags are i d e n t i f i e d and the program t r e e i s modified to s u i t with the a d d i t i o n a l i n f o r m a t i o n such as c o e r c i o n sequences, e.g., d e r e f e r e n c i n g , deproceduring, widening, e t c . 89 Pass 5 i s the Code Generation pass of the program t r e e . Here i t i s a s t r a i g h t forward tour of the program t r e e g e n e r a t i n g code as we proceed and using a l l the i n f o r m a t i o n s t o r e d w i t h i n the program t r e e as a r e s u l t of the pr e v i o u s f o u r passes. 90 CONCLUSION We have d i s c u s s e d the theory and implementation of the LB (k) p a r s i n g technique as d e s c r i b e d by De Remer [D] and we have implemented a p a r s e r f o r ALGOL 68, p r o v i d i n g i t with a powerful s y n t a c t i c e r r o r recovery mechanism. We f e e l c o n f i d e n t that the LR (k) p a r s i n g technique i s very powerful i n our a p p l i c a t i o n and has given us an e f f i c i e n t parser f o r a l a r g e and complex grammar. However, i t i s w e l l understood t h a t LR(k) p a r s i n g i s not the e n d - a l l of a l l p a r s i n g techniques and t h a t i t may someday be r e p l a c e d by another technique w i t h i n our implementation. The e r r o r recovery mechanism t h a t i s developed has shown to be q u i t e powerful and r e p r e s e n t s a groundwork f o r f u t u r e i n v e s t i g a t i o n . The f u n c t i o n s F1 and F2 do not have to be f i x e d and we f e e l i t would be an i n t e r e s t i n g A r t i f i c i a l I n t e l l i g e n c e p r o j e c t to determine optimum v a l u e s f o r these f u n c t i o n s ( p o s s i b l y i n r e l a t i o n to the programming h a b i t s of the programmers t h a t use i t ) . 91 BIBLIOGRAPHY A Anderson, T., Eve, J . , Horning, J . J . E f f i c i e n t LR(1) P a r s e r s . ACTA I n f o r m a t i c a 2, pp. 12-39(1973). D DeRemer, F r a n k l i n L. P r a c t i c a l T r a n s l a t o r s f o r LR(k) Languages. Ph.D. T h e s i s , Massachusetts I n s t i t u t e of Technology, Cambridge, Massachusetts, October, 1969. H S U Hopcroft, John E. and Ullman, J e f f r e y D. Formal Languages and t h e i r R e l a t i o n to Automata. Addison-Wesley, 1969. K Knuth, D.E. On the T r a n s l a t i o n o f Languages From L e f t to R i g h t . Information C o n t r o l 8, October, 1965, pp. 607-639. R van Wijngaarden, A. ( E d i t o r ) , M a i l l o u x , B.J., Peck, J.E.L., and Koster, C.H.A. Report on the A l g o i t h m i c Language ALGOL 68. Numerische Mathematik, m, 1969, pp. 79-218. T Thomson, J . D. Mode Checking i n ALGOL 68. MSc. T h e s i s , U n i v e r s i t y of B r i t i s h Columbia, August, 1973. W Wirth* N. PL360, "A Programming Language f o r the 360 Computers". JACM 15(1968) 37. 92 APPENDIX See s e c t i o n 2.3 f o r a d e s c r i p t i o n of the grammar format and see chapter 4 f o r d e t a i l s about the grammar. (001) program (001): s t a r t symbol, p a r t i c u l a r program(002), stop symbol. (002) p a r t i c u l a r program(001,002,003): l a b e l sequence (004), enclosed clause(007) ; (003) enclosed c l a u s e (007). (004) l a b e l sequence (002,004,005,005,013,406,407): l a b e l (006) ; (005) l a b e l sequence (004), l a b e l (006). (006) l a b e l (004,005,006,404): tag symbol, c o l o n symbol. (007) enclosed clause(002,003,007,008,009,010,011,036,120,170, 188,195,214): c l o s e d c l a u s e (012); (008) c o l l a t e r a l c l a u s e (408) ; (009) p a r a l l e l c l a u s e (413); (010) c o n d i t i o n a l c l a u s e (414); (011) case c l a u s e (424). (012) c l o s e d c l a u s e (007,012,013,014,015): open symbol, u n i t (016), c l o s e symbol; (013) open symbol, l a b e l sequence (004) , u n i t (016), c l o s e symbol; (014) s e r i a l open symbol, s e r i a l c l a u s e (355), c l o s e symbol; (015) begin symbol, s e r i a l c l a u s e ( 3 5 5 ) , end symbol. (016) u n i t (012,013,016,017,018,019,020,021, 175, 176, 177, 178, 178, 204^204,204,205,205,206,206,207,208,209,209,210,211, 213,263,263,264,265,266,309,349,350,351,353,381,385, 394,401,402,41 1,411,412,419,436): a s s i g n a t i o n ( 0 2 1 ) ; (017) t e r t i a r y ( 0 2 2 ) ; 93 (018) r o u t i n e t e x t (309); (019) i d e n t i t y r e l a t i o n (314) ; (020) loop (316). (021) a s s i g n a t i o n ( 0 1 6 , 0 2 1 ) : t e r t i a r y (022), becomes symbol, u n i t (016). (022) t e r t i a r y (;017,021,022,023,024,025,026,027,028,029,030,031, 032,314,314,315,315): secondary(033); (023) p r i o r i t y 1 formula (270) ; (024) p r i o r i t y 2 formula (289) ; (025) p r i o r i t y 3 formula (290) ;i (026) p r i o r i t y 4 formula(291); (027) p r i o r i t y 5 formula (292); (028) p r i o r i t y 6 formula(293); (029) p r i o r i t y 7 formula(294); (030) p r i o r i t y 8 formula (295) ;• (031) p r i o r i t y 9 formula (296) ; (032) o p e r a t o r i n d i c a t i o n (297) , monadic operand (307) . (033) secondary(022,033,034,035,035,308): primary (036) ; (034) generator (268) ; (035) t a g symbol, of symbol, secondary(033). (036) primary(033,036,037,038,039,040,041,042,043,043,044,044, 045^045,046,046,047): encl o s e d c l a u s e (007); (037) go to symbol, tag symbol; (038) tag symbol; (039) s k i p symbol; (040) n i l symbol; (041) d e n o t a t i o n (048) ; 94 (042) format t e x t ( 0 5 9 ) ; (043) primary(036), open symbol, t r i m s c r i p t l i s t i n g (199), c l o s e symbol; (044) primary (036) , sub symbol, t r i m s c r i p t l i s t i n g (199) , bus symbol; (045) primary(036), sub symbol, bus symbol; (046) primary (036), open symbol, c l o s e symbol; (047) c a s t (214). (048) ' denotation(041,048,049,050,051,052,053,054,055,056): s h o r t i n t e g r a l d e n o t a t i o n symbol; (049) i n t e g r a l d e n o t a t i o n symbol; (050) long r e a l d e n o t a t i o n symbol; (051) r e a l d e n o t a t i o n symbol; (052) boolean denotation(057); (053) s t r i n g d e n o t a t i o n symbol; (054) empty symbol; (055) shorft b i t s d e n o t a t i o n symbol; (056) b i t s d e n o t a t i o n symbol. (057) boolean denotation(052,057,058): t r u e symbol; (058) f a l s e symbol. (059) format t e x t (042,059,060): format begin symbol, c o l l e c t i o n l i s t (061), format end symbol; (060) format begin symbol, format end symbol. (061) c o l l e c t i o n list(059,061,062,062,063,063,197): c o l l e c t i o n ( 0 6 4 ) ; (062) c o l l e c t i o n l i s t (061), comma symbol; (063) c o l l e c t i o n l i s t (061), comma symbol, c o l l e c t i o n (064). (064) c o l l e c t i o n ; (061,063,064,065,066,067,068,06 9,070,071,072) : p i c t u r e (073) ; / 95 (065) i n s e r t i o n ( 1 7 9 ) , r e p l i c a t i o n ( 1 9 5 ) , c o l l a t i o n ( 1 9 7 ) , i n s e r t i o n ( 1 7 9 ) ; (066) i n s e r t i o n (179), r e p l i c a t i o n ( 1 9 5 ) , c o l l a t i o n ( 1 9 7 ) ; (067) i n s e r t i o n ( 1 7 9 ) , c o l l a t i o n (197) , i n s e r t i o n ( 1 7 9 ) ; (068) i n s e r t i o n (179) , c o l l a t i o n (197) ; (069) r e p l i c a t i o n ( 1 9 5 ) , c o l l a t i o n (197), i n s e r t i o n (179) ; (070) r e p l i c a t i o n ( 1 9 5 ) , c o l l a t i o n ( 1 9 7 ) ; (071) c o l l a t i o n ( 1 9 7 ) , i n s e r t i o n (179) ; (072) c o l l a t i o n ( 1 9 7 ) . (073) p i c t u r e (064,073,074,075,076,077,078,079,0 80,081,082,083): i n t e g r a l p a t t e r n ( 0 8 4 ) ; (074) i n s e r t i o n (179); (075) b i t s p a t t e r n (125) ; (076) r e a l p a t t e r n (128) ; (077) boolean p a t t e r n ( 1 4 6 ) ; (078) complex p a t t e r n (153); (079) s t r i n g p a t t e r n (156) (080) format p a t t e r n (170) ; (081) format p a t t e r n ( 1 7 0 ) , i n s e r t i o n ( 1 7 9 ) ; (082) l o o s e g e n e r a l frame(171); (083) l o o s e g e n e r a l frame(171), i n s e r t i o n ( 1 7 9 ) . (084) i n t e g r a l pattern(073,084,085,086,087): s i g n mould (088), i n t e g r a l mould (098); (085) i n t e g r a l mould (098) ; (086) i n t e g r a l c h o i c e p a t t e r n (112), i n s e r t i o n ( 1 7 9 ) ; (087) i n t e g r a l c h o i c e p a t t e r n (112). (088) s i g n mould (084,088,089,128,138,140,141): l o o s e r e p l i c a t a b l e zero frame(090), s i g n frame (094); (089) l o o s e s i g n frame (096). 96 (090) loose r e p l i c a t a b l e zero frame (088,090,091): i n s e r t i o n ( 1 7 9 ) , r e p l i c a t a b l e zero frame(092); (091) r e p l i c a t a b l e zero frame (092). (092) r e p l i c a t a b l e zero frame (090,091,092,093): r e p l i c a t i o n ( 1 9 5 ) , l e t t e r z symbol; (093) l e t t e r z symbol. (094) s i g n frame (088,094,095,096,097): p l u s symbol; (095) minus symbol, (096) loose s i g n frame (089,096,'097) : i n s e r t i o n (179) , s i g n frame (094); (097) s i g n frame(094). (098) i n t e g r a l mould(084,085,098,099,125,131,131,132,133,138, 139, 141, 143): simple i n t e g r a l mould (100), i n s e r t i o n (179) ; (099) simple i n t e g r a l mould (100). (100) simple i n t e g r a l mould (098,099,100,101, 101): l o o s e r e p l i c a t a b l e s u p p r e s s i b l e d i g i t frame (102); (101) simple i n t e g r a l mould (100), loose r e p l i c a t a b l e s u p p r e s s i b l e d i g i t frame (102). (102) lo o s e r e p l i c a t a b l e s u p p r e s s i b l e d i g i t frame (100, 101, 102, 103) : i n s e r t i o n ( 1 7 9 ) , r e p l i c a t a b l e s u p p r e s s i b l e d i g i t frame (104); (103) r e p l i c a t a b l e s u p p r e s s i b l e d i g i t . f r a m e (104). (104) r e p l i c a t a b l e s u p p r e s s i b l e d i g i t frame(102,103,104,105): r e p l i c a t i o n (195) ,'" s u p p r e s s i b l e d i g i t frame (106); (105) s u p p r e s s i b l e d i g i t frame (106). (106) s u p p r e s s i b l e d i g i t frame (104,105, 106,107): l e t t e r s symbol, d i g i t frame (108); (107) d i g i t frame (108). (108) d i g i t fcane (106,107,108,109, 110,111): l e t t e r z symbol; (109) l e t t e r u symbol; 97 (110) l e t t e r v symbol; (111) l e t t e r d symbol. (112) i n t e g r a l c h o i c e pattern(086,087,112,113): i n s e r t i o n ( 1 7 9 ) , l e t t e r c symbol, open symbol, l i t e r a l l i s t (114), c l o s e symbol; (113) l e t t e r c symbol, open symbol, l i t e r a l l i s t (114), c l o s e symbol. (114) l i t e r a l list(112,113,114,115, 115) : l i t e r a l (116) ; (115) l i t e r a l l i s t (114), comma symbol, l i t e r a l (116). (116) l i t e r a l (114,115,116,117,118,119, 152, 152, 179, 180, 184,186): r e p l i c a t e l i t e r a l (120), s t r i n g d e n o t a t i o n symbol, r e p l i c a t e d l i t e r a l sequence (122); (117) r e p l i c a t e l i t e r a l ( 1 2 0 ) , s t r i n g d e n o t a t i o n symbol; (118) s t r i n g d e n o t a t i o n symbol, r e p l i c a t e d l i t e r a l sequence(122); (119) s t r i n g d e n o t a t i o n symbol. (120) r e p l i c a t e literal(116,117,120,121,124): r e p l i c a t e l i t e r a l symbol, enclosed c l a u s e ( 0 0 7 ) ; (121) i n t e g r a l d e n o t a t i o n symbol. (122) r e p l i c a t e d l i t e r a l sequence(116,118,122,123,123): r e p l i c a t e d l i t e r a l ( 1 2 4 ) ; (123) r e p l i c a t e d l i t e r a l sequence (122), r e p l i c a t e d l i t e r a l (124). (124) r e p l i c a t e d l i t e r a l ( 1 2 2 , 1 2 3 , 1 2 4 ) : r e p l i c a t e l i t e r a l ( 1 2 0 ) , s t r i n g d e n o t a t i o n symbol. (125) b i t s pattern(075,125): r a d i x mould (126), i n t e g r a l mould(098). (126) r a d i x mould(125,126,127) : i n s e r t i o n ( 1 7 9 ) , i n t e g r a l d e n o t a t i o n symbol, l e t t e r r symbol; (127) i n t e g r a l d e n o t a t i o n symbol, l e t t e r r symbol. (128) r e a l p a t t e r n (076, 128, 129, 130, 153, 153): 98 s i g n mould(088), r e a l mould (131); (129) r e a l mould (131); (130) f l o a t i n g p o i n t mould(138). (131) r e a l mould(128,129,131,132,133,140,142): i n t e g r a l mould (098), s u p p r e s s i b l e p o i n t frame (134), i n t e g r a l mould (098) ; (132) i n t e g r a l mould (098), s u p p r e s s i b l e p o i n t frame(134); (133) l o o s e s u p p r e s s i b l e p o i n t frame (136), i n t e g r a l mould (098). (134) s u p p r e s s i b l e p o i n t frame (131, 132, 134, 135, 136, 137): l e t t e r s symbol, p o i n t symbol; (135) p o i n t symbol. (136) l o o s e s u p p r e s s i b l e p o i n t frame (133, 136, 137): i n s e r t i o n ( 1 7 9 ) , s u p p r e s s i b l e p o i n t frame (134); (137) s u p p r e s s i b l e p o i n t frame (134). (138) f l o a t i n g p o i n t mould (130,138, 139): stagnant mould (140), s u p p r e s s i b l e exponent frame (144), s i g n mould(088), i n t e g r a l mould (098) ; (139) stagnant mould (140), s u p p r e s s i b l e exponent frame (144), i n t e g r a l mould (098). (140) stagnant mould (138, 139,140, 141, 142, 143): s i g n mould (088), r e a l mould (131); (141) s i g n mould(088), i n t e g r a l mould (098) ; (142) r e a l mould (131) ; (143) i n t e g r a l mould (098). (144) s u p p r e s s i b l e exponent frame (138, 139, 144,145): l e t t e r s symbol, l e t t e r e symbol; (145) l e t t e r e symbol. (146) boolean p a t t e r n (077,146,147): simple boolean p a t t e r n (148), i n s e r t i o n (179); (147) simple boolean p a t t e r n (148). (148) simple boolean p a t t e r n (146, 147,148, 149, 150, 151) : 99 i n s e r t i o n ( 1 7 9 ) , l e t t e r b symbol, boolean c h o i c e mould(152); 149) i n s e r t i o n ( 1 7 9 ) , l e t t e r b symbol; 150) l e t t e r b symbol, boolean c h o i c e mould (152); [ 151) l e t t e r b symbol. 152) boolean c h o i c e mould(148,150,152): open symbol, l i t e r a l ( 1 1 6 ) , comma symbol, l i t e r a l ( 1 1 6 ) , c l o s e symbol. !153) complex p a t t e r n (078, 153) : r e a l p a t t e r n (128), s u p p r e s s i b l e complex frame(154), r e a l p a t t e r n (128) . 154) s u p p r e s s i b l e complex frame (153,154, 155): l e t t e r s symbol, l e t t e r i symbol; [155) l e t t e r i symbol. 156) s t r i n g p a t t e r n (079,156, 157, 158, 159): simple s t r i n g p a t t e r n (160), i n s e r t i o n (179) ; ;157) simple s t r i n g p a t t e r n ( 1 6 0 ) ; 158) l o o s e s t r i n g frame (168), i n s e r t i o n ( 1 7 9 ) ; ;159) l o o s e s t r i n g frame (168). 160) simple s t r i n g pattern(156,157,160,160;161): simple s t r i n g p a t t e r n ( 1 6 0 ) , l o o s e r e p l i c a t a b l e s u p p r e s s i b l e c h a r a c t e r frame (162); 161: l o o s e r e p l i c a t a b l e s u p p r e s s i b l e c h a r a c t e r frame (162), 162) lo o s e r e p l i c a t a b l e s u p p r e s s i b l e c h a r a c t e r frame (160,161, 162, 163): i n s e r t i o n (179) , r e p l i c a t a b l e s u p p r e s s i b l e c h a r a c t e r frame (164); ;163) r e p l i c a t a b l e s u p p r e s s i b l e c h a r a c t e r frame (164). 164) r e p l i c a t a b l e s u p p r e s s i b l e c h a r a c t e r frame (162, 163, 164, 165) : r e p l i c a t i o n ( 1 9 5 ) , s u p p r e s s i b l e c h a r a c t e r frame(166); [165) s u p p r e s s i b l e c h a r a c t e r frame (166). 166) s u p p r e s s i b l e c h a r a c t e r frame (164,165,166, 167): l e t t e r s symbol, l e t t e r a symbol; 100 (167) l e t t e r a symbol. (168) lo o s e s t r i n g frame (158, 159, 168, 169): i n s e r t i o n (179), l e t t e r t symbol; (169) l e t t e r t symbol. (170) format pattern(080,081, 170) : l e t t e r f symbol, enclosed c l a u s e ( 0 0 7 ) . (171) l o o s e g e n e r a l frame (082,083,171,172): i n s e r t i o n ( 1 7 9 ) , g e n e r a l frame (173); (172) g e n e r a l frame (173). (173) g e n e r a l frame(171,172,173,174): l e t t e r g symbol; (174) l e t t e r g symbol, width s p e c i f i c a t i o n b r i e f pack (175). (175) width s p e c i f i c a t i o n b r i e f pack(174,175,176): open symbol, u n i t ( 0 1 6 ) , c l o s e symbol; (176) open symbol, u n i t (016), a f t e r s p e c i f i c a t i o n (177), c l o s e symbol. (177) a f t e r s p e c i f i c a t i o n ( 1 7 6 , 1 7 7 , 1 7 8 ) : comma symbol, u n i t (016); (178) comma symbol, u n i t (016), comma symbol, u n i t ( 0 1 6 ) . (179) insertion(065,065,066,067,067,068,069,071,074,081,083, 086,090,096,098, 102,:112, 126, 136, 146, 148, 149, 156, 158, 162,168,171,179,180,181): l i t e r a l (116), i n s e r t sequence(182); (180) l i t e r a l ( 1 1 6 ) ; (181) i n s e r t sequence(182). (182) i n s e r t sequence(179,181,182,183,183): i n s e r t (184) ; (183) i n s e r t sequence (182) , i n s e r t (184). (184) i n s e r t (182, 183 , 184, 185, 186<, 187) : r e p l i c a t e alignment(188), alignment(190), l i t e r a l (116) ; (185) r e p l i c a t e alignment (188), alignment(190) ; (186) alignment(190), l i t e r a l (116); 101 {187) alignment (190) . (188) r e p l i c a t e alignment (184,185,188, 189): r e p l i c a t e alignment symbol, enclosed c l a u s e (007); (189) i n t e g r a l d e n o t a t i o n symbol. (190) alignment (184,185,186,187, 190,191, 192, 193, 194) : l e t t e r k symbol; (191) l e t t e r x symbol; (192) l e t t e r y symbol; (193) l e t t e r 1 symbol; (194) l e t t e r p symbol. (195) replication(065,066,069,070,092,104,164,195,196) : l e t t e r n symbol, enclosed c l a u s e (007); (196) i n t e g r a l d e n o t a t i o n symbol. (197) c o l l a t i o n (065, 066,067, 068, 069, 070, 071,072, 197,i198) : open symbol, c o l l e c t i o n l i s t ( 0 6 1 ) , c l o s e symbol; (198) open symbol, c l o s e symbol. (199) t r i m s c r i p t listing(043,044,199,200,200,201,201,202,203) : t r i m s c r i p t ( 2 0 4 ) ; (200) t r i m s c r i p t l i s t i n g ( 1 9 9 ) , comma symbol; (201) t r i m s c r i p t l i s t i n g ( 1 9 9 ) , comma symbol, t r i m s c r i p t ( 2 0 4 ) ; (202) comma symbol; (203) comma symbol, t r i m s c r i p t ( 2 0 4 ) . (204) trimscript(199,201,203,204,205,206,207,208,209,210,211, 212,213) : u n i t (016), c o l o n symbol, u n i t ( 0 1 6 ) , a t symbol, u n i t (016) ; (205) u n i t (016), c o l o n symbol, u n i t (016); (206) u n i t (016), c o l o n symbol, at symbol, u n i t ( 0 1 6 ) ; (207) u n i t ( 0 1 6 ) , c o l o n symbol; (208) u n i t (016) ; 102 (209) colon symbol, u n i t ( 0 1 6 ) , at symbol, unit{016); (210) c o l o n symbol, u n i t (016) ;? (211) c o l o n symbol, at symbol, u n i t ( 0 1 6 ) ; (212) colon symbol; (213) a t symbol, u n i t ( 0 1 6 ) . (214) cast(047,214) : d e c l a r e r (215), enclosed c l a u s e (007). (215) declarer(214,215,215,216,217,218,219,220,221,221,222,223, 225,226,248,249,268,269,310, 311,312, 313,370,371, 372, 373^375,437,438): r e f e r e n c e to symbol, d e c l a r e r ( 2 1 5 ) ; (216) procedure symbol; procedure plan (222); (217) union of symbol, open symbol, d e c l a r e r l i s t (225), c l o s e symbol; (218) p r i m i t i v e d e c l a r a t o r ( 2 2 7 ) ; (219) mode i n d i c a t i o n symbol; (220) s t r u c t u r e symbol, open symbol, f i e l d l i s t ( 2 4 8 ) , c l o s e symbol; (221) fl o w e r (252), d e c l a r e r ( 2 1 5 ) . (222) procedure plan(216,222,223,365) : d e c l a r e r l i s t pack (224), d e c l a r e r ( 2 1 5 ) ; (223) d e c l a r e r (215) . (224) d e c l a r e r l i s t pack (222,224): d e c l a r e r open symbol, d e c l a r e r l i s t ( 2 2 5 ) , c l o s e symbol. (225) d e c l a r e r list(217,224,225,226,226): d e c l a r e r ( 2 1 5 ) ; (226) d e c l a r e r l i s t ( 2 2 5 ) , comma symbol, d e c l a r e r (215). (227) p r i m i t i v e declarator(218,227,228,229,230, 231,232,233, 234, 235;236,237,238,239/240/241,242,243,244,245,246,247): vo i d symbol; (228) s h o r t i n t e g r a l symbol; (229) i n t e g r a l symbol; (230) r e a l symbol; (231) long r e a l symbol;" (232) s h o r t bytes symbol; (233) bytes symbol; (234) long bytes symbol; (235) long long bytes symbol; (236) long long long bytes symbol; (237) long long long long bytes symbol; (238) b i t s symbol; (239) s h o r t b i t s symbol; (240) compl symbol; (241) long compl symbol; (242) s t r i n g symbol; (243) sema symbol; (244) f i l e symbol; (245) boolean symbol; (246) c h a r a c t e r symbol;: (247) format symbol. (248) f i e l d list(220,248,249,249) : d e c l a r e r (215) , tag l i s t (250); (249) f i e l d l i s t ( 2 4 8 ) , comma symbol, d e c l a r e r ( 2 1 5 ) , tag l i s t ( 2 5 0 ) . (250) tag l i s t (;248,249, 250, 251 ,251,312, 313) : tag symbol; (251) tag l i s t ( 2 5 0 ) , comma symbol, tag symbol. (252) flower(221,252,253,254): f l e x i b l e symbol, rower (255); (253) e i t h e r symbol, rower (255); (254) rower (255). 104 (255) rower(252,253,254,255,256,257,258): d e c l a r e r open symbol, p a i r l i s t ( 2 5 9 ) , c l o s e symbol; (256) d e c l a r e r open symbol, c l o s e symbol; (257) sub symbol, p a i r l i s t (259), bus symbol; (258) sub symbol, bus symbol. (259) p a i r list(255,257,259;260,260,261,261,262): comma symbol; (260) p a i r l i s t (259), comma symbol; (261) p a i r l i s t ( 2 5 9 ) , comma symbol, p a i r (263) ; (262) p a i r (263). (263) p a i r (261,262,263,264,265,266,267): unife(016), c o l o n symbol, u n i t ( 0 1 6 ) ; (264) u n i t ( 0 1 6 ) , c o l o n symbol; (265) u n i t (016); (266) c o l o n symbol, u n i t (016); (267) c o l o n symbol. (268) generator(034,268,269): l o c a l symbol, d e c l a r e r (215); (269) heap symbol, d e c l a r e r (215) . (270) p r i o r i t y 1 formula(023,270,271): p r i o r i t y 1 operand(271), p r i o r i t y 1 operator symbol, p r i o r i t y 2 operand(273). (271) p r i o r i t y 1 operand (270,271,272): p r i o r i t y 1 formula(270); (272) p r i o r i t y 2 operand (273). (273) p r i o r i t y 2 operand(270,272,273,274,289): p r i o r i t y 2 formula(289); (274) p r i o r i t y 3 operand(275) . (275) p r i o r i t y 3 operand(274,275,276,289,290) : p r i o r i t y 3 formula (290) ;! (276) p r i o r i t y 4 operand(277). 105 (277) p r i o r i t y 4 operand(276,277,278,290, 291) : p r i o r i t y 4 formula (291) ; (278) p r i o r i t y 5 operand(279). (279) p r i o r i t y 5 operand(278,279,280,291,292): p r i o r i t y 5 formula (292) ; (280) p r i o r i t y 6 operand (281) . (281) p r i o r i t y 6 operand(280,281,282,292, 293): p r i o r i t y 6 formula (293); (282) p r i o r i t y 7 operand(283). (283) p r i o r i t y 7 operand(282,283,284,293,294): p r i o r i t y 7 formula(294); (284) p r i o r i t y 8 operand(285) . (285) p r i o r i t y 8 operand(284,285,286,294,295): p r i o r i t y 8 formula(295); (286) p r i o r i t y 9 operand(287). (287) p r i o r i t y 9 operand(286,287,288,295,296): p r i o r i t y 9 formula (296); (288) monadic operand(307). (289) p r i o r i t y 2 formula(024,273,289): p r i o r i t y 2 operand(273), p r i o r i t y 2 operator symbol, p r i o r i t y 3 operand(275) (290) p r i o r i t y 3 formula(025,275,290): p r i o r i t y 3 operand(275), p r i o r i t y 3 o p e r a t o r symbol, p r i o r i t y 4 operand(277) (291) p r i o r i t y 4 formula(026,277,291): p r i o r i t y 4 operand(277), p r i o r i t y 4 operator symbol, p r i o r i t y 5 operand(279) (292) p r i o r i t y 5 formula(027,279,292): p r i o r i t y 5 operand(279), p r i o r i t y 5 o p e r a t o r symbol, p r i o r i t y 6 operand (281) (293) p r i o r i t y 6 formula(028,281,293): p r i o r i t y 6 operand(281), p r i o r i t y 6 o p e r a t o r symbol, p r i o r i t y 7 operand (283) (294) p r i o r i t y 7 formula(029,283,294): p r i o r i t y 7 operand(283), p r i o r i t y 7 operator symbol, p r i o r i t y 8 operand (285) 106 (295) p r i o r i t y 8 formula(030,285,295): p r i o r i t y 8 operana(285), p r i o r i t y 8 operator symbol, p r i o r i t y 9 operand(287), (296) p r i o r i t y 9 formula(031,287,296): p r i o r i t y 9 operand (287), p r i o r i t y 9 operator symbol, monadic operand (307) . (297) operator i n d i c a t i o n (032,297,298,299,300,301,302,303,304, 305,306,307,394,397,400): op e r a t o r symbol; (298) p r i o r i t y 1 operator symbol; (299) p r i o r i t y 2 operator symbol; (300) p r i o r i t y 3 operator symbol; (301) p r i o r i t y 4 o p e r a t o r symbol; (302) p r i o r i t y 5 operator symbol; (303) p r i o r i t y 6 operator symbol; (304) p r i o r i t y 7 operator symbol; (305) p r i o r i t y 8 operator symbol; (306) p r i o r i t y 9 operator symbol. (307) monadic operand (032,288,296,307,307,308): ope r a t o r i n d i c a t i o n (297), monadic operand (307); (308) secondary (033) . (309) r o u t i n e text(018,309,388,391,397): parameters plan(310), c o l o n symbol, u n i t ( 0 1 6 ) . (310) parameters plan(309,310,311): d e c l a r e r open symbol, parameters l i s t (312), c l o s e symbol, d e c l a r e r (215); (311) d e c l a r e r (215) . (312) parameters list(310,312,313,313): d e c l a r e r (215) , tag l i s t (250); (313) parameters l i s t ( 3 1 2 ) , comma symbol, d e c l a r e r ( 2 1 5 ) , tag l i s t ( 2 5 0 ) . (314) i d e n t i t y r e l a t i o n ( 0 1 9 , 3 1 4 , 3 1 5 ) : t e r t i a r y ( 0 2 2 ) , i s symbol, t e r t i a r y (022); (315) t e r t i a r y ( 0 2 2 ) , i s not symbol, t e r t i a r y (022). 1 0 7 (316) loop (020,316,317,318,319,320,321, 322,323, 324, 325,326,327, 328,329,330,331,332/333,334,335,336,337,338,339,340, 341,342,343,344,345,346,347): f o r p a r t (348), from p a r t (349), by part (350), t o p a r t (351), while p a r t (352), do c l a u s e (353); (317) f o r p a r t (348), from p a r t (349), by part (350), to p a r t (351), do c l a u s e (353) ; (318) f o r p a r t (348), from p a r t (349), by p a r t ( 3 5 0 ) , while p a r t (352), do c l a u s e (353) ; (319) f o r p a r t (348), from p a r t (349), by p a r t (350), do c l a u s e (353); (320) f o r p a r t (348), from p a r t (349), to part (351), while p a r t (352), do c l a u s e (353); (321) f o r p a r t (348) , from p a r t (349) , to p a r t (351), do c l a u s e (353) ; (322) f o r p a r t (348), from p a r t (349), while p a r t ( 3 5 2 ) , do c l a u s e ( 3 5 3 ) ; (323) f o r p a r t (348), from p a r t (349), do c l a u s e (353) ; (324) f o r p a r t (348), by p a r t (350), t o part (351), while part (352), do c l a u s e (353); (325) f o r p a r t (348), by p a r t (350), to p a r t (351), do c l a u s e (353) ; (326) f o r p a r t ( 3 4 8 ) , by p a r t ( 3 5 0 ) , while p a r t ( 3 5 2 ) , do c l a u s e (353) ; (327) f o r p a r t (348), by p a r t (350), do c l a u s e (353); (328) f o r p a r t ( 3 4 8 ) , to p a r t ( 3 5 1 ) , while p a r t ( 3 5 2 ) , do c l a u s e (353) ; (329) f o r p a r t (348), t o p a r t (351), do c l a u s e (353); (330) f o r p a r t (348), while p a r t (352) , do c l a u s e (353); (331) f o r p a r t ( 3 4 8 ) , do c l a u s e (353) ; (332) from p a r t (349), by p a r t (350), to p a r t (351), while part (352), do c l a u s e (353) ; (333) from p a r t ( 3 4 9 ) , by p a r t (350), to p a r t (351), do c l a u s e (353); (334) from p a r t ( 3 4 9 ) , by p a r t (350) , while p a r t (352), 108 do c l a u s e ( 3 5 3 ) ; (335) from p a r t (349), by p a r t (350) , do c l a u s e (353) ;! (336) from part (349), to p a r t (351), while part (352), do c l a u s e (353) ; (337) from p a r t (349), to p a r t (351), do c l a u s e (353) ;i (338) from, p a r t (349) , while part (352), do c l a u s e (353); (339) from part (349) , do c l a u s e (353); (340) by p a r t (350), to p a r t (351), while part (352), do c l a u s e ( 3 5 3 ) ; (341) by p a r t (350), to p a r t (351), do c l a u s e (353); (342) by p a r t (350), while part (352), do c l a u s e (353); (343) by p a r t (350), do c l a u s e (353); (344) to p a r t (351), while p a r t (352), do c l a u s e (353); (345) to p a r t ( 3 5 1 ) , do c l a u s e (353) ; (346) while part (352), do c l a u s e (353); (347) do c l a u s e (353) . (348) f o r p a r t (316,317,318,319,320,321, 322,323,324,325,326,327, 328,329,330,331,348): f o r symbol, tag symbol. (349) from part(316,317,318,319,320,321,322, 323,332,333,334, 335,336,337,338,339,349): from symbol, u n i t (016). (350) by p a r t (316,317,318,319,324,325,326,327,3 32,333,334,335, 340,341,342,343,350): by symbol, u n i t ( 0 1 6 ) . (351) t o part(316,317,320,321,324,325,328,329,332, 333, 336,337, 340,341,344,345*351): to symbol, u n i t (016). (352) while par* (316,318,320,322,324,326,328,330,332,334,336, 338,340,342*344,346,352): while symbol, s e r i a l c l a u s e (355). (353) do clause(316,317,318,319,320,321,322,323,324,325,326, 327,328,329,330,331,332,333,334,335,336,337,33 8,339, 340,341,342,343,344,345,346,347,353,354): do symbol, u n i t ( 0 1 6 ) ; 109 rep symbol, s e r i a l c l a u s e (355), per symbol. s e r i a l clause(014,015,352,354,355,356,414,416,416,417, 417,418,420,421,421,422,422,423,424,425,428,429,431, 432^439,440,442,443): d e c l a r a t i o n prologue(357), go on symbol, parade(403); parade (403). d e c l a r a t i o n prologue (355,357,358,358): s e r i e s with def (359); d e c l a r a t i o n prologue(357) , go on symbol, s e r i e s with def (359) . s e r i e s with def(357,358,359,360): s i n g l e d e c l a r a t i o n l i s t (361); u n i t s e r i e s ( 4 0 1 ) , go on symbol, s i n g l e d e c l a r a t i o n l i s t (361). s i n g l e d e c l a r a t i o n list(359,360,361,362,362) : s i n g l e d e c l a r a t i o n (363) ;? s i n g l e . d e c l a r a t i o n l i s t (361), comma symbol, s i n g l e d e c l a r a t i o n ( 3 6 3 ) . s i n g l e declaration(361,362,363,364,365,366,367): mode symbol, mode a s s o c i a t i o n l i s t (368); i d e n t i f i e r d e c l a r a t i o n (371); o p e r a t i o n symbol, procedure plan (222), o p e r a t i o n a s s o c i a t i o n l i s t ( 3 9 2 ) ; o p e r a t i o n symbol, o p e r a t i o n a s s o c i a t e d l i s t (395); p r i o r i t y symbol, p r i o r i t y a s s o c i a t i o n l i s t ( 3 9 8 ) . mode a s s o c i a t i o n l i s t (363,368,369,369): mode a s s o c i a t i o n (370) ; mode a s s o c i a t i o n l i s t (368), comma symbol, mode a s s o c i a t i o n (370). mode a s s o c i a t i o n (368,369,370): mode i n d i c a t i o n symbol, equals symbol, d e c l a r e r (215). i d e n t i f i e r declaration(364,371,372,373,374,375,376,377, 378);: d e c l a r e r ( 2 1 5 ) , i d e n t i t y a s s o c i a t i o n l i s t ( 3 7 9 ) ; 110 (372) d e c l a r e r (215) , tagaticm l i s t ( 3 8 2 ) ; (373) heap symbol, d e c l a r e r (215), t a g a t i o n l i s t ( 3 8 2 ) ; (374) heap symbol, procedure symbol, tag i n i t l i s t (386); (375) l o c a l symbol, d e c l a r e r ( 2 1 5 ) , t a g a t i o n l i s t ( 3 8 2 ) ; (376) l o c a l symbol, procedure symbol, tag i n i t l i s t ( 3 8 6 ) ; (377) procedure symbol, i d e n t i t y i n i t l i s t ( 3 8 9 ) ; (378) procedure symbol, tag i n i t l i s t ( 3 8 6 ) . (379) i d e n t i t y a s s o c i a t i o n l i s t (371,379,380,380): i d e n t i t y a s s o c i a t i o n ( 3 8 1 ) ; (380) i d e n t i t y a s s o c i a t i o n l i s t ( 3 7 9 ) , comma symbol; i d e n t i t y a s s o c i a t i o n (381). (381) i d e n t i t y a s s o c i a t i o n (379,380,381): tag symbol, equals symbol, u n i t (016). (382) t a g a t i o n list(372,373,375,382,383,383): t a g a t i o n (384); (383) t a g a t i o n l i s t ( 3 8 2 ) , comma symbol, t a g a t i o n (384). (384) tagation(382,383,384,385): tag symbol; (385) tag symbol, becomes symbol, u n i t (016). (386) tag i n i t l i s t (374,376,378,386,387,387) : tag i n i t ( 3 8 8 ) ; (387) tag i n i t l i s t ( 3 8 6 ) , comma symbol, tag i n i t (388). (388) tag i n i t (;386,387,388) J tag symbol, becomes symbol, r o u t i n e t e x t ( 3 0 9 ) . (389) i d e n t i t y i n i t list(377,389,390,390) : i d e n t i t y i n i t ( 3 9 1 ) ; N (390) i d e n t i t y i n i t l i s t ( 3 8 9 ) , comma symbol, i d e n t i t y i n i t ( 3 9 1 ) . (391) i d e n t i t y i n i t ( 3 8 9 , 3 9 0 * 3 9 1 ) : tag symbol, equals symbol, r o u t i n e t e x t ( 3 0 9 ) . (392) o p e r a t i o n a s s o c i a t i o n list(365,392,393,393) : o p e r a t i o n a s s o c i a t i o n ( 3 9 4 ) ; 111 (393) o p e r a t i o n a s s o c i a t i o n l i s t (392) , comma symbol, o p e r a t i o n a s s o c i a t i o n (394). (394) o p e r a t i o n a s s o c i a t i o n (392,393,394): op e r a t o r i n d i c a t i o n (297), equals symbol, u n i t (016). (395) o p e r a t i o n a s s o c i a t e d l i s t (366,395,396,396): o p e r a t i o n a s s o c i a t e d (397); (396) o p e r a t i o n a s s o c i a t e d l i s t (395), comma symbol, o p e r a t i o n a s s o c i a t e d (397). (397) o p e r a t i o n associated(395,396,397): o p e r a t o r i n d i c a t i o n (297), equals symbol, r o u t i n e t e x t (309). (398) p r i o r i t y a s s o c i a t i o n l i s t (367,398,399,399): p r i o r i t y a s s o c i a t i o n ( 4 0 0 ) ; (399) p r i o r i t y a s s o c i a t i o n l i s t (398), comma symbol, p r i o r i t y a s s o c i a t i o n (400). (400) p r i o r i t y a s s o c i a t i o n (398,399,400): o p e r a t o r i n d i c a t i o n (297), equals symbol, i n t e g r a l d e n o t a t i o n symbol. (401) u n i t series(360,401,402,402,405,406,407) : u n i t ^016) ; (402) u n i t s e r i e s (401), go on symbol, u n i t (016). (403) parade(355,356,403,404,404): t r a i n (405); (404) parade (403), completion symbol, l a b e l (006), t r a i n (405) . (405) train(403,404,405,406,407,407): u n i t s eries(401) ; (406) l a b e l sequence (004) ? u n i t s e r i e s (401); (407) t r a i n (405), go on symbol, l a b e l sequence(004) , u n i t s e r i e s ( 4 0 1 ) . (408) c o l l a t e r a l c l a u s e (008,408,409,410,413): open symbol, c l o s e symbol; (409) open symbol, u n i t l i s t proper (411), c l o s e symbol; (410) begin symbol, u n i t l i s t proper (411), end symbol. 112 (411) u n i t l i s t proper(409,410,411,412,412,428,429,430,4 39,440, 441) : u n i t ( 0 1 6 ) , comma symbol, u n i t ( 0 1 6 ) ; u n i t l i s t proper (411), comma symbol, u n i t ( 0 1 6 ) . p a r a l l e l clause(009,413): p a r a l l e l symbol, c o l l a t e r a l c l a u s e (408). c o n d i t i o n a l c l a u s e (010,414,415): i f symbol, s e r i a l c l a u s e (355), i f c h o i c e (416), f i symbol; open s e r i a l ( 4 1 9 ) , open s e r i a l c h o i c e (421), c l o s e symbol. i f choice(414,416,417,417,418): then symbol, s e r i a l c l a u s e ( 3 5 5 ) , e l s e symbol, s e r i a l c l a u s e (355); then symbol, s e r i a l c l a u s e (355), e l s f symbol, s e r i a l c l a u s e (355), i f c h o i c e (416); then symbol, s e r i a l c l a u s e ( 3 5 5 ) . open serial(415,419,420,426,427): open symbol, u n i t (016); s e r i a l open symbol, s e r i a l c l a u s e (355). open s e r i a l choice(415,421,422,422,423): t h e l s e symbol, s e r i a l c l a u s e ( 3 5 5 ) , t h e l s e symbol, s e r i a l c l a u s e (355); t h e l s e symbol, s e r i a l c l a u s e (355), again symbol, s e r i a l c l a u s e (355), open s e r i a l c h o i c e (421); t h e l s e symbol, s e r i a l c l a u s e (355). case clause(011,424,425,426,427): case symbol, s e r i a l c l a u s e (355), case 1 c h o i c e (428), esac symbol; case symbol, s e r i a l c l a u s e (355), case 2 c h o i c e (431), esac symbol; open s e r i a l (419), open c o l l a t e r a l 1 ch o i c e (439), c l o s e symbol; open s e r i a l ( 4 1 9 ) , open c o l l a t e r a l 2 c h o i c e (442), c l o s e symbol. case 1 choice(424,428,429,429,430): 113 i n symbol, u n i t l i s t proper (411), out symbol, s e r i a l c l a u s e (355) ; (429) i n symbol, u n i t l i s t proper (411), ouse symbol, s e r i a l c l a u s e (355), case 1 c h o i c e (428); (430) i n symbol, u n i t l i s t proper(411). (431) case 2 choice(425,431,432,432,433): i n symbol, a l t e r n a t i v e l i s t ( 4 3 4 ) , out symbol, s e r i a l c l a u s e (355); (432) i n symbol, a l t e r n a t i v e l i s t ( 4 3 4 ) , ouse symbol, s e r i a l c l a u s e (355) , case 2 c h o i c e (431); (433) i n symbol, a l t e r n a t i v e l i s t (434). (434) a l t e r n a t i v e list(431,432,433,434,435,435,442,443,444): a l t e r n a t i v e ( 4 3 6 ) ; (435) a l t e r n a t i v e l i s t (434), comma symbol, a l t e r n a t i v e (436). (436) a l t e r n a t i v e ( 4 3 4 , 4 3 5 , 4 3 6 ) : shape (437), u n i t (016). (437) shape (436,437,438) : open symbol, d e c l a r e r (215), tag symbol, c l o s e symbol, c o l o n symbol; (438) open symbol, d e c l a r e r ( 2 1 5 ) , c l o s e symbol, c o l o n symbol. (439) open c o l l a t e r a l 1 choice(426,439,440,440,441): t h e l s e symbol, u n i t l i s t proper (411), t h e l s e symbol, s e r i a l c l a u s e (355); (440) t h e l s e symbol, u n i t l i s t proper (411), a g a i n symbol, s e r i a l c l a u s e (355), open c o l l a t e r a l 1 c h o i c e (439); (441) t h e l s e symbol, u n i t l i s t proper (411). (442) open c o l l a t e r a l 2 choice(427,442,443,443,444): t h e l s e symbol, a l t e r n a t i v e l i s t ( 4 3 4 ) , t h e l s e symbol, s e r i a l c l a u s e (355); (443) t h e l s e symbol, a l t e r n a t i v e l i s t (434), again symbol, s e r i a l c l a u s e (355), open c o l l a t e r a l 2 c h o i c e ( 4 4 2 ) ; (444) t h e l s e symbol, a l t e r n a t i v e l i s t ( 4 3 4 ) . 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            data-media="{[{embed.selectedMedia}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0052015/manifest

Comment

Related Items